Audio forensic problem - detection of time stretching - analog to digital to analog

So the theory is that some audio was deleted and the remaining audio was stretched to keep the total duration the same?

Makes sense.

I think @Guermantes' approach should be your first stop. Look for any discontinuities in the spectrogram at the time you believe a cut took place. Time stretching would not change this by itself. Matching spectrum at the beginning / end of edits is probably not that hard with modern tools (haven't tried in years) but you would need to know something about audio to realize it was a necessary covering-tracks step.

I would also say 5-10% stretching might be detectable depending on how it was stretched. Speech will tend to sound fairly natural but any extraneous sounds, especially sharp sounds, might end up with unnatural-looking (and sounding?) transients in the stretched version.
Yes, I can see discontinuities in the spectrogram, which proves tampering, but that only answers the question of whether the tapes are fakes. Proof of time-scaling would answer the question of who stretched them, because the tapes are of fixed length. Only the possessor of the original material could have done that. This is the so-called 'attribution' question.

I did try Ableton, which detects tempo changes, but apparently it doesn't work for speech.
 
What's on the tape? Did it cost more than £1? If not, why not just record over it from a FLAC file?
 
OK, makes sense. Proving time-stretching would be easier if you know what software might have been used for time-stretching.

In general you will have to look closely for artifacts in the spectrogram to find evidence of time-stretching.

Since the hypothesis is stretching was around 5-10%, they will be subtle.

Voice is easy to stretch with minimal artifacts, so you're going to want to look at any extraneous sounds in the recording, like tapping, maybe a glass clinking, footfalls, etc. Transients (sharp percussive sounds) tend to fare worst from time-stretching, but tools have evolved to make that less of an issue.

You might also find clues if parts of the spectrum are missing - time stretching algos might throw away extraneous stuff like subsonic noise that would otherwise be present in a tape recording.
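
If you want to put a number on how sharp a given tap or click is, here is a minimal sketch that measures the 10%-90% rise time of a transient, so several events in the same recording can be compared. It is only an illustration: the function names, the 1 ms peak-hold window and the synthetic bursts are my assumptions, and a longer rise time on its own is not proof of time-stretching.

```python
import math

def envelope(samples, win):
    """Running maximum of |x| over the last `win` samples (a crude peak-hold envelope)."""
    env = []
    for i in range(len(samples)):
        start = max(0, i - win + 1)
        env.append(max(abs(x) for x in samples[start:i + 1]))
    return env

def rise_time_ms(samples, fs, win=None, lo_frac=0.1, hi_frac=0.9):
    """Milliseconds the envelope takes to climb from lo_frac to hi_frac of its peak."""
    if win is None:
        win = max(1, int(fs * 0.001))  # assumed 1 ms peak-hold window
    env = envelope(samples, win)
    peak = max(env)
    if peak == 0:
        raise ValueError("signal is silent")
    # Walk backwards from the peak to where the envelope first reaches each threshold.
    hi_idx = env.index(peak)
    while hi_idx > 0 and env[hi_idx - 1] >= hi_frac * peak:
        hi_idx -= 1
    lo_idx = hi_idx
    while lo_idx > 0 and env[lo_idx - 1] >= lo_frac * peak:
        lo_idx -= 1
    return 1000.0 * (hi_idx - lo_idx) / fs

def burst(attack, fs, n_samples=200, freq=1000.0):
    """Synthetic 1 kHz burst whose amplitude ramps up over `attack` samples."""
    return [min(1.0, n / attack) * math.sin(2 * math.pi * freq * n / fs)
            for n in range(n_samples)]

fs = 8000
fast = burst(16, fs)   # sharp attack, standing in for an untouched click
slow = burst(64, fs)   # smeared attack, standing in for a stretched click
print(rise_time_ms(fast, fs), rise_time_ms(slow, fs))
```

On a real recording you would cut out short windows around each tap or clink and compare their rise times against each other, or against a reference recording made on similar equipment.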
 
Legal proceeding evidence. So no monetary value.
Ohh, I get it now. It has to be shown to be impeccable and untouched...

I'm relieved it's not just a bootleg tape of Karma Chameleon or such.
 
From what I can gather, given the age of the tapes and some other info, it is likely Pro Tools was used.

As they were originally four-track but digitised into two, they have been remade into four by panning the signal into the missing two channels to re-create the illusion of four. As such, the spectrograms are unreliable because of volume/gain differences, for example where someone is speaking but has been mixed/overdubbed with silence from another mic or another speaker.

I am looking for something deeper, and really non-audio but electronic.
 
OK, so this is just a wild guess, but: assuming that the 50 Hz signal is from the original recording and has been time stretched and then corrected back to 50 Hz, you would still very likely have phase discontinuities at the cutting points. That means you could theoretically subtract a 50 Hz signal from your tape and you should be able to almost fully eliminate it from the spectrogram. It might slowly wander out of and into phase again, because the recording is quite long. This would cause a slow re-appearance of the 50 Hz signal in the spectrogram. But any actual cut would light up in there like a Christmas tree: at that point, your perfectly aligned, deleted 50 Hz tone would appear again due to the jump of its phase at the cutting point.

If present, this would be an obvious tell of cutting. If not present, it is not a proof of no cuts, though: The person potentially altering the tape might have thought of this and corrected the cutting points to keep the 50 Hz signal in phase.

EDIT: Just some picture from a Keysight manual to demonstrate what a phase discontinuity will look like.
phase_discontinuity.png
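
A rough way to try the phase idea without subtracting anything is to track the phase of the 50 Hz component frame by frame and flag sudden steps. The sketch below is only that, a sketch: the function names, the ten-cycle frame length and the 0.5 rad threshold are assumptions of mine, and it is tested here on a synthetic tone, not on tape. Real mains frequency drifts, so on a real recording the phase will wander slowly between frames, and only abrupt steps are interesting.

```python
import cmath
import math

def hum_phase_track(samples, fs, f0=50.0, frame_len=None):
    """Phase (radians) of the f0 component in consecutive, non-overlapping frames.

    Each frame is correlated with a reference oscillator that runs on the
    absolute sample index, so an uninterrupted tone at exactly f0 gives the
    same phase in every frame.
    """
    if frame_len is None:
        frame_len = int(round(fs / f0)) * 10  # assumed: ten hum cycles per frame
    phases = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        acc = 0j
        for n in range(start, start + frame_len):
            acc += samples[n] * cmath.exp(-2j * math.pi * f0 * n / fs)
        phases.append(cmath.phase(acc))
    return phases

def phase_jumps(phases, threshold=0.5):
    """Indices of frames whose phase steps by more than `threshold` rad from the previous frame."""
    jumps = []
    for i in range(1, len(phases)):
        d = phases[i] - phases[i - 1]
        d = (d + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        if abs(d) > threshold:
            jumps.append(i)
    return jumps

# Synthetic check: a clean 50 Hz tone, and the same tone with 7 samples cut out.
fs = 1000
f0 = 50.0
tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(4000)]
cut = tone[:2000] + tone[2007:]

print(phase_jumps(hum_phase_track(tone, fs, f0)))  # []
print(phase_jumps(hum_phase_track(cut, fs, f0)))   # [10]
```

As noted above, a clean phase track would not prove the absence of cuts, since an editor could have realigned the hum at each join.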
 
The suggestion that @staticV3 made is the most feasible. Mains hum is very constant. The stretching you think occurred is ~10 mins out of 90, orders of magnitude larger than normal mains variation. And the mains noise is easily detectable and quantifiable using even free programs.

To illustrate mains noise in a measurement, here is the mains noise of a moderately spec'ed amplifier. The mains hum is only ~0.002% of the total signal, but is still trivial to measure.
1778620454735.png

You can see the mains fundamental at 60Hz with harmonics at 120Hz, 180Hz, and higher.
If this amp were measured on 50Hz mains, the fundamental would sit at 50Hz with harmonics at 100Hz, 150Hz, and higher.
Your suspicion is that in your recordings the 60Hz peak would be rendered at ~53Hz if the recording were made in the US, or ~44Hz in a country with 50Hz mains (slowing audio down to lengthen it lowers every frequency in it).

10% should be trivial to analyze in REW (Room Equalization Wizard).

Convert one of the files to a .wav, use REW's RTA to find the mains frequency:
1778621051105.png
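
If you would rather script the check than read it off the RTA, here is a minimal sketch that scans 40-70 Hz for the strongest tone and works out where the hum should land after a plain speed change. The function names, the scan range and the 80-minutes-into-90 figure are my assumptions for illustration, and the test signal is synthetic. Note that lengthening by slowing down lowers the hum, and that this only applies to a simple speed change; a pitch-preserving stretch would leave the hum frequency where it was.

```python
import cmath
import math

def tone_power(samples, fs, freq):
    """Power of a single frequency component (a one-bin DFT)."""
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq * n / fs)
    return abs(acc) ** 2

def find_hum(samples, fs, lo=40.0, hi=70.0, step=0.1):
    """Scan lo..hi Hz in `step` increments and return the frequency with the most energy."""
    best_f, best_p = lo, -1.0
    steps = int(round((hi - lo) / step))
    for k in range(steps + 1):
        f = lo + k * step
        p = tone_power(samples, fs, f)
        if p > best_p:
            best_f, best_p = f, p
    return best_f

def expected_hum_after_speed_change(nominal_hz, original_minutes, final_minutes):
    """Hum frequency if the material was lengthened purely by slowing it down."""
    return nominal_hz * original_minutes / final_minutes

# Assumed scenario: 80 minutes of material slowed down to fill 90 minutes.
print(expected_hum_after_speed_change(50.0, 80.0, 90.0))  # about 44.4
print(expected_hum_after_speed_change(60.0, 80.0, 90.0))  # about 53.3

# Synthetic check: two seconds of a 44.4 Hz tone sampled at 500 Hz.
fs = 500
hum = [math.sin(2 * math.pi * 44.4 * n / fs) for n in range(2 * fs)]
measured = find_hum(hum, fs)
print(round(measured, 1))
```

The frequency resolution is limited by the length of the excerpt, so on a real file you would feed in a longer, quiet stretch of tape and downsample it first to keep the scan fast.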
 
Honestly, this is proper interesting. No sarcasm.

I feel like it's Gene Hackman in The Conversation
 
Are there any extraneous sounds in the recording, especially any kind of tapping / knocking sounds? These types of sounds tend to be distorted most by time-stretching, so if it was used to the extent it caused visible artifacts, they would tend to show up regardless of how the tape was mixed or edited.
If proper time stretching was used, checking for a shifted mains frequency won't work: proper time stretching preserves frequency by resynthesizing the audio using FFTs, so there probably won't be any change in pitch. I think if there's any evidence left, it will be in the types of distortion/artifacts you get from frequency domain editing.
 
Thank you Random Ear. This was, if I remember correctly, one of the first things an audio forensic examiner told me. Opening up the sound file and rejoining the broken 50Hz mains hum (UK) to hide edits. However what I have found is more sophisticated. Firstly, several machines are involved in the process, each adding their own mains fundamental frequency. So which is the original? Overdubbing hides the breaks. What actually remains are the amplified internal machine frequencies (by raising the EQs through a mixer).

Time stretching breaks those amplified frequencies. See attached spectrogram.

Attachment: Capture_21.jpg (929.6 KB)
 
If you want to prove that different machines were used, you could check the azimuth of the recording head. Precise alignment is needed to avoid high-frequency loss.
 
No, I know that multiple machines have been used.

The point of my question is that, because the tapes are full, only proof of time stretching can reveal who is responsible, as only the person in possession of the original tapes would have that facility.
 
Do I understand your problem?

1. Bob recorded a forensic interview with Chloe on a speech-only, fixed-length 60-minute cassette-type tape (slightly different from a classic music cassette recorder setup). Chloe said something important.

2. Alice has somehow cut out the 10 minutes of what Chloe said, but this would leave a gap in a fixed-length cassette. So Alice has time stretched the remaining interview by approximately 10% to fit the fixed length.

3. You know Alice has done this. You are NOT trying to prove the edit or time stretch, nor the exact edit points. Your challenge is finding WHO Alice is.

4. What "fingerprint" has Alice's edit and time stretch left behind? If there is a fingerprint (and you are assuming it was done digitally by ADCing into a DAW then DACing it back to the cassette), can you work out the technique and the device that was used? This might be enough evidence to pin down Alice's identity.
 
Top Tip: Change your friends. They are probably making gonzo movies anyway. :)
 
Nearly! Except in your analogy, I do know who Alice is (and she has been assisted by a competent audio engineer).

What I am trying to do is pin down Alice's culpability, which she can deny by saying any third party could have done this: where's the proof that it's me? I believe that proof is evidence of time stretching.

I know, or can guess, the machinery used. And the software was probably Pro Tools.

Time stretching is the 'fingerprint' I am looking for. I think I know just about everything else in this case.
 