
Detecting watermarks through spectral analysis

hvbias

Addicted to Fun and Learning
Joined
Apr 28, 2016
Messages
577
Likes
419
Location
US
Looking for advice on how to detect watermarks through spectral analysis. This is taken from another forum.

I recently purchased a 96/24 download of Keith Jarrett's Sleeper from Ponomusic. This is on the ECM label, and ECM is distributed by Universal.

I also have the CD, so I thought I would see if I could find some objective evidence of watermarking.

Here's what I did:

- Converted the 96/24 download to 44.1/16 using foobar
- Imported the downscaled download and the CD rip of the same track into Audacity
- Since the tracks had slightly different starting points, manually aligned the two tracks
- Inverted one of the tracks
- Mixed and rendered the two tracks
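
For anyone who wants to script the steps above instead of aligning by hand, here is a minimal sketch of the same null test. It assumes both tracks are already decoded to lists of floats at the same sample rate (file loading is left out), and the function names are my own, not from any tool mentioned here. It only aligns to the nearest whole sample, so sub-sample offsets from resampling, different dither, or different mastering will still leave residue. It is pure standard-library Python to keep it self-contained; for full-length tracks NumPy/SciPy would be the practical choice.

```python
import math


def best_lag(ref, other, max_lag):
    """Return the lag where other[i + lag] best matches ref[i] (peak cross-correlation)."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            r * other[i + lag]
            for i, r in enumerate(ref)
            if 0 <= i + lag < len(other)
        )
        if score > best_score:
            best, best_score = lag, score
    return best


def null_test(ref, other, max_lag=100):
    """Align `other` to `ref`, then subtract (equivalent to invert + mix)."""
    lag = best_lag(ref, other, max_lag)
    residual = [
        r - other[i + lag]
        for i, r in enumerate(ref)
        if 0 <= i + lag < len(other)
    ]
    return lag, residual


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (samples assumed in -1.0..1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")
```

If the two files really are the same mastering, `rms_dbfs(residual)` should come out very low, and whatever is left over is what you would then look at in a spectrogram.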

Attached is the spectrum analysis of the result. It is not as well defined as the example at http://www.mattmontag.com/images/watermarked_full.png, but that may be because of the down-conversion and manual alignment.

watermark.jpg


watermark_detail.jpg

This method works very well if the hi-res download and the CD share the exact same mastering. However, it becomes harder if the two versions aren't the same mastering, and for the vast majority of downloads the mastering will differ from any CD version.

A better image from Matt Montag's blog:
NuJrfPr.png

Spectrogram of the difference between a watermarked and unwatermarked UMG track. The energy is concentrated in two bands between about 1 kHz and 3.5 kHz, where the human ear is most sensitive.

UMG uses a spread spectrum watermark, a technique explained in detail in this Microsoft research paper. The watermark scheme modulates the total energy in two different bands, 1 kHz to 2.3 kHz and 2.3 kHz to 3.6 kHz. The energy is concentrated in the most perceptually sensitive frequencies because that makes it more difficult to attack or remove without significant audible distortion.


The energy is increased or reduced in 0.04 second blocks. The result can be characterized as a fluttering, tremolo sound. Listen closely to the original vs. watermarked audio samples and try to focus on the 1 kHz to 3.6 kHz range. It helps to wear headphones in a quiet environment.
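
To put numbers on that description, a rough sketch like the one below measures the energy in those two bands for each 0.04 second block, so the block-to-block fluctuation can be compared between two versions of a track. The band edges and block length are taken from the blog text above; everything else (function names, the plain DFT, blocks starting at sample 0) is an assumption for illustration. The real block boundaries are unknown, so this is a way to look at the data, not a detector.

```python
import cmath
import math


def band_energy(block, sample_rate, f_lo, f_hi):
    """Sum |X[k]|^2 over DFT bins with f_lo <= k * sample_rate / n < f_hi."""
    n = len(block)
    k_lo = math.ceil(f_lo * n / sample_rate)
    k_hi = math.ceil(f_hi * n / sample_rate)
    energy = 0.0
    for k in range(k_lo, k_hi):
        step = -2j * math.pi * k / n
        coeff = sum(x * cmath.exp(step * t) for t, x in enumerate(block))
        energy += abs(coeff) ** 2
    return energy


def block_band_energies(samples, sample_rate, block_seconds=0.04,
                        bands=((1000, 2300), (2300, 3600))):
    """Return one tuple of band energies per complete block of the signal."""
    n = round(sample_rate * block_seconds)
    return [
        tuple(band_energy(samples[s:s + n], sample_rate, lo, hi) for lo, hi in bands)
        for s in range(0, len(samples) - n + 1, n)
    ]
```

The direct DFT is slow and only meant for short excerpts. An FFT from NumPy would do the same job much faster on a whole track.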

However, he does not mention whether he used the exact same mastering for that screenshot, or for the Three Doors Down sample where he inverted one file and mixed the two together to get the difference.

I have tried doing this with a file I am certain has an audible watermark and its CD, but the subtracted spectrogram just looks like a mess.

Any other suggestions?
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,595
Likes
239,614
Location
Seattle Area
The author of that paper, Rico Malvar, is a good friend and was a close colleague while I was at Microsoft. That scheme was proposed but NOT used by the labels because detection required more CPU cycles. At least that was the case when I was there, and the article is dated 2003, which matches my time frame. Has UMG adopted it since the time I was involved?

The Microsoft algorithm relied on data hiding in a manner similar to lossy audio compression. The watermark bits were added using psychoacoustic analysis to make sure they are masked by the music. That is, the level of quantization noise is increased but kept under the detection threshold for that critical band. As such, simple spectrum analysis like that will not be revealing, because where the mark is inserted is time- and content-specific.
 

hvbias

I haven't had a chance to read the Microsoft paper yet. I believe the one currently implemented was developed by a Korean company. I would still be annoyed by watermarks hidden in the ultrasonics, since we're paying customers, but to have it embedded in the upper midrange/low treble is inexcusable.
 

amirm

Let me make you feel worse :). The requirements from the labels are that the mark must:
1. Survive level and EQ changes
2. Survive lossy compression of any type
3. Survive edits of the content
4. Be detectable and insertable in as few as 30 seconds of the content.

These requirements mean that the number of bits inserted is way beyond what the actual payload (data) is. A watermark may be just 32 or 64 bits but thousands and thousands of bits are inserted throughout the song acting as error correction and mitigation against the above.
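
To illustrate the payload-versus-inserted-bits point, here is a toy sketch using the simplest possible redundancy: repeat the whole payload many times and recover each bit by majority vote. This is NOT the coding any actual watermark scheme uses (nothing in this thread says what that is), and it assumes the detector already knows where each copy starts, which a real detector would have to work out. The function names are made up for the example.

```python
def embed_bits(payload, copies):
    """Repeat the whole payload `copies` times (simple repetition code)."""
    return list(payload) * copies


def recover_bits(received, payload_len):
    """Majority-vote each payload position across all received copies."""
    ones = [0] * payload_len
    total = [0] * payload_len
    for i, bit in enumerate(received):
        ones[i % payload_len] += bit
        total[i % payload_len] += 1
    return [1 if 2 * o > t else 0 for o, t in zip(ones, total)]
```

With a 32-bit payload repeated about a hundred times you already have thousands of inserted bits, and the payload survives as long as fewer than half the copies of each bit are corrupted.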

The Microsoft scheme was actually quite superior to what was eventually selected, since it used a model of the hearing system to insert its bits in an inaudible way. The other proposed schemes, one of which won, did not fare so well in that respect, but they won because of the requirement for low CPU power.

All of this said, I participated in the listening tests of our algorithm. I found an audible issue in blind tests, which was then fixed. After that, I could no longer distinguish it in 24-bit/96 kHz high-res content. I don't know about the audibility of the algorithm that was picked.
 