Audible noise in watermarked audio even with --strength 5, also not extractable even at --strength 100 #48

Closed
bomsn opened this issue Feb 4, 2024 · 11 comments

Comments

@bomsn

bomsn commented Feb 4, 2024

Hello, first of all, thank you very much for this amazing piece of software. I've been experimenting with it for the past month and it's definitely the best open source project for watermarking audio files.

I've encountered a consistent issue with watermarking certain audio files. One file, which I can describe as "Dark Pulse Flutter", produces noticeable noise after watermarking, even with a minimal strength setting (e.g., --strength 5). On top of that, the watermark is not extractable at all, even when increasing the strength (e.g., --strength 100).

Here's what I've observed and attempted so far:

  • The noise is particularly evident with some audio types, not all (let me know if there is a way to share the audio file securely).
  • The problem persists across various watermark strengths.
  • Applying a low-pass filter seems to reduce the noise's audibility.

Given the above, I'm seeking advice on adjustments that could help mitigate the noise issue without compromising the watermark's extractability. Here are some specific points and questions:

  • Watermark Algorithm Sensitivity: Is there a way to adjust the sensitivity of the patchwork algorithm to these audio types, perhaps by modifying the distribution of frequency bands used for embedding?

  • Parameter Tuning: Could you suggest parameter tuning that might address the noise issue? I am particularly interested in whether there are non-documented parameters or advanced configurations that could be adjusted.

  • Low-Pass Filter Usage: Since a low-pass filter has helped, is there a recommended approach or a set of parameters that would allow for its use without affecting the watermark?

  • Frequency Resolution: Would changing the frame size, if possible, provide a solution? If so, how could this be achieved given the current limitations of the command-line options?

Any insights or suggestions would be greatly appreciated. The goal is to find a balance between minimizing the watermark's audibility and ensuring its reliability and extractability, especially for this specific audio type.

@swesterfeld
Owner

That is interesting. I've run automated tests with a big set of music files, and I've never come across a file that would be impossible to watermark (or rather, where the watermark could not be detected when using a sufficiently high strength). Are the files long enough (one minute would be best)? Does the synchronization work (are the positions of the patterns detected at the right locations)? Maybe you can share one of these files by mail, or send me a link where I can get access?

It is also strange that the watermark is audible (though of course not impossible). Indeed, one approach would be parameter tuning. There is a bit of discussion about the frame size parameter in PR #34.

Most parameters should be in Params, in wmcommon.hh / wmcommon.cc.

Using a bigger frame_size makes the watermark match the frequencies of the input signal more precisely (so one could hope that it would be masked a bit better by the input signal if there are strong sine components in the spectrum), but might also make it more blurry in time (which could cause something like pre-ringing or transient blurring). The default here is 1024; you could try 2048. I know other watermarkers use this FFT size.
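To put numbers on that trade-off, here is a minimal sketch, assuming the 44100 Hz rate used in the formula later in this thread. The helper names are hypothetical and not part of audiowmark; they only apply the standard FFT relations between frame size, bin width, and frame duration.

```cpp
#include <cassert>
#include <cstddef>

// Width of one FFT bin in Hz: the frequency resolution of a frame.
// (Hypothetical helper, not an audiowmark function.)
double bin_width_hz (double sample_rate, size_t frame_size)
{
  assert (sample_rate > 0 && frame_size > 0);
  return sample_rate / frame_size;
}

// Duration of one frame in milliseconds: the time resolution of a frame.
// (Hypothetical helper, not an audiowmark function.)
double frame_duration_ms (double sample_rate, size_t frame_size)
{
  assert (sample_rate > 0 && frame_size > 0);
  return 1000.0 * frame_size / sample_rate;
}
```

At 44100 Hz, a frame size of 1024 gives bins about 43.07 Hz wide and frames about 23.2 ms long; 2048 halves the bin width to about 21.53 Hz and doubles the frame length to about 46.4 ms, which is where the extra blurring in time comes from.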

Note that the parameters depend on each other, so if you change one, you might also need to change others. The frequency range is controlled by min_band / max_band (which are relative to the frame size). Using a different frequency range could sound better. Since you mention a low-pass filter: as we're already performing an FFT, selecting a set of bands should be somewhat equivalent to applying one.

If noisiness is a problem, making the watermark more sparse by using fewer bands over the frequency range could have an effect (that would mean using fewer bands_per_frame relative to the range).
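As a rough illustration of how a low-pass cutoff and the "sparseness" idea translate into band numbers, the sketch below uses two hypothetical helpers. It assumes bands are plain FFT bin indices and treats bands_per_frame simply as a count relative to the range, as described above; how audiowmark uses these parameters internally is not covered in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Highest FFT band whose center frequency is at or below cutoff_hz.
// Could serve as a candidate max_band for a given low-pass cutoff.
// (Hypothetical helper, not an audiowmark function.)
size_t highest_band_below (double cutoff_hz, double sample_rate, size_t frame_size)
{
  assert (cutoff_hz >= 0 && sample_rate > 0 && frame_size > 0);
  return static_cast<size_t> (cutoff_hz * frame_size / sample_rate);
}

// Ratio of bands_per_frame to the number of bands in [min_band, max_band].
// A lower ratio corresponds to a sparser watermark in the sense used above.
// (Hypothetical helper, not an audiowmark function.)
double band_density (size_t bands_per_frame, size_t min_band, size_t max_band)
{
  assert (max_band >= min_band);
  return static_cast<double> (bands_per_frame) / (max_band - min_band + 1);
}
```

For example, a 4000 Hz cutoff at 44100 Hz corresponds to band 92 with a frame size of 1024 and band 185 with 2048, which shows why the band parameters have to be rescaled whenever the frame size changes.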

Unfortunately, tuning parameters to get a good result is not going to be easy, and things may break in unexpected ways, as doing this in a user-friendly way is not really supported at this point. It's a bit of trial and error to find the right balance and make all parameters play well together. There is also stuff in wmspeed.cc that somehow depends on the parameters without being in the main parameter section.

Btw, we have some developer documentation (that is already written) that will be included in the next release, which is really soon now. This could help you a bit to better understand what the parameters actually do.

@bomsn
Author

bomsn commented Feb 5, 2024

Hey @swesterfeld, thanks for getting back!

The file is 18 seconds long, but it's probably not a length issue, as I have it working with much shorter files (e.g., 3 seconds). I can send the file by email but can't find your email address anywhere; do you mind sharing it?

This is an example output of running the watermark add command, if that's going to help:

Input:        /tmp/add-watermark/Dark_Pulse_Flutter.wav
Output:       /tmp/add-watermark/WM_Dark_Pulse_Flutter.wav
Message:      02f76458dd544b65f19a1c5622eb0549
Strength:     18

Time:         0:18
Sample Rate:  48000
Channels:     2
Data Blocks:  0

When I try to use get on the output file, it doesn't return anything, as if there were no watermark, unlike with other files.

Thanks for the details regarding the parameters, will look into that, and looking forward to the developer documentation.

@swesterfeld
Owner

I can send the file by email but can't find your email address anywhere, do you mind sharing?

[email protected]

@bomsn
Author

bomsn commented Feb 5, 2024

Hey Stefan, I just sent you an email, let me know if you need anything else!

@swesterfeld
Owner

Well, I can at least tell you why the watermark detection doesn't work as expected: after listening to the file and looking at a spectrum view, I can see that for this file most of the signal energy is in the very low frequencies. However, audiowmark with the default settings doesn't watermark these frequencies. This is why detection fails. The lowest bin with the default settings is

min_band * 44100 / frame_size = 20 * 44100 / 1024 = 861.328125 Hz

and all other bins are above this one, whereas for your file most of the energy is below this frequency.
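The 861.328125 figure follows from the usual FFT relation: a band's center frequency is its index times the sample rate divided by the frame size. A minimal sketch with hypothetical helper names (not audiowmark functions), which also checks whether a given frequency component falls below the lowest watermarked band:

```cpp
#include <cassert>
#include <cstddef>

// Center frequency in Hz of FFT band `band`.
// (Hypothetical helper, not an audiowmark function.)
double band_to_hz (size_t band, double sample_rate, size_t frame_size)
{
  assert (frame_size > 0);
  return band * sample_rate / frame_size;
}

// True if a component at freq_hz lies below the lowest watermarked band,
// i.e. in a region the watermark does not use.
bool below_watermark_range (double freq_hz, size_t min_band,
                            double sample_rate, size_t frame_size)
{
  return freq_hz < band_to_hz (min_band, sample_rate, frame_size);
}
```

With min_band = 20, a frame size of 1024 and 44100 Hz, band_to_hz returns 861.328125, so a component at, say, 100 Hz (an illustrative value, not a measurement of the file discussed here) sits well below the watermarked range.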

@bomsn
Author

bomsn commented Feb 6, 2024

Thanks Stefan, I thought it had something to do with the frequencies in the audio file being lower than those of the watermark.

If we lower the settings below those of the audio file, would it still work? Or is there a lower limit where it stops being reliable?

Also, for similar audio files where the extraction works but with noticeable noise, if we spread the watermark a little bit, would that help with the noise?

Finally, I know it's a long shot, but if audiowmark can detect file frequencies and raise an error/show message where it's not going to work, that's going to be amazing.

@swesterfeld
Owner

Ok, I found an improvement that fixes the problems with your file. Changing the window function like this:

diff --git a/src/wmcommon.cc b/src/wmcommon.cc
index 778acdc..db2a011 100644
--- a/src/wmcommon.cc
+++ b/src/wmcommon.cc
@@ -72,8 +72,8 @@ FFTAnalyzer::gen_normalized_window (size_t n_values)
   for (size_t i = 0; i < n_values; i++)
     {
       const double n_values_2 = n_values / 2.0;
-      // const double win =  window_cos ((i - n_values_2) / n_values_2);
-      const double win = window_hamming ((i - n_values_2) / n_values_2);
+      const double win =  window_cos ((i - n_values_2) / n_values_2);
+      //const double win = window_hamming ((i - n_values_2) / n_values_2);
       //const double win = 1;
       window[i] = win;
       window_weight += win;

sounds better. I think it might even be reasonable to use this by default in new releases. I'd have to run a few tests to see if that affects robustness in a negative way.

What is even better: with the change of the window function, watermark detection also works for your file. I guess the reason is that the new window reduces how much frequency bands that are some distance apart affect each other. In your case, the very low frequencies do not affect the higher frequencies that much with the changed window function.
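The leakage argument can be checked numerically. The sketch below is not audiowmark code: it assumes window_cos is a raised-cosine (Hann-style) window and window_hamming the standard Hamming window, and the `_sketch` functions and `leakage` helper are hypothetical names. It measures how much of a low-frequency sine shows up at a distant FFT bin under each window.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

const double PI = 3.14159265358979323846;

// Raised-cosine (Hann-style) window on x in [-1, 1]; reaches 0 at the edges.
// Assumed shape of window_cos, not copied from audiowmark.
double window_cos_sketch (double x)
{
  return std::fabs (x) > 1 ? 0.0 : 0.5 * std::cos (x * PI) + 0.5;
}

// Standard Hamming window on x in [-1, 1]; stays at 0.08 at the edges.
// Assumed shape of window_hamming, not copied from audiowmark.
double window_hamming_sketch (double x)
{
  return std::fabs (x) > 1 ? 0.0 : 0.54 + 0.46 * std::cos (x * PI);
}

// Magnitude at FFT bin `probe_bin` of a windowed sine located at the
// (possibly fractional) bin position `tone_bin`, normalized by the sum of
// the window weights. Uses a direct single-bin DFT for simplicity.
double leakage (double (*window) (double), size_t n, double tone_bin, size_t probe_bin)
{
  assert (n > 0);
  std::complex<double> acc (0.0, 0.0);
  double weight = 0;
  const double n_2 = n / 2.0;
  for (size_t i = 0; i < n; i++)
    {
      const double w = window ((i - n_2) / n_2);
      const double s = std::sin (2 * PI * tone_bin * i / n);
      acc += w * s * std::polar (1.0, -2 * PI * probe_bin * i / n);
      weight += w;
    }
  return std::abs (acc) / weight;
}
```

Comparing `leakage (window_cos_sketch, 1024, 3.5, 40)` with `leakage (window_hamming_sketch, 1024, 3.5, 40)` shows far less energy from the low tone arriving at bin 40 with the raised-cosine window. This is the expected behavior: because the Hamming window does not fall to zero at its edges, its side lobes decay much more slowly with distance.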

As for the frequency range - not sure if you still want to experiment with that - I think the absolute minimum bin that you can use from a signal processing point of view is bin 4. However, you probably need to do listening tests to see how it sounds. The actual frequency will depend on the bin number and the frame size.

Finally, I know it's a long shot, but if audiowmark can detect file frequencies and raise an error/show message where it's not going to work, that's going to be amazing.

Not sure if implementing this is really worth it. However, if you want to test whether watermarking a file was successful, you can simply use audiowmark get after adding the watermark and look at the sync score. For instance:

$ audiowmark add /tmp/Dark_Pulse_Flutter_C_120bpm.wav x.wav f0 -q --strength 15
$ audiowmark get x.wav --strength 15
pattern  0:00 f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0 1.389 0.112 CLIP-B

Now 1.389 is significantly higher than 1.0, so we can assume it worked.
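For an automated pipeline, the same check could be scripted. The following is a minimal sketch that parses one `pattern` line in the format shown above. The function names and the threshold parameter are hypothetical, and the only field meaning taken from this thread is that the first number after the message is the sync score.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Extract the sync score from one "pattern" line as printed above:
//   pattern <time> <message> <sync score> ...
// Returns false if the line does not have that shape.
// (Hypothetical helper, not part of audiowmark.)
bool parse_sync_score (const std::string& line, double& score)
{
  std::istringstream in (line);
  std::string tag, time, message;
  double value = 0;
  if (in >> tag >> time >> message >> value && tag == "pattern")
    {
      score = value;
      return true;
    }
  return false;
}

// Accept the watermark if a sync score was found and exceeds the threshold.
// The threshold is the caller's choice; the thread only says that 1.389 is
// "significantly higher than 1.0".
bool watermark_probably_ok (const std::string& line, double threshold)
{
  assert (threshold > 0);
  double score = 0;
  return parse_sync_score (line, score) && score > threshold;
}
```

A caller would run `audiowmark get` on the freshly watermarked file, feed each output line to `watermark_probably_ok`, and flag the file for review if no line passes.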

@bomsn
Author

bomsn commented Feb 6, 2024

That's amazing, thank you very much. I don't think I need to experiment with the settings if it's working. I would not want to mess with the source code and lose the ability to update when there is a new release, unless there is a way to experiment without having to change the source code?

Should we expect an update soon, or do you think I should just go ahead and make the change directly?

One more question, I noticed that "Data Blocks" was 0 when adding a watermark for the file, should we expect that 0 means it hasn't been watermarked correctly? I'm building an API that's supposed to handle a lot of files and would like to avoid additional processing if not necessary.

@swesterfeld
Owner

There should be a new release soon. If you want to play with the change, you can do so by changing the source now. I hope that for the next release this will no longer be necessary (I'm currently running a few tests, to see if the change has a negative robustness impact for other files).

One more question, I noticed that "Data Blocks" was 0 when adding a watermark for the file, should we expect that 0 means it hasn't been watermarked correctly?

No, the number of data blocks only depends on the length of the file. Each data block is ~50 seconds long, so for a 1 minute file, you'll always see one data block being written. Note that watermarks on files shorter than one data block can still be detected; what is possible there really depends on the strength and the amount of changes being done to the file after watermarking (e.g. mp3 compression).
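As a rough sanity check for an API that logs this value, the expected count can be estimated from the file duration alone. This sketch assumes exactly 50 seconds per block, whereas the figure above is only approximate, so treat the result as an estimate. The function name is hypothetical.

```cpp
#include <cassert>

// Rough number of complete data blocks that fit into a file, assuming a
// fixed block length (about 50 s according to the explanation above).
// (Hypothetical helper, not part of audiowmark.)
int estimated_data_blocks (double duration_seconds, double block_seconds = 50.0)
{
  assert (block_seconds > 0);
  if (duration_seconds <= 0)
    return 0;
  return static_cast<int> (duration_seconds / block_seconds);
}
```

This matches the two cases mentioned in the thread: the 18-second file yields 0 data blocks and a one-minute file yields 1. A result of 0 therefore says nothing about whether the watermark can be detected.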

@bomsn
Author

bomsn commented Feb 8, 2024

Awesome news, thank you! By the way, I tested this and it works. I can't hear any noise and the hash is extractable, which is amazing. As you mentioned, this is probably going to be best as a default. Looking forward to the results of your tests. However, I think the cos window might be less robust on high-frequency files. I'm not sure if it would be possible to analyze the audio frequencies, but if so, it might be best to switch between the two, or have some option or fallback mechanism to handle cases where it would be problematic.

Makes sense regarding the Data Block, thanks.

One more question, if you don't mind: which settings need to stay the same on both add and get? In other words, which values does the extraction rely on being the same as at the time of watermarking in order to work correctly (e.g., the watermarking key)?

I think that's my last question, thanks again!

@swesterfeld
Owner

I have tested this now with lots of files in lots of situations, and on average changing the window function does have a positive effect on robustness, although it is not a lot. So I'll make that the default. It is also somewhat backward compatible, so if you have a hamming-window watermarked file, it usually can be extracted with the new window function, although extraction would work a little better if you watermark and extract with the same window.

One more question, if you don't mind: which settings need to stay the same on both add and get?

Of course the key. If you watermark with different strengths (>=10), you can extract with the default strength, so you need not match the strength here. And of course if you use a short payload for watermarking, you need to do so for extraction as well.
