Fix error to signal loss in notebook #12
base: main
Conversation
The pre-emphasis filter looks correct, but I'm not getting the same training result as the original method of splitting the data without the data loader class. I tried with the 0.01 learning rate and with a smaller one. One part I don't quite understand: in your `WindowArray` class, `x_out` and `y_out` are calculated the same way, but each input x should be of length `input_size`, and each output y should be a single sample, so that the previous `input_size` samples are used to predict the current sample. I'll need to read up on the TimeDistributed layer; I'm not familiar with that one. Unfortunately I'm getting worse results than before, and I'm not exactly sure why that is yet.
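For reference, the windowing described above (each window of `input_size` samples predicts the single sample that follows) can be sketched like this — a hypothetical helper, not the PR's actual `WindowArray` class, which may be implemented differently:

```python
import numpy as np

def make_windows(data, input_size=120):
    # Each row of x holds input_size consecutive samples; y holds the single
    # sample that immediately follows each window, so the model learns to
    # predict the current sample from the previous input_size samples.
    x = np.stack([data[i : i + input_size] for i in range(len(data) - input_size)])
    y = data[input_size:]
    return x, y

signal = np.arange(10, dtype=np.float32)
x, y = make_windows(signal, input_size=4)
# x[0] contains samples 0..3 and y[0] is sample 4
```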
I think part of the problem is that the LSTM model I made isn't what they developed in the paper. They use a single-input, single-output stateful LSTM model, whereas I'm using a non-stateful multiple-input, single-output LSTM model with two Conv1D input layers. The paper uses stateful LSTMs to carry information about previous samples, whereas I'm using an input of `input_size` (default 120) samples to train the model on what the next sample should be. It could be that the model I made won't work with the same error-to-signal equation from the paper, because that loss was designed for a different model. I'll run some more tests, but when I compared the plots and the sound of mse against error-to-signal (with the same params and train time), it wasn't as good. I'll also try mae and see how that performs.
I would be interested to see whether using the stateful LSTM -> dense layers from the paper with your new code would work. I tried to get the stateful LSTMs working but wasn't able to. There is another issue that might have some useful info for doing this: #8 (comment)
Yeah, that's a good point, it is different. If you do want a single-output model rather than a multi-output one, then it's probably better not to use the `error_to_signal` loss, I think (out of curiosity, is it preferred to keep it single-output?). By the way, could you post a plot of a model that performs well? I've been training the model with an even lower learning rate and for more epochs, and it seems to look promising. I'll post a plot of that soon (I'm assuming you use the output of
That's the same thing I get with the old code and the TS9 example. Here is one (also a TS9 sample) using the old code and `loss=mse`, with the lowered model settings from the SmartAmpPro colab notebook. I left error-to-signal as a metric, and it ends up being about the same as using `loss=error_to_signal`. I think having the multiple-sample input is more important to prediction than the choice of loss function for this model.
I ran some more tests using `mse` and `mae` for the loss, this time by creating models for SmartAmpPro. When using the same settings (3 epochs, 24 hidden units) they sound really close. For the easy-to-train sounds (like the TS9) I can barely tell a difference between those and the error-to-signal version. For high-gain sounds the difference is more noticeable, but they all sound good. I'll post the models to let you try them out. @jmiller656 what are your thoughts on mse vs mae? Sorry for leading you down the rabbit hole on error-to-signal, but what you already added to this colab script works great, and I'll probably propagate it to the local training and SmartAmpPro training.
Here are a few models using mse for loss, and with higher epochs (30 - 50). See the new colab script I added to the SmartAmpPro repo (train_colab_mse.ipynb). |
Oh sweet! (sorry I disappeared for a bit there, busy week) As for |
This fixes issues with #10 and #11, and is also related to GuitarML/PedalNetRT#16
This makes the training a sequence to sequence task (predict n output steps for n input steps) and also fixes a problem with the pre-emphasis filter that would cause high / infinite loss values.
In some listening tests, I've found that using `mse` and `mae` still seems to sound better than `error_to_signal`, and they also seem to optimize `error_to_signal` as a metric better than just using `error_to_signal` as a loss function. I suspect adjusting the learning rate (lowering it a lot) could help the `error_to_signal` loss work much better.

The next things I plan to do are looking into the improved loss functions and pre-emphasis filters in this paper, as well as tuning the learning rate for `error_to_signal` loss.
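For reference, here is a numpy sketch of an error-to-signal ratio with a first-order pre-emphasis filter, in the spirit of the loss this PR fixes (the notebook's Keras implementation may differ in details); the small epsilon in the denominator is one way to keep the loss finite on near-silent targets, the failure mode mentioned above:

```python
import numpy as np

def pre_emphasis(x, coeff=0.95):
    # First-order high-pass filter H(z) = 1 - coeff * z^-1, which boosts
    # high frequencies before the error is measured.
    return np.concatenate(([x[0]], x[1:] - coeff * x[:-1]))

def error_to_signal(y_true, y_pred, eps=1e-10):
    # ESR: energy of the (pre-emphasized) error divided by the energy of
    # the (pre-emphasized) target. eps avoids division by zero, which is
    # what caused the high / infinite loss values.
    y_true, y_pred = pre_emphasis(y_true), pre_emphasis(y_pred)
    return float(np.sum((y_true - y_pred) ** 2) / (np.sum(y_true ** 2) + eps))
```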