Testing the model with new data #8
Hi, thanks for your interest! If I understand you correctly, what you want is to use the model to run inference on some unlabeled audio clips. This is the same process we use to submit evaluation results for the DCASE challenge, and this functionality is integrated into the DCASE baseline code as well as the ATST-SED code, supported by pytorch-lightning. To do this:
The system will run inference on the data automatically and store the predicted results. Hope these help : )
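A sketch of that workflow, assuming the `--eval_from_checkpoint` flag and the `eval_folder` config entry that appear later in this thread (the checkpoint path is just an example):

```bash
# 1. Point the config's eval_folder entry at your directory of unlabeled clips.
# 2. Launch the stage-2 script in evaluation mode with a trained checkpoint:
python train_stage2.py --gpus 1 --eval_from_checkpoint exp/stage2/version_0/epoch=209-step=23100.ckpt
```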
Thank you so much @SaoYear. This really helped, much appreciated!!
Hi @SaoYear, these might be silly doubts, but I am getting this error when testing in the same way you described. Although I have ensured all the input files are 10 s each in duration and the file sizes are the same, I don't know why I am getting this error or how to fix it. Please help me @SaoYear:

```
(dcase2023) empuser@server:~/ATST-SED-Scripts/ATST-SED/train$ python train_stage2.py --gpus 1 --eval_from_checkpoint exp/stage2/version_0/epoch=209-step=23100.ckpt
distributed_backend=gloo
Testing DataLoader 0:   0%| | 0/32 [00:00<?, ?it/s]
torch.Size([1, 1, 2505, 128])
```
Hi, it seems like you made some modifications to the code. I will attempt to explain what's happening in the function. I would recommend printing the shapes of the tensors at each step to locate where the mismatch occurs.
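A hypothetical shape-printing snippet along those lines; the batch layout follows the unpacking quoted in the next comment, and everything else is an assumption about the lightning module, not the repo's actual code:

```python
# Hypothetical debugging hook inside the LightningModule's test_step.
def test_step(self, batch, batch_idx):
    audio, atst_feats, labels, padded_indxs, filenames = batch
    print("audio:", audio.shape)            # raw waveform batch
    print("atst_feats:", atst_feats.shape)  # ATST frame-level features
    # ... continue with the original test_step body ...
```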
I have also encountered this problem. May I ask how many seconds of audio should be in "eval_folder"? My test set is 10 s, but it seems that the shapes cannot match:

```
audio, atst_feats, labels, padded_indxs, filenames = batch
```
The shape of your waveforms is incorrect; you should resample them to 16 kHz first.
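The original reply pointed to code in the repo for this; a minimal resampling sketch with torchaudio (the file names are placeholders, and the repo's own preprocessing remains the authoritative reference):

```python
import torchaudio

# Load a clip and resample it to the 16 kHz rate the model expects.
wav, sr = torchaudio.load("example.wav")  # placeholder path
if sr != 16000:
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)
torchaudio.save("example_16k.wav", wav, 16000)
```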
Hello! Thank you very much for your help! But I still have some questions:
Hello! Your reply was very helpful to me. I carefully reviewed the code for "ATST-Frame". If I want to obtain frame-level sound event detection for my test dataset, should I change and run this part of the code ("######################## DESED")? The inference code is audiossl/audiossl/methods/atsframe/downstream/train_as_strong.py
Yeah, you could refer to what we've done in the ATST-RCT system, last paragraph of section 3. Quick summary:
You could refer to the ATST-RCT repo, where I just uploaded a necessary file.
Yeah, there are three steps:
Hi, has anyone written separate code for just inference, like loading the model and trained weights and running it on 10 s audios to get per-file inference (maybe with some post-processing too)? If anyone has done it, please help me with how to do it. Thank you.
@martineghiazaryan @magicalvoice
will finish it by this week ; )

> On Jul 11, 2024, at 20:28, Martin Yeghiazaryan wrote:
> @SaoYear hey any news from the script?
Hey guys, sorry for the late reply, but I have updated an inference file in the latest commit; you can use it directly. If you have any other problems, please let me know.
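For anyone who wants the rough shape of such a standalone inference script before reading the committed file, here is a hypothetical sketch; the checkpoint key, output layout, and threshold are assumptions, not the repo's actual API:

```python
import torch
import torchaudio

# Hypothetical per-file inference: load weights, resample a clip to 16 kHz,
# run the model, and threshold the frame-level probabilities.
def infer_file(model, ckpt_path, wav_path, sr=16000, threshold=0.5):
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state["state_dict"], strict=False)  # lightning ckpt layout assumed
    model.eval()

    wav, orig_sr = torchaudio.load(wav_path)
    if orig_sr != sr:
        wav = torchaudio.functional.resample(wav, orig_sr, sr)

    with torch.no_grad():
        probs = model(wav)       # assumed output: (1, n_classes, n_frames)
    return probs > threshold     # binary frame decisions; add median filtering as post-processing
```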
@SaoYear First of all, thank you so much for your kind help!! I have another question: how do I interpret this result (inference_result.png)? It shows True/False for each chunk based on a threshold, but which class does each chunk belong to?

Also, I want to clarify a doubt. I read your paper "Fine-tune the pretrained ATST model for sound event detection"; it is basically training and fine-tuning the ATST-Frame model with the help of a CRNN, because DESED is quite small for fine-tuning ATST-Frame. But then who is the teacher and who is the student in this case? I am getting confused because the results for both stages have a student and a teacher. I am stuck with a lot of questions, please help. Thanks in advance!!
@SaoYear Is it supposed to do this? The audio has parts of people talking.
Hi @SaoYear, thank you, I understood. After stage-2 training, i.e. fine-tuning both the CRNN and ATST-Frame, can I use only the CRNN weights separately? Is there a way? If yes, how?
@magicalvoice I never tried to do that. But I suppose that using only the CRNN part of ATST-SED would not be better than a CRNN trained from scratch. If you want to do that, you could just comment out the ATST features and the merge-layer MLP, and feed the CNN output to the RNN directly. The CNN trained in ATST-SED is regarded as compensation for some local features that are ignored by FrameATST, and the RNN in ATST-SED is trained to learn the fused features from both FrameATST and the CNN. If you use just the CRNN part of the entire model, the performance of both the CNN and the RNN would be weakened, and therefore the overall performance would be weakened.
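A hypothetical illustration of that change inside the model's forward pass; the attribute names here are assumptions, not the actual ATST-SED code:

```python
import torch

# Hypothetical forward with the ATST branch and merge MLP commented out,
# so the RNN consumes CNN features alone (expected to underperform, as noted above).
def forward(self, mel, atst_feats=None):
    x = self.cnn(mel)                                   # local spectral features
    # atst = self.atst(atst_feats)                      # skipped: ATST branch
    # x = self.merge_mlp(torch.cat([x, atst], dim=-1))  # skipped: feature fusion
    x, _ = self.rnn(x)                                  # RNN sees CNN output only
    return torch.sigmoid(self.classifier(x))            # frame-level class probabilities
```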
Hi @HeChengHui, would you mind posting the wav file? There could be some problems with the inference process.
Hi @HeChengHui, thanks for sharing the wav. The splitting and overlapping are intentional in the inference, but I have fixed some problems in the original inference code. Now the inference looks fine. According to the audio clip you provided, the SED results look like this:
[SED results figure]
@SaoYear Are you logging the validation loss too? Where can I get that?
@magicalvoice Sorry for the late response. The logging of the validation loss is implemented in ultra_sed_trainer.py, lines 477-491. BTW, you can view it on TensorBoard using the command:
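The standard invocation would look something like this, assuming the logs sit under the experiment directory seen earlier in the thread:

```bash
tensorboard --logdir exp/stage2
```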
Hi,
@SaoYear Thank you for the great work. I am new to the problem of SED. I have fine-tuned with my own data; now I just want to test the final fine-tuned model with test audio files, without having ground truth for them. Is there any script to do that without needing to prepare the .tsv files with onset, offset, event label, etc. in the format of the DESED data?
Basically, how will I use the model on completely unknown input audio?
If you could tell me the steps, it would be really helpful.
Thank you so much in advance!