I wrote about it there, if you'd like to read the troubleshooting details.
Here I'll describe the workaround, which mainly came from studying your code and adapting it to my needs.
It has been a while since someone opened that issue (while using your code).
I chose to "make it work" rather than wait for a fix (the repository seems inactive since 2023).
It all started when I tried to feed in a FLAC audio file of exactly 3 seconds, with a sample rate of 48000:
from pathlib import Path
from birdnet import SpeciesPredictions, predict_species_within_audio_file

# as shown in the birdnet documentation
audio_path = Path("my_3_second_segment.flac")

predictions = SpeciesPredictions(predict_species_within_audio_file(
    audio_path,
    custom_model=model,
    species_filter=custom_species
))

"""Error: soundfile.LibsndfileError: Internal psf_fseek() failed."""
The rest of the investigation and troubleshooting is there.
I solved it by essentially going around the problem and avoiding soundfile.
I downloaded the protobuf version of the BirdNET model and loaded it along with its labels.
Here is some code to give the picture (error handling omitted for readability):
# load the saved_model.pb directory
import tensorflow as tf

loaded_model = tf.saved_model.load(Path("path_to_folder_containing_saved_model_file"))
model = loaded_model.signatures['basic']

# load the labels
with open(path_to_labels_file, 'r') as f:
    labels = f.read().splitlines()
To get the signal for the loaded model via librosa.load, I piped ffmpeg output into io.BytesIO as follows:
import subprocess
import io

# make sure you have ffmpeg installed on your system
wav = subprocess.run(
    f"ffmpeg -v quiet -i {path.as_posix()} -ar {sample_rate} -y -f wav -",
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
# load using librosa to get the signal
import librosa

wav_buff = io.BytesIO(wav.stdout)
signal, sample_rate = librosa.load(wav_buff, sr=48000)

# convert to tensor and reshape as the model expects to get the signal
tensor = tf.reshape(tf.convert_to_tensor(signal, dtype=tf.float32), (1, len(signal)))
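One caveat, based on my understanding of the model: the BirdNET saved model seems to expect 3-second chunks at 48 kHz, i.e. 144,000 samples per chunk, so if a segment is not exactly 3 seconds it probably needs padding or trimming before the reshape. A rough sketch, under that assumption:

import numpy as np

# assumption: the model expects exactly 3 s * 48000 Hz = 144000 samples per chunk
expected_len = 3 * 48000
if len(signal) < expected_len:
    signal = np.pad(signal, (0, expected_len - len(signal)))  # zero-pad short signals
else:
    signal = signal[:expected_len]  # trim longer ones

tensor = tf.reshape(tf.convert_to_tensor(signal, dtype=tf.float32), (1, expected_len))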
From this point on I mimic the birdnet package code, modified to serve my use case:
import numpy as np
import birdnet

# inference
results_x = model(tensor)

# convert the score values to confidences
def logistic(x, sensitivity=1.):
    # note: the negative exponent is the standard logistic, so higher scores give higher confidence
    return 1 / (1 + np.exp(-sensitivity * x))

results = logistic(results_x["scores"][0])

# attach results to labels
results = list(zip(labels, results))

# filter species by location and time
filter_species = birdnet.predict_species_at_location_and_time(latitude, longitude, week=week)
filter_set = set(filter_species.keys())
filtered_results = [r for r in results if r[0] in filter_set]

# get the best 5, ascending
best = 5
top_5 = sorted(filtered_results, key=lambda r: r[1])[-best:]

# OR descending
top_5 = sorted(filtered_results, key=lambda r: r[1], reverse=True)[:best]
I think the main takeaway here is piping ffmpeg output into an io.BytesIO buffer, which avoids disk I/O and can be read directly by librosa.
In practice it lets you use any audio file ffmpeg knows how to handle, convert and resample it to WAV, and continue as usual from that point...
Maybe this example is only "good enough" as a workaround, and you need something more solid with predictable error handling.
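If it helps, here is a minimal sketch of how the ffmpeg step could be wrapped with basic error handling; the helper name decode_to_wav_bytes is just something I made up for illustration:

import io
import subprocess

def decode_to_wav_bytes(path, sample_rate=48000):
    # hypothetical helper: decode any ffmpeg-readable file into an in-memory WAV buffer
    proc = subprocess.run(
        ["ffmpeg", "-v", "error", "-i", str(path), "-ar", str(sample_rate), "-y", "-f", "wav", "-"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode != 0:
        # surface ffmpeg's own error message instead of failing silently later
        raise RuntimeError(f"ffmpeg failed for {path}: {proc.stderr.decode(errors='replace')}")
    return io.BytesIO(proc.stdout)

Passing the arguments as a list (instead of shell=True with an f-string) also avoids quoting problems with paths that contain spaces.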
I think there is a Python binding for ffmpeg, but I've never used it, so I can't recommend one.
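For what it's worth, the binding I had in mind is probably the ffmpeg-python package; I haven't used it, so treat this as an untested sketch of how the same pipe might look with it:

import io
import ffmpeg  # from the ffmpeg-python package; still requires the ffmpeg binary
import librosa

# decode straight to stdout as WAV at 48 kHz, then read it from memory
out, _ = (
    ffmpeg
    .input(str(path))
    .output("pipe:", format="wav", ar=48000)
    .run(capture_stdout=True, capture_stderr=True)
)
signal, sample_rate = librosa.load(io.BytesIO(out), sr=48000)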
(feel free to correct me if I missed something; signal processing is fairly new to me)