Sound detection - Is this possible? (TFMIC-16) #74
Comments
@gamename going by the size of the model, I believe it is already quantised. If not, I would suggest quantising it to int8 weights. That will reduce the size of the model to about a quarter. Have you tried it with About SD card: unfortunately, I have not tried this approach. It is definitely worth a try, IMO. When loading from the SD card, however, it makes sense not to convert to .cc, as you suggest. Let me know how it goes. If you need further help or want me to try, do let me know. |
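As a rough illustration of where the "about 1/4th" figure comes from: a float32 weight takes 4 bytes and an int8 weight takes 1. The sketch below is not the TFLite converter. It is a minimal symmetric per-tensor quantiser in plain Python, and the names `quantize_int8` and `dequantize` and the sample weights are made up for illustration.

```python
from array import array

def quantize_int8(weights):
    """Symmetric per-tensor quantisation: floats -> int8 values plus one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in q]

# Hypothetical weights, stored as float32 (4 bytes each)
weights = array("f", [0.5, -1.0, 0.25, 0.0, 0.75, -0.125, 1.0, -0.5])
q, scale = quantize_int8(weights)

float_bytes = len(weights) * weights.itemsize  # float32 storage
int8_bytes = len(q) * q.itemsize               # int8 storage
print(float_bytes, int8_bytes)
```

The storage drops by a factor of four, at the cost of a rounding error of at most half a quantisation step per weight. A real model file shrinks slightly less than that because metadata and any tensors left in float are not reduced.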
What process did you use to build the The reason I ask is the C array in Thanks |
Thank you, sir. |
Is your pre-processor model taken from here? The reason I ask is that it seems the pre-processor should work for "meow" as well as for human speech. It just generates spectrograms, which is payload agnostic (i.e., it makes a spectrogram of a sound and doesn't care what the sound is). Correct? Thanks, |
@gamename that's right, the model is taken from that particular location. |
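To illustrate why a spectrogram front end does not care about the payload, here is a minimal sketch in plain Python. It is not the pre-processor model from the example, which does more than a bare transform. The frame length, stride, and the two test tones are assumptions chosen to keep the numbers small.

```python
import cmath
import math

def frame_magnitudes(frame):
    """Magnitudes of the first N/2 DFT bins of one Hann-windowed frame."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed)))
        for k in range(n // 2)
    ]

def spectrogram(samples, frame_len=64, stride=32):
    """Slide a window over the samples; one row of magnitudes per frame."""
    return [frame_magnitudes(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, stride)]

def peak_bin(row):
    """Index of the strongest frequency bin in one spectrogram row."""
    return max(range(len(row)), key=row.__getitem__)

# Two different "payloads" at a 16 kHz sample rate: the same code handles both.
rate = 16000
low = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(256)]
high = [math.sin(2 * math.pi * 3000 * t / rate) for t in range(256)]
spec_low = spectrogram(low)
spec_high = spectrogram(high)
print(peak_bin(spec_low[0]), peak_bin(spec_high[0]))
```

With 64-sample frames each bin is 250 Hz wide, so the two tones peak in different bins. The function itself has no notion of speech versus any other sound; only the classifier trained on its output does.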
Perfect. Thanks. |
for |
@gamename you are correct. Those were added for testing in the old days and are not currently used. You may ignore them. |
Thanks! |
This concerns building the actual model. I am using a script here that is just a compilation of the steps outlined here. Here is what my input dir with samples looks like:
tree ./samples
./samples
├── _background_noise_
│   ├── README.md
│   ├── doing_the_dishes.wav
│   ├── dude_miaowing.wav
│   ├── exercise_bike.wav
│   ├── pink_noise.wav
│   ├── running_tap.wav
│   └── white_noise.wav
└── meow
    ├── cat0001.wav
    ├── cat0002.wav
    ... (there are 77 total cat
I'm confused about what needs to be in there. Do I need to add a Thanks |
Another question. :) Looking at this construct:
|
Hello TennisSmith,
That depends entirely on how the model was trained.
The model itself does not reveal what the categories are. Only the number of categories can be known, from the output tensor size.
Thanks,
Vikram
On 14-Mar-2024, at 3:17 AM, Tennis Smith ***@***.***> wrote:
@vikramdattu
Another question. :)
Looking at [this construct](https://github.com/espressif/esp-tflite-micro/blob/61af88b7b30fda2078a7b52b2c1b600899a73e2e/examples/micro_speech/main/micro_model_settings.h#L31):
```
constexpr int kCategoryCount = 4;
constexpr const char* kCategoryLabels[kCategoryCount] = {
  "silence",
  "unknown",
  "yes",
  "no",
};
```
...how do you know what the order of the labels ("silence", "unknown", etc) should be? How is that set?
|
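A small sketch of the point about the output tensor: the model only produces a vector of scores, and the index-to-label mapping lives outside it. The function name `decode_output` and the score values are made up for illustration; the label list is the one from `kCategoryLabels`.

```python
def decode_output(scores, labels):
    """Pick the highest-scoring index and look it up in an external label list."""
    if len(scores) != len(labels):
        # The output tensor size is the only thing the model tells us.
        raise ValueError(
            f"model has {len(scores)} outputs but {len(labels)} labels were given")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]

labels = ["silence", "unknown", "yes", "no"]  # same order as kCategoryLabels
scores = [0.02, 0.08, 0.85, 0.05]             # hypothetical model output

label, score = decode_output(scores, labels)
print(label, score)

# The same scores read through a wrongly ordered list give a wrong answer,
# and nothing in the model can flag the mistake.
wrong_order = ["yes", "no", "silence", "unknown"]
print(decode_output(scores, wrong_order)[0])
```

This is why the label order in the firmware has to match whatever order the training pipeline used.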
That's not quite what I am asking. :) My question is this: How do I know the order of the labels as they are used in Python after the model has been created? |
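One hedged pointer, assuming the training script follows the TensorFlow `speech_commands` example: as far as I know, its `input_data.py` builds the label list by putting two fixed entries first and then appending the wanted words in the order they were passed on the command line. The sketch below mirrors that logic in plain Python. `labels_from_flag` is a made-up helper, and you should check your own copy of the script rather than rely on this.

```python
SILENCE_LABEL = "_silence_"
UNKNOWN_WORD_LABEL = "_unknown_"

def prepare_words_list(wanted_words):
    """Fixed entries first, then the wanted words in the order given."""
    return [SILENCE_LABEL, UNKNOWN_WORD_LABEL] + list(wanted_words)

def labels_from_flag(wanted_words_flag):
    """Hypothetical helper: split a comma-separated wanted-words string."""
    return prepare_words_list(wanted_words_flag.split(","))

speech_labels = labels_from_flag("yes,no")
meow_labels = labels_from_flag("meow")
print(speech_labels)
print(meow_labels)
```

Under that assumption, training with `yes,no` yields the four-entry order seen in `kCategoryLabels`, and training with only `meow` would yield three categories, so `kCategoryCount` and the label array would need to change to match.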
Hi,
I'm using an esp32-s3-eye v2.2. It has 8MB each of flash and PSRAM. Is it possible to use `yamnet.tflite` on an esp32-s3-eye v2.2 for sound identification? The `yamnet.tflite` file is about 3.9M in size.
The chip has an SD card slot, so I can use it to load the model file (i.e. no need to convert it to a `.cc` file with `xxd`).
Thoughts?