Mfcc feature dimensions #246

JRMeyer · 2021-03-08T01:23:33Z

JRMeyer
Mar 8, 2021
Maintainer

>>> shahdloo
[July 14, 2018, 12:33pm]

In the documentation for the "audiofile_to_input_vector" function it reads
that "MFCC features at every 0.01s time step with a window length of 0.025s"
are calculated. I tried to confirm this statement.
I have a 16kHz wav file containing 9631014 samples. The MFCC features I
get from the "audiofile_to_input_vector" function have dimension
30097 × 494, which I read as [9631014/320] × [26 + 2 × 26 × 9].
I conclude that 494 MFCC features are extracted for every 320 samples,
which results in 0.02s time steps. Is my reasoning correct? So is this
really a 0.02s time step instead of 0.01s?

[This is an archived TTS discussion thread from discourse.mozilla.org/t/mfcc-feature-dimensions]

JRMeyer · 2021-03-08T01:23:36Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> shahdloo
[July 16, 2018, 9:06pm]

Figured out the answer. This is due to the parameter "BiRNN stride = 2",
which keeps every other feature frame, resulting in an actual time step
of 0.02s.
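
For anyone checking the numbers, here is a minimal sketch of the arithmetic from this thread. The sample count, sample rate, 0.01s step and stride of 2 come from the posts above. Reading the 26 as coefficients per frame and the 9 as context frames on each side is an assumption based on the 26 + 2 × 26 × 9 breakdown, and the constant names are made up for illustration. The row count uses a plain ceiling, so the exact value may depend on how the feature extractor pads the last frame.

```python
import math

SAMPLE_RATE = 16000    # Hz, from the original post
N_SAMPLES = 9631014    # samples in the wav file, from the original post
STEP_SECONDS = 0.01    # documented MFCC time step
STRIDE = 2             # keep every other frame ("BiRNN stride = 2")
N_CEP = 26             # assumption: coefficients per frame
N_CONTEXT = 9          # assumption: context frames on each side

# Samples between consecutive MFCC frames before striding.
hop = round(SAMPLE_RATE * STEP_SECONDS)

# Dropping every other frame doubles the distance between kept frames.
effective_hop = hop * STRIDE
effective_step = effective_hop / SAMPLE_RATE

# Approximate number of kept frames (rows of the feature matrix).
n_rows = math.ceil(N_SAMPLES / effective_hop)

# Each row holds the current frame plus N_CONTEXT frames on both sides.
n_cols = N_CEP + 2 * N_CEP * N_CONTEXT

print(f"hop: {hop} samples, effective hop: {effective_hop} samples")
print(f"effective time step: {effective_step} s")
print(f"feature matrix: {n_rows} x {n_cols}")
```

So the features are still computed every 0.01s; the 0.02s step only appears after the stride is applied.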

[Archived Post]


JRMeyer · 2021-03-08T01:23:38Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> reuben
[July 16, 2018, 9:57pm]

Yep! We should probably experiment with computing features over 20ms
windows instead of using the stride to see how it performs...

[Archived Post]

