Whisper ASR is broken #52

Closed
jack2game opened this issue Mar 18, 2024 · 13 comments · Fixed by #55
Labels
bug Something isn't working

Comments

@jack2game

This plugin no longer works with the Whisper ASR Webservice. It sends the audio file to the Docker container, but cannot parse the response properly.

@djmango
Owner

djmango commented Mar 19, 2024

Has the Whisper ASR response format changed? If so, you can submit a PR or use an ASR Docker image version from before the format change.

@quantology

quantology commented Mar 26, 2024

I might be seeing the same issue:

Error with URL: http://localhost:9000/asr?output=json&word_timestamps=true&language=en TypeError: Cannot read properties of undefined (reading 'map')

I've replicated the above error on Whisper ASR versions 1.3.0 (latest) and also 1.2.4 and 1.1.1. It seems to be failing here. It looks like the code above expects rawResponse.segments to be an array of arrays, whereas I'm seeing it as Segment[], where Segment is like:

interface Segment {
  avg_logprob: number;
  compression_ratio: number;
  end: number;
  id: number;
  no_speech_prob: number;
  seek: number; 
  start: number;
  temperature: number;
  text: string;
  tokens: number[];
}

I'm happy to revert to a previous version of the docker image -- does anyone know what the latest functional version of whisper-asr-webservice is?

@Xeelix

Xeelix commented Apr 1, 2024

Hey! Were you able to solve the problem?

djmango added the bug (Something isn't working) label Apr 4, 2024
@dahifi
Contributor

dahifi commented Apr 10, 2024

This is definitely a regression with v3. I haven't touched my ASR in months and it broke completely today following Obsidian updates. Of course, I'm using a WhisperX fork which may be out of date, but I'm assuming a backward-compatible API.

I think it's related to the timestamps functionality. AFAIK, there's no timestamps flag in the ASR endpoint, so this is probably causing the ASR to return an error response.
image

I have the timestamp setting turned off, but the URL being generated still has the flag. The proper behavior here should be not to include the parameter at all: 'http://192.168.1.201:9000/asr?output=json&word_timestamps=true&encode=false'
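To illustrate the behavior described above, here is a minimal sketch of building the request URL so that `word_timestamps` is only present when the setting is enabled. The `AsrSettings` shape and the `buildAsrUrl` helper are hypothetical names made up for this example; they are not taken from the plugin's code, which may structure this differently.

```typescript
// Hypothetical settings shape; the plugin's real settings object may differ.
interface AsrSettings {
  baseUrl: string; // e.g. "http://localhost:9000"
  wordTimestamps: boolean;
  language?: string;
}

// Build the /asr URL, adding optional query parameters only when they apply.
function buildAsrUrl(settings: AsrSettings): string {
  const url = new URL("/asr", settings.baseUrl);
  url.searchParams.set("output", "json");
  if (settings.wordTimestamps) {
    // Omitted entirely when the setting is off, rather than sent as "false".
    url.searchParams.set("word_timestamps", "true");
  }
  if (settings.language) {
    url.searchParams.set("language", settings.language);
  }
  return url.toString();
}
```

With the setting off, the generated URL carries only `output=json`, so a server that does not know the flag never sees it.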

@djmango looks like this is a bug on your end?

@djmango
Owner

djmango commented Apr 10, 2024

Ah in that case this might be related to #50 @bscholer

@djmango
Owner

djmango commented Apr 10, 2024

Thanks for investigating @dahifi - I will corroborate your findings and release an update later today accordingly.

@dahifi
Contributor

dahifi commented Apr 10, 2024

I found the line here: https://github.com/djmango/obsidian-transcription/pull/50/files#diff-0f4208f0163c212f445df35fc43b99ceb09432700a33b38b514883c99c7d6169R131

I'm going to grab lunch then would like to do the PR myself, kind ser.

@bscholer
Contributor

I haven't looked closely at the code yet, but this does seem like it's due to my PR #50. Will leave it for @dahifi to patch unless you'd like me to!

@djmango
Owner

djmango commented Apr 10, 2024

Gotcha, much appreciated

dahifi added a commit to dahifi/obsidian-transcription that referenced this issue Apr 10, 2024
@dahifi
Contributor

dahifi commented Apr 10, 2024

WhisperASR expects the file to be a multipart form, e.g.

curl -X 'POST' \
  'http://192.168.1.201:9000/asr?task=transcribe&encode=true&output=json&diarize=false' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'audio_file=@Recording 20231008113030.webm;type=video/webm'

Where it looks like you're using octet-streams in the request body?
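For comparison, a rough TypeScript equivalent of the curl call above might look like the sketch below. It assumes a runtime with the standard `fetch`, `FormData`, and `Blob` globals; `buildAsrForm` and `transcribe` are hypothetical helper names, not functions from the plugin. Only the `audio_file` field name and the `/asr` query parameters come from the curl example.

```typescript
// Build the multipart body; "audio_file" is the field name from the curl example.
function buildAsrForm(audio: Blob, fileName: string): FormData {
  const form = new FormData();
  form.append("audio_file", audio, fileName);
  return form;
}

// Hypothetical helper: POST the recording as multipart/form-data and return the parsed JSON.
async function transcribe(baseUrl: string, audio: Blob, fileName: string): Promise<unknown> {
  const url = new URL("/asr", baseUrl);
  url.searchParams.set("task", "transcribe");
  url.searchParams.set("output", "json");

  // No Content-Type header is set by hand: fetch derives it from the FormData
  // body, including the multipart boundary.
  const response = await fetch(url.toString(), {
    method: "POST",
    body: buildAsrForm(audio, fileName),
  });
  if (!response.ok) {
    throw new Error(`ASR request failed with status ${response.status}`);
  }
  return response.json();
}
```

Sending the raw bytes as the request body instead would produce an octet-stream request, which is the mismatch suspected above.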

I've had bad luck trying to get things working in my dev notebook. I went back through all the various commits that modified that function and finally wound up on the last contribution I did back in October, which should be v.3.1.6.

For now I recommend ASR users load up that version; I'm not sure I want to spend more time on it. DJ may want to reconsider supporting it, since Swiftlink has diverged from it so much now. Might just want to make a note in the README and leave it at that.

@bscholer
Contributor

It looks like this issue does not occur when using ASR_ENGINE=faster_whisper. Seems like there is an inconsistency between the engines in https://github.com/ahmetoner/whisper-asr-webservice

@bscholer
Contributor

Doing further investigation, it looks like ASR_ENGINE=openai_whisper returns a structure that looks like this (notice the named fields in the segments array):
[image: screenshot of the openai_whisper JSON response]

ASR_ENGINE=faster_whisper instead returns a structure that looks like this (notice the lack of named fields). Although I hate this format, this is what I wrote the code to expect.
[image: screenshot of the faster_whisper JSON response]

I'll open up an issue on https://github.com/ahmetoner/whisper-asr-webservice to try to standardize this, but for now, I'll just tweak the code to handle both so we can get this issue fixed.
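As an illustration of handling both shapes, here is a minimal sketch that normalizes either format into one structure. It is not the actual fix from the linked PR. The positional order used for the unnamed format (`[id, seek, start, end, text, ...]`) is an assumption and should be checked against a real faster_whisper response; `NormalizedSegment`, `normalizeSegment`, and `normalizeSegments` are hypothetical names.

```typescript
interface NormalizedSegment {
  start: number;
  end: number;
  text: string;
}

// Convert one raw segment, named-field object or positional array, to a common shape.
function normalizeSegment(raw: unknown): NormalizedSegment {
  if (Array.isArray(raw)) {
    // ASSUMPTION: positional layout is [id, seek, start, end, text, ...].
    const [, , start, end, text] = raw;
    if (typeof start === "number" && typeof end === "number" && typeof text === "string") {
      return { start, end, text };
    }
    throw new Error("Unexpected positional segment layout");
  }
  if (typeof raw === "object" && raw !== null) {
    const { start, end, text } = raw as Record<string, unknown>;
    if (typeof start === "number" && typeof end === "number" && typeof text === "string") {
      return { start, end, text };
    }
  }
  throw new Error("Unrecognized segment shape");
}

// Fail with a descriptive error instead of "Cannot read properties of undefined".
function normalizeSegments(rawResponse: { segments?: unknown }): NormalizedSegment[] {
  if (!Array.isArray(rawResponse.segments)) {
    throw new Error("ASR response has no segments array");
  }
  return rawResponse.segments.map(normalizeSegment);
}
```

Validating each field before use means an unexpected layout fails with a clear message rather than a `TypeError` deep inside a `map` call.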

@djmango
Owner

djmango commented Apr 12, 2024

Update released
