Whisper ASR is broken #52

jack2game · 2024-03-18T13:35:12Z

This plugin no longer works with the Whisper ASR Webservice. It sends the audio file to the Docker container but does not pick up the response properly.

djmango · 2024-03-19T16:48:16Z

Has the Whisper ASR response format changed? If so, you can submit a PR or use an ASR Docker image versioned before the format change.

quantology · 2024-03-26T02:32:37Z

I might be seeing the same issue:

Error with URL: http://localhost:9000/asr?output=json&word_timestamps=true&language=en TypeError: Cannot read properties of undefined (reading 'map')

I've replicated the above error on Whisper ASR versions 1.3.0 (latest) and also 1.2.4 and 1.1.1. It seems to be failing here. It looks like the code above expects rawResponse.segments to be an array of arrays, whereas I'm seeing it as Segment[], where Segment is like:

interface Segment {
  avg_logprob: number;
  compression_ratio: number;
  end: number;
  id: number;
  no_speech_prob: number;
  seek: number;
  start: number;
  temperature: number;
  text: string;
  tokens: number[];
}

I'm happy to revert to a previous version of the docker image -- does anyone know what the latest functional version of whisper-asr-webservice is?

Xeelix · 2024-04-01T12:32:29Z

Hey! Were you able to solve the problem?

dahifi · 2024-04-10T14:56:45Z

This is definitely a regression with v3. I haven't touched my ASR setup in months, and it broke completely today following Obsidian updates. Of course, I'm using a WhisperX fork which may be out of date, but I'm assuming the API is backward-compatible.

I think it's related to the timestamps functionality. AFAIK, there's no word_timestamps flag in the ASR endpoint, so passing it is probably causing the ASR to return an error response.

I have the timestamp setting turned off, but the URL being generated still has the flag. The proper behavior here should be not to include the parameter at all: 'http://192.168.1.201:9000/asr?output=json&word_timestamps=true&encode=false'
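A minimal sketch of the behavior described above, where the query parameter is omitted entirely when the setting is off. The `AsrSettings` shape and the `buildAsrUrl` name are hypothetical and are not taken from the plugin's code:

```typescript
// Hypothetical settings shape for illustration only; the plugin's real
// settings object may use different names.
interface AsrSettings {
  baseUrl: string;          // e.g. "http://192.168.1.201:9000"
  wordTimestamps: boolean;  // the user's timestamp toggle
  language?: string;
  encode?: boolean;
}

// Build the /asr URL, adding optional parameters only when they are set.
function buildAsrUrl(settings: AsrSettings): string {
  const url = new URL("/asr", settings.baseUrl);
  url.searchParams.set("output", "json");
  // Omit word_timestamps entirely when the toggle is off, rather than
  // sending a value the server may not understand.
  if (settings.wordTimestamps) {
    url.searchParams.set("word_timestamps", "true");
  }
  if (settings.language) {
    url.searchParams.set("language", settings.language);
  }
  if (settings.encode !== undefined) {
    url.searchParams.set("encode", String(settings.encode));
  }
  return url.toString();
}
```

With the toggle off, the generated URL carries no `word_timestamps` parameter at all, so a server that does not know the flag never sees it.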

@djmango looks like this is a bug on your end?

djmango · 2024-04-10T14:59:15Z

Ah in that case this might be related to #50 @bscholer

djmango · 2024-04-10T15:00:46Z

Thanks for investigating @dahifi - I will corroborate your findings and release an update later today accordingly.

dahifi · 2024-04-10T15:03:42Z

I found the line here: https://github.com/djmango/obsidian-transcription/pull/50/files#diff-0f4208f0163c212f445df35fc43b99ceb09432700a33b38b514883c99c7d6169R131

I'm going to grab lunch then would like to do the PR myself, kind ser.

bscholer · 2024-04-10T15:04:27Z

I haven't looked closely at the code yet, but this does seem like it's due to my PR #50. Will leave for @dahifi to patch unless you'd like me to!

djmango · 2024-04-10T15:04:37Z

Gotcha, much appreciated


dahifi · 2024-04-10T16:38:31Z

WhisperASR expects the file to be a multipart form, e.g.

curl -X 'POST' \
  'http://192.168.1.201:9000/asr?task=transcribe&encode=true&output=json&diarize=false' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'audio_file=@Recording 20231008113030.webm;type=video/webm'

Where it looks like you're using octet-streams in the request body?
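For comparison, here is a rough TypeScript equivalent of the curl call above, using the standard `fetch`, `FormData`, and `Blob` globals. It is a sketch under the assumption that those globals are available (browsers, Node 18+); it is not how the plugin currently builds its request, and `buildAsrForm` and `postToAsr` are hypothetical names:

```typescript
// Wrap raw audio bytes in a multipart form under the "audio_file" field,
// matching the -F 'audio_file=@...' part of the curl example.
function buildAsrForm(audio: ArrayBuffer, fileName: string, mimeType: string): FormData {
  const form = new FormData();
  form.append("audio_file", new Blob([audio], { type: mimeType }), fileName);
  return form;
}

// POST the form to the ASR endpoint and return the parsed JSON body.
async function postToAsr(url: string, form: FormData): Promise<unknown> {
  const response = await fetch(url, {
    method: "POST",
    // Content-Type is deliberately not set here: fetch generates the
    // multipart/form-data header, including the boundary, from the form.
    headers: { accept: "application/json" },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`ASR request failed with status ${response.status}`);
  }
  return response.json();
}
```

A call would look like `postToAsr("http://192.168.1.201:9000/asr?task=transcribe&output=json", buildAsrForm(bytes, "recording.webm", "video/webm"))`, where `bytes` holds the recording.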

I've had bad luck trying to get things working in my dev notebook. I went back through all the various commits that modified that function and finally wound up on the last contribution I did back in October, which should be v.3.1.6.

For now, I recommend ASR users load up that version. I'm not sure I want to spend more time on it. DJ may want to reconsider supporting it since Swiftlink has diverged from it so much now. Might just want to make a note in the readme and leave it at that.

bscholer · 2024-04-10T21:06:52Z

It looks like this issue does not occur when using ASR_ENGINE=faster_whisper. Seems like there is an inconsistency between the engines in https://github.com/ahmetoner/whisper-asr-webservice

bscholer · 2024-04-10T21:16:55Z

Doing further investigation, it looks like ASR_ENGINE=openai_whisper returns a structure that looks like this (notice the named fields in the segments array):

ASR_ENGINE=faster_whisper instead returns a structure that looks like this (notice the lack of named fields). Although I hate this format, this is what I wrote the code to expect.

I'll open up an issue on https://github.com/ahmetoner/whisper-asr-webservice to try to standardize this, but for now, I'll just tweak the code to handle both so we can get this issue fixed.
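One way to handle both shapes is to normalize each segment before using it. The sketch below is illustrative and is not the code from the eventual fix. The named-field shape follows the `Segment` interface posted earlier in the thread; the positional order (start, end, text at indexes 2, 3, 4) is an assumption and should be checked against an actual faster_whisper response:

```typescript
// The subset of fields this sketch cares about.
interface NormalizedSegment {
  start: number;
  end: number;
  text: string;
}

// Named-field shape, as reported for ASR_ENGINE=openai_whisper.
interface NamedSegment {
  start: number;
  end: number;
  text: string;
}

// Positional shape, as reported for ASR_ENGINE=faster_whisper.
// ASSUMPTION: indexes 2, 3, and 4 hold start, end, and text.
type PositionalSegment = unknown[];

function normalizeSegment(raw: NamedSegment | PositionalSegment): NormalizedSegment {
  if (Array.isArray(raw)) {
    const [, , start, end, text] = raw;
    return { start: Number(start), end: Number(end), text: String(text) };
  }
  return { start: raw.start, end: raw.end, text: raw.text };
}

// Accept whatever the server put in "segments"; fall back to an empty list
// instead of throwing when the field is missing or not an array.
function normalizeSegments(segments: unknown): NormalizedSegment[] {
  if (!Array.isArray(segments)) {
    return [];
  }
  return segments.map((s) => normalizeSegment(s as NamedSegment | PositionalSegment));
}
```

Checking the shape at runtime keeps the rest of the code independent of which engine produced the response.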


djmango · 2024-04-12T16:56:03Z

Update released

djmango added the "bug" label (Something isn't working) Apr 4, 2024

djmango mentioned this issue Apr 4, 2024

[Whisper Asr] Cannot transcript audio #53

Closed

dahifi added a commit to dahifi/obsidian-transcription that referenced this issue Apr 10, 2024

Remove word_timestamp from getTranscriptionWhisperASR

48d123b

Not supported by WhisperASR. Fixes djmango#52

dahifi mentioned this issue Apr 10, 2024

hotfix[whsiperASR]: Remove word_timestamp from getTranscriptionWhisperASR #54

Closed

bscholer mentioned this issue Apr 10, 2024

faster_whisper and openai_whisper have differently formatted responses ahmetoner/whisper-asr-webservice#209

Open

bscholer added a commit to bscholer/obsidian-transcription that referenced this issue Apr 11, 2024

add support for newer openai_whisper engine responses in whisper asr

f8350b4

fixes djmango#52

bscholer mentioned this issue Apr 11, 2024

Fix for new Whisper ASR response format #55

Merged

djmango closed this as completed in #55 Apr 11, 2024
