The only issue is that the fish TTS model uses FSQ, which outputs a tuple (int, int) at each step. Unfortunately, this breaks a fundamental design assumption of vLLM: the output is hard-coded to be a single int.
I requested this in the Q4 roadmap, but there has been no response so far...
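For context, here is a minimal, self-contained sketch of finite scalar quantization (FSQ). It is not taken from the fish-speech code and the level counts are made up; it only illustrates why each generation step yields a tuple of integers rather than the single token id that vLLM's sampler assumes.

```python
# Toy FSQ encoder. The level counts are hypothetical, not fish-speech's real config.
import math

LEVELS = [8, 8]  # two quantized dimensions -> one (int, int) code per step

def fsq_encode(z: list[float]) -> tuple[int, ...]:
    """Snap each latent dimension to one of a small number of levels."""
    indices = []
    for value, levels in zip(z, LEVELS):
        bounded = math.tanh(value)                       # squash into (-1, 1)
        index = round((bounded + 1) / 2 * (levels - 1))  # pick a bin in [0, levels)
        indices.append(index)
    return tuple(indices)

# Each step emits a tuple of ints, e.g. (4, 0), whereas a standard LM head
# (and vLLM's sampling path) emits a single token id per sequence per step.
print(fsq_encode([0.2, -1.5]))  # -> (4, 0)
```

Presumably, supporting this would mean letting the sampler return a fixed-size group of ids per step instead of one, but that is an assumption about the required change, not a confirmed design.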
The model to consider.
https://huggingface.co/fishaudio/fish-speech-1.4
The GitHub repo:
https://github.com/fishaudio/fish-speech
Part of this architecture is a standard transformer (how should its embeddings be integrated?). The other part is a VQGAN with FSQ. It seems difficult to support these with existing vLLM layers (a sketch of the split follows the links below).
https://github.com/fishaudio/fish-speech/blob/main/fish_speech/models/text2semantic/llama.py
https://github.com/fishaudio/fish-speech/blob/main/fish_speech/models/vqgan/modules/firefly.py
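The two files linked above are the two stages. The following toy sketch shows how serving could be split; every name in it (Code, generate_semantic_codes, decode_to_audio) is a hypothetical placeholder rather than the actual fish-speech or vLLM API, and only stage 1 is the autoregressive part that would benefit from vLLM.

```python
"""Toy two-stage sketch of the fish-speech pipeline (all names hypothetical)."""
from typing import List, Tuple

Code = Tuple[int, int]  # FSQ emits a pair of codebook indices per frame

def generate_semantic_codes(text: str) -> List[Code]:
    """Stage 1 stand-in for the text2semantic transformer (llama.py):
    autoregressive and KV-cached -- the part one would want vLLM to run."""
    return [(ord(c) % 8, ord(c) % 5) for c in text]  # dummy codes

def decode_to_audio(codes: List[Code]) -> List[float]:
    """Stage 2 stand-in for the firefly VQGAN/FSQ decoder (firefly.py):
    a single forward pass with no KV cache, so it could stay outside vLLM."""
    return [a / 8 + b / 5 for a, b in codes]  # dummy waveform samples

if __name__ == "__main__":
    print(decode_to_audio(generate_semantic_codes("hi")))  # -> [0.8, 0.125]
```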
The closest model vllm already supports.
No response
What's your difficulty of supporting the model you want?
No response