Add support for realtime API #3714
Comments
A good candidate VAD library: https://github.com/snakers4/silero-vad/tree/master/examples/go
I started looking at stubbing out the API; it's mostly just JSON. Curious why you are suggesting the grpc-websocket-proxy?
I was digging a bit into projects that interface with gRPC and WebSockets. I was just adding some code and notes here to pick up from later. It's a very preliminary search, but it might be useful as a reference or for getting some ideas.
Heard about this one, by the way: https://github.com/wavey-ai/mel-spec?tab=readme-ov-file

I'm also copy-pasting some possibly useful resources I linked in another repo, since 4o will not be the only model to support this and we may not want to rely too much on OpenAI's code. I saw on Hacker News that Agents by LiveKit, used to make the OpenAI Realtime API as well as Cerebras voice, seems to be open source. They have tons of demos and code on their GitHub. I think there must be a Llama-Omni implementation somewhere, which would be a killer feature for open-webui!

Here's a particularly interesting demo that connects STT + LLM + TTS: https://github.com/livekit/agents/blob/main/examples/voice-pipeline-agent/minimal_assistant.py

I opened an issue to ask for a demo for Llama-Omni, and also for Kyutai's Moshi model. There's also Modal's Moshi implementation: https://github.com/modal-labs/quillman
I'm trying to build the server implementation based on the OpenAI spec for their Realtime API.
There is a WIP branch over here: #3722. Contributions and feedback are always welcome!
Is your feature request related to a problem? Please describe.
OpenAI just extended their API with realtime support over WebSockets:
https://openai.com/index/introducing-the-realtime-api/?s=09
Describe the solution you'd like
LocalAI should support backends with voice capabilities and introduce an API endpoint compatible with OpenAI clients.
Ideally it should also support function calling, as OpenAI does.
It seems the Chat Completions API is also going to get audio input/output, but the API specs are not available yet.
Describe alternatives you've considered
Additional context
#3602
#3722
API docs: https://platform.openai.com/docs/guides/realtime https://platform.openai.com/docs/api-reference/realtime-client-events/session-update
https://github.com/tmc/grpc-websocket-proxy
https://github.com/openconfig/grpctunnel
https://github.com/mudler/LocalAI/tree/feat/realtime
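For reference, the `session.update` client event linked in the API docs above is a plain JSON message sent over the WebSocket. Below is a minimal sketch of how a server might dispatch it, not the approach taken in the WIP branch. The function name `handle_client_event`, the `DEFAULT_SESSION` fields, and the error payload shape are assumptions made for illustration; only the `session.update` / `session.updated` event type names come from the OpenAI docs.

```python
import json

# Hypothetical session defaults; these field values are illustrative only.
DEFAULT_SESSION = {"modalities": ["text", "audio"], "voice": "default", "instructions": ""}


def error_reply(message):
    """Build a JSON error event (payload shape is an assumption for this sketch)."""
    return json.dumps({"type": "error", "error": {"message": message}})


def handle_client_event(raw, session):
    """Handle one JSON text frame from a client and return the JSON reply frame."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return error_reply("invalid JSON")
    if not isinstance(event, dict):
        return error_reply("event must be a JSON object")

    if event.get("type") == "session.update":
        changes = event.get("session", {})
        if not isinstance(changes, dict):
            return error_reply("session must be a JSON object")
        # Merge the client's partial settings into the current session state.
        session.update(changes)
        return json.dumps({"type": "session.updated", "session": session})

    return error_reply("unsupported event type: %s" % event.get("type"))


if __name__ == "__main__":
    state = dict(DEFAULT_SESSION)
    frame = '{"type": "session.update", "session": {"voice": "alloy"}}'
    print(handle_client_event(frame, state))
```

A real server would call something like this for every text frame received on the WebSocket connection and forward audio-related events to the voice backend; that transport and backend wiring is left out here.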
open source models that can handle realtime speech: