Support for o1-like reasoning models (LRMs) #8760
Comments
@LastRemote have you used o1 via the API? My hunch is to align with their abstractions, as others are likely to follow.
Unfortunately no, I do not have access. But according to the documentation, o1 does not support streaming, and the reasoning tokens are not visible in its response at the moment. DeepSeek-R1 is probably the groundbreaker here. The deepseek-r1 model uses special tokens around the CoT content in its raw response. Their API, however, handles it properly in the new `reasoning_content` field.
Right now, Fireworks puts everything into `content`.
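To illustrate the raw-token format being discussed, here is a minimal sketch of splitting an R1-style completion, assuming the chain of thought is wrapped in `<think>...</think>` delimiters (the exact tokens may differ per model/provider, so treat this as an assumption rather than a spec):

```python
import re


def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a raw R1-style completion into (reasoning, answer).

    Assumes the chain of thought is wrapped in <think>...</think>,
    as deepseek-r1 emits in its raw output. Providers that forward
    everything in `content` would need this kind of post-processing;
    APIs that expose a separate reasoning field would not.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return "", raw.strip()
    return match.group(1).strip(), raw[match.end():].strip()
```

For example, `split_reasoning("<think>2+2=4</think>The answer is 4.")` separates the CoT from the final completion.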
Chiming in here, as there is no native rendering for reasoning tokens yet in Langfuse (follow-up to this comment by @LastRemote). I have not yet seen a stable schema for how reasoning tokens are included in the API response, as OpenAI does not return them. Would love to learn from this thread and add better support for it in Langfuse as well.
Here is a collection of APIs that support reasoning models (feel free to add more):
- GPT-o1: reasoning tokens not included in the response body
I agree it's a bit too early to call. But we do not have to wait for a stable schema to support reasoning models. Haystack has its own data structure of `ChatMessage`.
@vblagoje If there are no other suggestions regarding the proposal, would you mind me creating a draft PR? I believe this is worth assigning a higher priority, and a draft PR can potentially reveal more insights.
Update: I think we can conclude that reasoning tokens will not contain tool call/tool call result information, at least not in the standard/current way. I have found some research projects that attempt to add some sort of tool support in the reasoning process (e.g. https://arxiv.org/abs/2501.05366), but there is currently no way to return a partial reasoning (with no final content) in the current chat API schema. That being said, due to the limitations of current APIs, we can expect the LRM to return both the complete reasoning and the final content in the same response, and reasoning tc/tcr must be handled by the decoding process on the server side and in a completely different format (due to the fact that these "tools" run inside of the model decoding loop). IMO it is safe to say that …
Sounds good @LastRemote - I think it makes sense to perhaps add a property for the reasoning content.
I totally agree. This is actually what I am doing right now, adding a new field for the reasoning content.
I created an untested draft PR to illustrate the proposed changes. I will be away for the rest of the week, but feel free to take over.
It seems like we will need a separation of reasoning content and the actual text completions to better manage multi-round conversations with reasoning (for example: https://api-docs.deepseek.com/guides/reasoning_model). This may have an impact on the current structure and functionality of `ChatMessage`, `StreamingChunk`, and generators.

My current proposal is to add a new boolean flag or type in both `TextContent` and `StreamingChunk` to indicate whether this is part of the reasoning steps. `ChatMessage.text` should point to the first non-reasoning text content, and we will need to add a new `ChatMessage.reasoning` property. Streaming chunks from a reasoning model would then carry this flag, and the user can access the reasoning and completion parts using `chat_message.reasoning[s]` and `chat_message.text[s]` respectively from the generator output.

The other option is to have a separate `reasoning_content` field in `StreamingChunk` and a `ReasoningContent` class in `ChatMessage._contents`. This is more aligned with the current deepseek-reasoner API, but I feel it is slightly overcomplicated. I am also not exactly sure whether both `reasoning_content` and `content` can appear in one SSE chunk.

I did some research today, but there are too few reasoning models/APIs available to reach a consensus on what reasoning should look like. It is probably better to start a discussion thread somewhere and explore the options.