
Support for o1-like reasoning models (LRMs) #8760

Open · LastRemote opened this issue Jan 22, 2025 · 11 comments · May be fixed by #8776

Labels: P3 Low priority, leave it in the backlog

@LastRemote (Contributor)

It seems like we will need to separate reasoning content from the actual text completions to better manage multi-round conversations with reasoning (for example: https://api-docs.deepseek.com/guides/reasoning_model). This may impact the current structure and functionality of ChatMessage, StreamingChunk, and generators.

My current proposal is to add a new boolean flag or type to both TextContent and StreamingChunk to indicate whether the content is part of the reasoning steps. ChatMessage.text should point to the first non-reasoning text content, and we will need to add a new ChatMessage.reasoning property.

For example, this is what the streaming chunks from a reasoning model would look like:

```python
StreamingChunk(content="<reasoning-delta1>", is_reasoning=True)
StreamingChunk(content="<reasoning-delta2>", is_reasoning=True)
StreamingChunk(content="<completion-delta1>", is_reasoning=False)
StreamingChunk(content="<completion-delta2>", is_reasoning=False)
```

Users can then access the reasoning and completion parts of the generator output via chat_message.reasoning(s) and chat_message.text(s) respectively.
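
A minimal sketch of how this flag-based option could look (names and structure are illustrative only, not the final Haystack API):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextContent:
    text: str
    is_reasoning: bool = False  # new flag marking reasoning segments


@dataclass
class ChatMessage:
    _contents: List[TextContent] = field(default_factory=list)

    @property
    def text(self) -> Optional[str]:
        # First non-reasoning text content, as described above.
        return next((c.text for c in self._contents if not c.is_reasoning), None)

    @property
    def reasoning(self) -> Optional[str]:
        # First reasoning text content.
        return next((c.text for c in self._contents if c.is_reasoning), None)
```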

The other option is to have a separate reasoning_content field in StreamingChunk and a ReasoningContent class in ChatMessage._contents. This is more aligned with the current deepseek-reasoner API, but I feel it is slightly overcomplicated. I am also not sure whether both reasoning_content and content can appear in a single SSE chunk.
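
A rough sketch of this second option (again, names are illustrative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReasoningContent:
    reasoning_text: str  # would live alongside TextContent in ChatMessage._contents


@dataclass
class StreamingChunk:
    content: str = ""
    reasoning_content: Optional[str] = None  # mirrors the deepseek-reasoner delta field
```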

I did some research today, but there are too few reasoning models/APIs available to reach a consensus on what reasoning output should look like. It is probably better to start a discussion thread somewhere and explore the options.

@vblagoje (Member) commented Jan 22, 2025

@LastRemote have you used o1 via the API? My hunch is to align with their abstractions, as others are likely to follow.
I played a bit yesterday with https://fireworks.ai/models/fireworks/deepseek-r1, and it seems like deepseek-r1 emits <think> tags before the actual output. But I'll say more once I try it via the API. I agree with you that it is important to get this right.

@LastRemote (Contributor, Author)

> have you used o1 via API?

Unfortunately no, I do not have access. But according to the documentation, o1 does not support streaming, and its reasoning tokens are not visible in the response at the moment. DeepSeek-R1 is probably the groundbreaker here.

The DeepSeek-R1 model uses special tokens around the CoT content in its raw response. Their API, however, handles this properly via the new reasoning_content field, which I believe is a good move since different models are definitely going to use different special tokens for reasoning.
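
For reference, this is roughly how the reasoning_content field is consumed according to the linked DeepSeek docs (an untested sketch; requires a DeepSeek API key):

```python
from openai import OpenAI

client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Which is greater, 9.11 or 9.8?"}],
)
message = response.choices[0].message
print(message.reasoning_content)  # CoT tokens, separated from the answer
print(message.content)            # final answer only
```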

@vblagoje (Member)

Right now, Fireworks puts everything into response.choices[0].message.content, with the <think> part coming before the regular response.

```python
{'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text="<think>\nFirst, I need to compare the two numbers: 9.11 and 9.8. \n\nTo make an accurate comparison, I should ensure both numbers have the same number of decimal places. Currently, 9.11 has two decimal places, while 9.8 has only one. To align them, I'll convert 9.8 to 9.80.\n\nNow both numbers are: 9.11 and 9.80. \n\nNext, I'll compare the whole number parts of both numbers. In 9.11, the whole number is 9, and in 9.80, the whole number is also 9. Since the whole numbers are equal, I'll move to the decimal parts.\n\nIn 9.11, the decimal part is 0.11. In 9.80, the decimal part is 0.80. \n\nComparing 0.11 and 0.80, it's clear that 0.80 is greater than 0.11. Therefore, 9.80 (which is 9.8) is greater than 9.11.\n</think>\n\nTo determine which number is greater between \\(9.11\\) and \\(9.8\\), follow these steps:\n\n1. **Equalize Decimal Places:**\n   - Convert \\(9.8\\) to have two decimal places: \\(9.80\\)\n\n2. **Compare Whole Numbers:**\n   - Both numbers have the same whole number part: **9**\n\n3. **Compare Decimal Parts:**\n   - **0.11** (from \\(9.11\\)) vs. **0.80** (from \\(9.80\\))\n   - Since \\(0.80 > 0.11\\), \\(9.80\\) is greater than \\(9.11\\).\n\n**Final Answer:** \\(\\boxed{9.8}\\)")], _name=None, _meta={'model': 'accounts/fireworks/models/deepseek-r1', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 382, 'prompt_tokens': 16, 'total_tokens': 398, 'completion_tokens_details': None, 'prompt_tokens_details': None}})]}
```
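
Until the provider separates these server-side, a simple regex split could recover the two parts from such raw output (a quick sketch, not Haystack code):

```python
import re


def split_think(text: str) -> tuple[str, str]:
    """Split '<think>...</think>answer' into (reasoning, answer)."""
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", text.strip()  # no reasoning block found


reasoning, answer = split_think("<think>Compare 0.80 and 0.11.</think>9.8 is greater.")
```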

@marcklingen commented Jan 22, 2025

Chiming in here, as there is no native rendering for reasoning tokens in Langfuse yet (follow-up to this comment by @LastRemote).

I have not yet seen a stable schema for how reasoning tokens are included in the API response, as OpenAI does not return them. I would love to learn from this thread and add better support for them in Langfuse as well.

@LastRemote (Contributor, Author) commented Jan 23, 2025

Here is a collection of APIs that support reasoning models (feel free to add more); a sketch normalizing these formats follows the list:

- GPT-o1: reasoning tokens are not included in the response body
- DeepSeek-R1 (official): https://api-docs.deepseek.com/guides/reasoning_model (uses reasoning_content for reasoning tokens)
- DeepSeek-R1 (fireworks.ai): https://fireworks.ai/models/fireworks/deepseek-r1 (raw output, with <think> and </think> special tokens)
- Gemini 2.0: https://ai.google.dev/gemini-api/docs/thinking#stream-model (Gemini always has a different API format from OpenAI; uses a thought flag as an indicator)
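
To illustrate how much these diverge, here is a hypothetical normalization layer over the three response shapes above; the field names follow the linked docs but have not been verified against live APIs:

```python
import re


def extract_reasoning(provider: str, payload: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a provider-specific response dict."""
    if provider == "deepseek":
        msg = payload["choices"][0]["message"]
        return msg.get("reasoning_content") or "", msg["content"]
    if provider == "fireworks":
        content = payload["choices"][0]["message"]["content"]
        m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", content, re.DOTALL)
        return (m.group(1).strip(), m.group(2).strip()) if m else ("", content)
    if provider == "gemini":
        parts = payload["candidates"][0]["content"]["parts"]
        reasoning = "".join(p["text"] for p in parts if p.get("thought"))
        answer = "".join(p["text"] for p in parts if not p.get("thought"))
        return reasoning, answer
    raise ValueError(f"Unknown provider: {provider}")
```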

@LastRemote (Contributor, Author)

> I have not yet seen a stable schema on how reasoning tokens are included in the api response as openai does not return them.

I agree it is a bit too early to call. But we do not have to wait for a stable schema to support reasoning models. Haystack has its own ChatMessage and StreamingChunk data structures, and we can change those before updating the actual generator implementations. Similarly for Langfuse: we can support the currently most popular schema for reasoning tokens, and there is always the option to extend support to other formats later (just like what you did for Langfuse model usage).

@LastRemote (Contributor, Author) commented Jan 27, 2025

@vblagoje If there are no other suggestions regarding the proposal, would you mind if I created a draft PR? I believe this is worth a higher priority, and a draft PR can potentially reveal more insights.

@LastRemote (Contributor, Author) commented Jan 27, 2025

Update: I think we can conclude that reasoning tokens will not contain tool call/tool call result information, at least not in the standard/current way.

I have found some research projects that attempt to add some sort of tool support to the reasoning process (e.g. https://arxiv.org/abs/2501.05366), but there is currently no way to return partial reasoning (with no final content) in the current chat API schema. That being said, due to the limitations of current APIs, we can expect an LRM to return both the complete reasoning and the final content in the same response, and reasoning tool calls/tool call results must be handled by the decoding process on the server side and in a completely different format (because these "tools" run inside the model decoding loop).

IMO it is safe to say that reasoning_content will remain a single string for now (perhaps with multimodal content later, though I am not sure), until new concepts (like a reasoning_tool_calls field distinct from the standard tool_calls) build up around it.

@vblagoje (Member)

Sounds good @LastRemote - I think it makes sense to add a reasoning_text property (or similar) to ChatMessage, just as we recently added the text, tool_calls, and tool_call_results properties to reflect the growing range of ChatMessage content. Let's also involve @anakin87, as he originally worked on these ChatMessage expansions. Thoughts, @anakin87?

@LastRemote (Contributor, Author)

> it makes sense to perhaps add a property reasoning_text etc to ChatMessage

I totally agree. This is actually what I am doing right now: adding a new field _reasoning_content: Optional[TextContent] to ChatMessage.
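
A sketch of the shape being added, based on the description above (the exact property names are still open in the draft PR):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextContent:
    text: str


@dataclass
class ChatMessage:
    _contents: List[TextContent] = field(default_factory=list)
    _reasoning_content: Optional[TextContent] = None  # new field

    @property
    def reasoning_text(self) -> Optional[str]:
        return self._reasoning_content.text if self._reasoning_content else None
```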

@LastRemote linked a pull request Jan 27, 2025 that will close this issue
@LastRemote (Contributor, Author)

I created an untested draft PR to illustrate the proposed changes. I will be away for the rest of the week, but feel free to take over.

@julian-risch added the P3 (Low priority, leave it in the backlog) label Jan 31, 2025