[Bug] (v0.3.6.post2) Output degredation when using structured output #2216
cc @merrymercy
Thanks for reporting this. I can reproduce this error. The problem is that this model works better with multi-line-style JSON, but the default argument in sglang uses single-line style. We can fix this for both the outlines and xgrammar backends.

Fix for outlines backend (works): You can add
Output
Fix for xgrammar backend (does not work): Try this commit dd4482e
Output
However, this fix does not work. I think there are some subtle details in how it handles whitespace @Ubospica.
I am also facing a weird issue when using xgrammar as the backend; not sure if this is related. I am using document prefix caching to do multiple extractions at the same time. Some of them use structured JSON output, and some of them output plain text. When using xgrammar with sgl.gen and json_schema, the plain-text outputs change, and sometimes do not even terminate. The weird thing is that the plain text is not using json_schema. With outlines, it works as expected. Isn't sgl.gen with json_schema and xgrammar supported yet? Thanks.
I also observe in my JSON output tests, with around 1500 requests, that the xgrammar backend allows infinite generation. It also does not adhere to the API max_tokens limit and continues generating until memory overflow occurs, after which the generation is abruptly cut off. This does not happen with outlines. In my test, there are also requests with and without a JSON schema in the same batch.
@arunpatala @Swipe4057 any minimal small reproducible examples will be very helpful here. The developers of the grammar backend @Ubospica are here, ready to help if the bugs can be easily reproduced.
@merrymercy Hi, thanks for your advice. I pulled the latest version
Hi @Quang-elec44, thanks for pointing that out! For XGrammar, we found the reason is that XGrammar requires the LLM to generate strictly formatted JSON. This is not strictly formatted JSON (the array is compressed onto one line):
This is strictly formatted:
However, this strict requirement sometimes makes the LLM's output quality deteriorate. In this case, the LLM will generate
which is still strictly formatted, but not meaningful. We will relax this restriction and allow non-strictly formatted JSON in an upcoming version to ensure output quality.
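For illustration, here is a sketch of the two layouts using Python's json module (the payload is hypothetical, constructed for this write-up, not taken from the issue's actual outputs):

```python
import json

# Hypothetical payload for illustration.
data = {"authors": ["Vinayak Gupta", "Yunze Man", "Yu-Xiong Wang"]}

# Compact form: the array is compressed onto one line.
compact = json.dumps(data)

# Strictly formatted form: every element on its own indented line,
# which is the layout the strict grammar required the LLM to emit.
strict = json.dumps(data, indent=2)

print(compact)
print(strict)
```

Both strings parse back to the same object; only the whitespace differs, which is exactly the degree of freedom the relaxed grammar restores.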
@merrymercy It seems I was able to understand the issue in a bit more detail; the problem isn't with xgrammar, but rather that my LLM doesn't generate stop tokens under certain parameters.
Is there a way to format the JSON data for fine-tuning so that it follows the format xgrammar expects?
response_format.json_schema Input should be a valid dictionary or instance of JsonSchemaResponseFormat [type=model_type, input_value='{\n "$schema": "http://...\n "type": "object"\n}', input_type=str]

It looks like it is no longer possible to pass the schema as a string. OK, fine, I pass it as an object, but still get an error:

response_format.json_schema.name Field required [type=missing, input_value={'$schema': 'http://json-...ema#', 'type': 'object'}, input_type=dict]

Why does a schema require a name? Even after adding the name, the schema was fully ignored and the generated output did not follow the schema at all, without any errors in the logs.

UPD: it seems that in my case the issue was that the schema should go inside the json_schema object rather than be the json_schema itself; whoops, I totally missed this change.
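To make the nesting concrete, here is a sketch of the request structure the OpenAI-compatible response_format expects (the name "my_schema" is a hypothetical placeholder, and the schema body is a stub):

```python
# The JSON schema itself (a stub for illustration).
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
}

# The schema is not the json_schema object itself: it goes under the
# "schema" key, next to a required "name" field.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "my_schema",  # hypothetical placeholder name
        "schema": schema,
    },
}

print(response_format["json_schema"]["name"])
```

Passing the schema dict (or a JSON string) directly as the value of json_schema triggers the validation errors quoted above.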
The problem of XGrammar mentioned above (#2216 (comment)) is solved in XGrammar v0.1.6. It is also updated in SGLang (#2390). This should solve the issue mentioned by @Quang-elec44.
I tried to reproduce the error when mixing structured generation with normal generation using sgl forks.

abstract = """
Computer Science > Computer Vision and Pattern Recognition
[Submitted on 5 Dec 2024]
PaintScene4D: Consistent 4D Scene Generation from Text Prompts
Vinayak Gupta, Yunze Man, Yu-Xiong Wang
Recent advances in diffusion models have revolutionized 2D and 3D content creation, yet generating photorealistic dynamic 4D scenes remains a significant challenge. Existing dynamic 4D generation methods typically rely on distilling knowledge from pre-trained 3D generative models, often fine-tuned on synthetic object datasets. Consequently, the resulting scenes tend to be object-centric and lack photorealism. While text-to-video models can generate more realistic scenes with motion, they often struggle with spatial understanding and provide limited control over camera viewpoints during rendering. To address these limitations, we present PaintScene4D, a novel text-to-4D scene generation framework that departs from conventional multi-view generative models in favor of a streamlined architecture that harnesses video generative models trained on diverse real-world datasets. Our method first generates a reference video using a video generation model, and then employs a strategic camera array selection for rendering. We apply a progressive warping and inpainting technique to ensure both spatial and temporal consistency across multiple viewpoints. Finally, we optimize multi-view images using a dynamic renderer, enabling flexible camera control based on user preferences. Adopting a training-free architecture, our PaintScene4D efficiently produces realistic 4D scenes that can be viewed from arbitrary trajectories. The code will be made publicly available. Our project page is at this https URL
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2412.04471 [cs.CV]
(or arXiv:2412.04471v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2412.04471
Focus to learn more
Submission history
From: Yunze Man [view email]
[v1] Thu, 5 Dec 2024 18:59:57 UTC (7,775 KB)
"""
from pydantic import BaseModel
from typing import List

class SubmissionHistory(BaseModel):
    version: str
    date: str
    file_size: str

class Metadata(BaseModel):
    comments: str
    submission_history: SubmissionHistory
    doi: str
    arxiv_id: str

class Abstract(BaseModel):
    title: str
    authors: List[str]
    submission_date: str
    categories: List[str]
    summary: str
    metadata: Metadata
import sglang as sgl
abstract_prompt = """
**ABSTRACT**
{}
"""
abstract_instruction = """
**INSTRUCTION:**
"Parse the provided abstract and metadata of a research paper into JSON format with the following structure:
1. Include the `title`, `authors`, `submission_date`, and `categories` as direct fields.
2. Exclude the `abstract` field.
3. Add a `summary` field that concisely explains the core idea, methodology, and significance of the paper.
4. Retain a `metadata` field containing any additional details such as `comments`, `submission_history`, `doi`, and `arxiv_id`.
Ensure the JSON is well-structured and adheres to the specified format."
"""
@sgl.function
def gets(s, abstract, keys):
    s += sgl.user_begin()
    s += abstract_prompt.format(abstract)
    # One fork for the structured extraction, one per plain-text field.
    forks = s.fork(1 + len(keys))
    forks[0] += abstract_instruction + sgl.user_end()
    forks[0] += sgl.assistant(sgl.gen("response", json_schema=Abstract.schema_json(), max_tokens=1024, temperature=0.0))
    for k, f in zip(keys, forks[1:]):
        f += f"Extract the following field: {k}" + sgl.user_end()
        f += sgl.assistant(sgl.gen("response", max_tokens=1024, temperature=0.0))
    forks.join()
    s["return1"] = forks[0]["response"]
    s["return2"] = [f["response"] for f in forks[1:]]

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
keys = ['title', 'authors', 'summary']
states = gets.run(abstract, keys)
for i, k in zip(keys, states["return2"]):
    print(i, ":")
    print(k)

I am running the sglang server with
I tried both xgrammar v0.1.5 and v0.1.6 with the latest main branch in Docker. I am getting the following output for outlines at temperatures 0.0 and 0.25.
But the outputs of xgrammar, especially at lower temperatures, are not generated properly.
Also, xgrammar sometimes generates the "assistant" keyword at temperature 0.25, or generates nothing at all. The structured output seems to be all right for both grammar backends. Hope this helps to reproduce the bug. Let me know if any other information is needed.
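As a lightweight follow-up to the script above, here is a stdlib-only sketch (the field names mirror the Abstract model; nothing here comes from sglang itself) for checking whether the structured fork's output actually parses and carries the expected top-level keys:

```python
import json

# Top-level fields declared on the Abstract model above.
EXPECTED_KEYS = {"title", "authors", "submission_date",
                 "categories", "summary", "metadata"}

def check_structured_output(text: str) -> bool:
    """Return True if text is valid JSON with the expected top-level keys."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and EXPECTED_KEYS <= obj.keys()

# A minimal well-formed response passes the check.
sample = json.dumps({
    "title": "PaintScene4D",
    "authors": ["Vinayak Gupta"],
    "submission_date": "2024-12-05",
    "categories": ["cs.CV"],
    "summary": "...",
    "metadata": {},
})
print(check_structured_output(sample))        # True
print(check_structured_output("plain text"))  # False
```

A check like this makes it easy to tell, per request in a mixed batch, whether the structured fork degraded while the plain-text forks were the ones affected.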
@arunpatala Thanks for producing the reproducible script! I think that should be a problem in the mask application process, where the mask is applied to non-structured requests. We will fix that problem soon.
@Ubospica thanks for looking into it.
Checklist
Describe the bug
The results (with and without a JSON schema) are different, while those generated from the vllm server (v0.6.4.post1) remain the same.

Reproduction
How to start the sglang server
How to start the vllm server
Python script
Results without json_schema: vllm, sglang
Results with json_schema: vllm, sglang (xgrammar backend), sglang (outlines backend)
Environment