
[Bug] XGrammar causes gibberish during parallel execution and cuts off other requests #2414

Open · 5 tasks done

remixer-dec opened this issue Dec 9, 2024 · 2 comments
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

When multiple requests hit SGLang's chat completions endpoint in parallel and one of them has a JSON schema while another doesn't, the one without a schema outputs random tokens. When multiple requests carry JSON schemas, at least one of the structured outputs gets cut off or overflows its schema. The Outlines backend works fine.
[Screenshots: noschema, schema, schema2]

Reproduction

  1. Run the sglang OpenAI-compatible server on localhost:30000 with any model and --grammar-backend xgrammar (e.g. python -m sglang.launch_server --model-path <model> --port 30000 --grammar-backend xgrammar)
  2. Open a browser tab and go to http://localhost:30000/docs
  3. Press F12 to open the developer console and run the snippet below
async function fetchChatCompletions() {
    const endpoint = "http://localhost:30000/v1/chat/completions";
    const headers = { "Content-Type": "application/json", "Authorization": "Bearer YOUR_API_KEY"  };
    const question = "Can you provide a detailed plan for building a sustainable city including energy, transportation, and waste management systems?";

    const schemas = [
        null, // No schema for the first request
        { type: "object", properties: { energy: { type: "string" }, transportation: { type: "string" }, wasteManagement: { type: "string" } }, required: ["energy", "transportation", "wasteManagement"] },
        { type: "object", properties: { urbanPlanning: { type: "string" }, sustainability: { type: "string" }, publicServices: { type: "string" } }, required: ["urbanPlanning", "sustainability", "publicServices"] }
    ];

    // Build three request bodies; only non-null schemas get a response_format.
    const requests = schemas.map(schema => ({
        model: "local",
        messages: [{ role: "user", content: question }],
        ...(schema && { response_format: { type: "json_schema", json_schema: { name: "city_plan", strict: true, schema } } })
    }));

    try {
        const [res1, res2, res3] = await Promise.all(requests.map(req => fetch(endpoint, { method: "POST", headers, body: JSON.stringify(req) })));
        console.log("Response 1:", await res1.json());
        console.log("Response 2:", await res2.json());
        console.log("Response 3:", await res3.json());
    } catch (error) {
        console.error("Error:", error);
    }
}

fetchChatCompletions();
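
To spot the failure without reading raw console output, the responses can also be checked programmatically. The sketch below is ad-hoc (checkResponses is a hypothetical helper, not part of sglang, and it assumes the schemas array and fetch results from the snippet above): it parses each schema-constrained completion and reports JSON parse failures or missing required keys, which is how the cut-off/overflow shows up.

// Hypothetical helper; assumes the `schemas` array and fetch responses defined above.
async function checkResponses(responses, schemas) {
    for (let i = 0; i < responses.length; i++) {
        const body = await responses[i].json();
        const text = body.choices[0].message.content;
        if (schemas[i]) {
            // Schema-constrained request: content should be valid JSON with all required keys.
            try {
                const parsed = JSON.parse(text);
                const missing = schemas[i].required.filter(k => !(k in parsed));
                console.log(`Response ${i}: valid JSON, missing keys: [${missing}]`);
            } catch (e) {
                console.log(`Response ${i}: invalid/cut-off JSON, tail:`, text.slice(-80));
            }
        } else {
            // Unconstrained request: content should be ordinary prose, not stray tokens.
            console.log(`Response ${i}: first 80 chars:`, text.slice(0, 80));
        }
    }
}

// Usage: replace the three console.log calls in the try block above with
// await checkResponses([res1, res2, res3], schemas);
// (the replacement is needed because res.json() can only be consumed once).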

Environment

Python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:27:36) [GCC 13.3.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 4090
GPU 0 Compute Capability: 8.9
CUDA_HOME: None
PyTorch: 2.5.1+cu124
sglang: 0.4.0.post1
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.47.0
torchao: 0.6.1
numpy: 1.26.4
aiohttp: 3.10.11
fastapi: 0.115.6
hf_transfer: 0.1.8
huggingface_hub: 0.26.5
interegular: 0.3.3
modelscope: 1.21.0
orjson: 3.10.12
packaging: 24.1
psutil: 6.1.0
pydantic: 2.9.2
multipart: 0.0.19
zmq: 26.2.0
uvicorn: 0.32.1
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.51.2
anthropic: 0.40.0
decord: 0.6.0
NVIDIA Topology: 
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-23    0               N/A

xgrammar==0.1.6
Ubospica (Contributor) commented Dec 9, 2024

@remixer-dec Thanks for reporting the bug! See also #2216 (comment). We will fix that soon.

@alanxmay

@Ubospica Hi, sorry to bother you, but is there any progress? Thanks!
