
[Bug] XGrammar causes gibberish during parallel execution and cuts off other requests #2414

Open · 5 tasks done

remixer-dec opened this issue Dec 9, 2024 · 2 comments
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

When multiple requests hit SGLang's chat completions endpoint in parallel and one of them has a JSON schema while another doesn't, the one without a schema outputs random tokens. When multiple requests carry JSON schemas, at least one of the structured outputs gets cut off or overflows its schema. The Outlines backend works fine.
[Screenshots: noschema, schema, schema2]

Reproduction

  1. Run the sglang OpenAI-compatible server on localhost:30000 with any model and --grammar-backend xgrammar (e.g. python -m sglang.launch_server --model-path <model> --port 30000 --grammar-backend xgrammar)
  2. Open a browser tab and go to http://localhost:30000/docs
  3. Press F12 to open the developer console and run the snippet below
async function fetchChatCompletions() {
    const endpoint = "http://localhost:30000/v1/chat/completions";
    const headers = { "Content-Type": "application/json", "Authorization": "Bearer YOUR_API_KEY"  };
    const question = "Can you provide a detailed plan for building a sustainable city including energy, transportation, and waste management systems?";

    const schemas = [
        null, // No schema for the first request
        { type: "object", properties: { energy: { type: "string" }, transportation: { type: "string" }, wasteManagement: { type: "string" } }, required: ["energy", "transportation", "wasteManagement"] },
        { type: "object", properties: { urbanPlanning: { type: "string" }, sustainability: { type: "string" }, publicServices: { type: "string" } }, required: ["urbanPlanning", "sustainability", "publicServices"] }
    ];

    // Build three request bodies; only non-null schemas get a response_format.
    const requests = schemas.map(schema => ({
        model: "local",
        messages: [{ role: "user", content: question }],
        ...(schema && { response_format: { type: "json_schema", json_schema: { name: "city_plan", strict: true, schema } } })
    }));

    try {
        const [res1, res2, res3] = await Promise.all(requests.map(req => fetch(endpoint, { method: "POST", headers, body: JSON.stringify(req) })));
        console.log("Response 1:", await res1.json());
        console.log("Response 2:", await res2.json());
        console.log("Response 3:", await res3.json());
    } catch (error) {
        console.error("Error:", error);
    }
}

fetchChatCompletions();
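
To spot the failure without reading raw console output, the responses can also be checked programmatically. The sketch below is ad-hoc (checkResponses is a hypothetical helper, not part of sglang, and it assumes the schemas array and fetch results from the snippet above): it parses each schema-constrained completion and reports JSON parse failures or missing required keys, which is how the cut-off/overflow shows up.

// Hypothetical helper; assumes the `schemas` array and fetch responses defined above.
async function checkResponses(responses, schemas) {
    for (let i = 0; i < responses.length; i++) {
        const body = await responses[i].json();
        const text = body.choices[0].message.content;
        if (schemas[i]) {
            // Schema-constrained request: content should be valid JSON with all required keys.
            try {
                const parsed = JSON.parse(text);
                const missing = schemas[i].required.filter(k => !(k in parsed));
                console.log(`Response ${i}: valid JSON, missing keys: [${missing}]`);
            } catch (e) {
                console.log(`Response ${i}: invalid/cut-off JSON, tail:`, text.slice(-80));
            }
        } else {
            // Unconstrained request: content should be ordinary prose, not stray tokens.
            console.log(`Response ${i}: first 80 chars:`, text.slice(0, 80));
        }
    }
}

// Usage: replace the three console.log calls in the try block above with
// await checkResponses([res1, res2, res3], schemas);
// (the replacement is needed because res.json() can only be consumed once).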

Environment

Python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:27:36) [GCC 13.3.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 4090
GPU 0 Compute Capability: 8.9
CUDA_HOME: None
PyTorch: 2.5.1+cu124
sglang: 0.4.0.post1
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.47.0
torchao: 0.6.1
numpy: 1.26.4
aiohttp: 3.10.11
fastapi: 0.115.6
hf_transfer: 0.1.8
huggingface_hub: 0.26.5
interegular: 0.3.3
modelscope: 1.21.0
orjson: 3.10.12
packaging: 24.1
psutil: 6.1.0
pydantic: 2.9.2
multipart: 0.0.19
zmq: 26.2.0
uvicorn: 0.32.1
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.51.2
anthropic: 0.40.0
decord: 0.6.0
NVIDIA Topology: 
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-23    0               N/A

xgrammar==0.1.6
Ubospica (Contributor) commented Dec 9, 2024

@remixer-dec Thanks for reporting the bug! See also #2216 (comment). We will fix that soon.

@alanxmay

@Ubospica Hi, sorry to bother you, but is there any progress? Thanks!
