GitHub - yukavio/sglang: SGLang is yet another fast serving framework for large language models and vision language models.

News

[2024/12] 🔥 SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs (blog).
[2024/10] 🔥 The First SGLang Online Meetup (slides).
[2024/09] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (blog).
[2024/07] Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) (blog).

[2024/02] SGLang enables 3x faster JSON decoding with compressed finite state machine (blog).
[2024/04] SGLang is used by the official LLaVA-NeXT (video) release (blog).
[2024/01] SGLang provides up to 5x faster inference with RadixAttention (blog).
[2024/01] SGLang powers the serving of the official LLaVA v1.6 release demo (usage).

About

SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language. The core features include:

Fast Backend Runtime: Provides efficient serving with RadixAttention for prefix caching, jump-forward constrained decoding, overhead-free CPU scheduler, continuous batching, token attention (paged attention), tensor parallelism, FlashInfer kernels, chunked prefill, and quantization (FP8/INT4/AWQ/GPTQ).
Flexible Frontend Language: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
Extensive Model Support: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), with easy extensibility for integrating new models.
Active Community: SGLang is open-source and backed by an active community with industry adoption.
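
To make the prefix-caching idea mentioned above more concrete, here is a minimal sketch in plain Python. It is not SGLang's RadixAttention implementation: `ToyPrefixCache` and its methods are hypothetical names invented for this illustration, and it uses an uncompressed trie over token ids rather than a real radix tree with attached KV cache memory. It only shows the bookkeeping step, assuming requests arrive as lists of token ids: find how many leading tokens of a new request were already processed by an earlier one, so that only the remainder would need fresh computation.

```python
from dataclasses import dataclass, field


@dataclass
class _Node:
    # Children keyed by the next token id. A real radix tree would compress
    # single-child chains into one edge; this toy trie does not.
    children: dict = field(default_factory=dict)


class ToyPrefixCache:
    """Hypothetical illustration: records which token prefixes were seen."""

    def __init__(self):
        self.root = _Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            node, matched = nxt, matched + 1
        return matched

    def insert(self, tokens):
        """Record a fully processed token sequence."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, _Node())


cache = ToyPrefixCache()
system_prompt = [1, 2, 3, 4]            # shared prefix, e.g. a system prompt
cache.insert(system_prompt + [10, 11])  # first request finishes

request = system_prompt + [20, 21, 22]  # second request shares the prefix
reused = cache.match_prefix(request)
print(f"reused {reused} of {len(request)} tokens")
```

In this toy setup the second request shares its first four tokens with the cached one, so only the last three would need to be processed from scratch. A serving runtime would additionally have to map matched prefixes to stored attention state and evict entries under memory pressure, which this sketch leaves out.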

Getting Started

Benchmark And Performance

Learn more in our release blogs: v0.2 blog, v0.3 blog, v0.4 blog

Roadmap

Development Roadmap (2024 Q4)

Adoption and Sponsorship

The project is supported by (alphabetically): AMD, Baseten, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, Meituan, NVIDIA, RunPod, Stanford, UC Berkeley, xAI and 01.AI.

Acknowledgment and Citation

We learned from the designs of, and reused code from, the following projects: Guidance, vLLM, LightLLM, FlashInfer, Outlines, and LMQL. Please cite our paper, SGLang: Efficient Execution of Structured Language Model Programs, if you find the project useful.

Latest commit: zhyncs, fix followup sgl-project#2517 (sgl-project#2524), d95a5f5 · Dec 19, 2024 · 1,523 commits

Name	Last commit message	Last commit date
.github	Clean up GPU memory after killing sglang processes (sgl-project#2457)	Dec 17, 2024
3rdparty/amd	Add get weights by parameter name for llama (sgl-project#2266)	Nov 30, 2024
assets	Add OpenAI backend to the CI test (sgl-project#869)	Aug 1, 2024
benchmark	benchmark decoding attention kernel with cudnn (sgl-project#2467)	Dec 17, 2024
docker	Release v0.4.0.post1 (sgl-project#2375)	Dec 6, 2024
docs	Print progress bar during cuda graph capture (sgl-project#2502)	Dec 17, 2024
examples	Add more support for intel Gaudi accelerators (sgl-project#2357)	Dec 6, 2024
python	fix: continue to use flashinfer 0.1.6 temporarily (sgl-project#2517)	Dec 19, 2024
scripts	fix followup sgl-project#2517 (sgl-project#2524)	Dec 19, 2024
sgl-kernel	fix typo (sgl-project#2487)	Dec 15, 2024
sgl-router	Rename rust folder to sgl-router (sgl-project#2464)	Dec 12, 2024
test	Temporarily disable unit test of torch native attention backend (sgl-…	Dec 16, 2024
.editorconfig	minor: Add basic editorconfig and pre-commit hooks to enforce style f…	Nov 6, 2024
.gitignore	misc: update build setup (sgl-project#2306)	Dec 1, 2024
.gitmodules	[Submodule] Change FlashInfer to import (sgl-project#156)	Feb 7, 2024
.isort.cfg	minor: Add basic editorconfig and pre-commit hooks to enforce style f…	Nov 6, 2024
.pre-commit-config.yaml	feat(pre-commit): trim unnecessary notebook metadata from git history (…	Nov 22, 2024
LICENSE	docs: fix module docstrings and copyright headers (sgl-project#2077)	Nov 22, 2024
Makefile	chore: bump v0.3.6.post3 (sgl-project#2259)	Nov 29, 2024
README.md	Update readme (sgl-project#2500)	Dec 17, 2024
