HandH1998
  • Beijing
  • UTC +08:00

Pinned

  1. vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 30.9k stars · 4.7k forks

  2. bytedance/lightseq

    LightSeq: A High Performance Library for Sequence Processing and Generation

    C++ · 3.2k stars · 329 forks

  3. microsoft/Megatron-DeepSpeed

    Forked from NVIDIA/Megatron-LM

    Ongoing research on training transformer language models at scale, including BERT & GPT-2

    Python · 1.9k stars · 346 forks

  4. AniZpZ/AutoSmoothQuant

    An easy-to-use package for implementing SmoothQuant for LLMs

    Python · 84 stars · 7 forks

  5. QQQ

    QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

    Python · 91 stars · 8 forks

  6. IST-DASLab/marlin

    FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

    Python · 635 stars · 50 forks