Releases: lm-sys/FastChat
Release v0.2.36
Highlights
- Added SGLang worker for vision language models, with lower latency and higher throughput #2928
- Vision language WebUI #2960
- OpenAI-compatible API server now supports image input #2928
- Added LightLLM worker for higher throughput https://github.com/lm-sys/FastChat/blob/main/docs/lightllm_integration.md
- Added Apple MLX worker #2940
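The image-input support in the OpenAI-compatible API server (#2928) follows the OpenAI vision message format, where an image is embedded as a data URL inside the message content. A minimal sketch of building such a request payload; the model name is an assumption for a local deployment:

```python
# Build an OpenAI-style chat-completion payload with an image, the format
# FastChat's OpenAI-compatible API server accepts for vision models (#2928).
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Encode an image as a base64 data URL inside a user message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }


# "llava-v1.6-vicuna-7b" is an assumed model name, not prescribed by the release.
payload = {
    "model": "llava-v1.6-vicuna-7b",
    "messages": [build_image_message("What is in this image?", b"\x89PNG\r\n")],
}
```

The payload can then be POSTed to the server's `/v1/chat/completions` endpoint with any HTTP client.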
What's Changed
- fix local path issue when using models from www.modelscope.cn by @liuyhwangyh in #2934
- support openai embedding for topic clustering by @CodingWithTim in #2729
- Remove duplicate API endpoint by @surak in #2949
- Update Hermes Mixtral by @teknium1 in #2938
- Enablement of REST API Usage within Google Colab Free Tier by @ggcr in #2940
- Create a new worker implementation for Apple MLX by @aliasaria in #2937
- feat: support Model Yuan2.0, a new generation Fundamental Large Language Model developed by IEIT System by @cauwulixuan in #2936
- Fix the pooling method of BGE embedding model by @staoxiao in #2926
- SGLang Worker by @BabyChouSr in #2928
- Update mlx_worker to be async by @aliasaria in #2958
- Integrate LightLLM into serve worker by @zeyugao in #2888
- Copy button by @surak in #2963
- feat: train with template by @congchan in #2951
- fix content maybe a str by @zhouzaida in #2968
- Adding download folder information in README by @dheeraj-326 in #2972
- use cl100k_base as the default tiktoken encoding by @bjwswang in #2974
- Update README.md by @merrymercy in #2975
- Fix tokenizer for vllm worker by @Michaelvll in #2984
- update yuan2.0 generation by @wangpengfei1013 in #2989
- fix: tokenization mismatch when training with different templates by @congchan in #2996
- fix: inconsistent tokenization by llama tokenizer by @congchan in #3006
- Fix type hint for play_a_match_single by @MonkeyLeeT in #3008
- code update by @infwinston in #2997
- Update model_support.md by @infwinston in #3016
- Update lightllm_integration.md by @eltociear in #3014
- Upgrade gradio to 4.17 by @infwinston in #3027
- Update MLX integration to use new generate_step function signature by @aliasaria in #3021
- Update readme by @merrymercy in #3028
- Update gradio version in pyproject.toml and fix a bug by @merrymercy in #3029
- Update gradio demo and API model providers by @merrymercy in #3030
- Gradio Web Server for Multimodal Models by @BabyChouSr in #2960
- Migrate the gradio server to openai v1 by @merrymercy in #3032
- Update version to 0.2.36 by @merrymercy in #3033
New Contributors
- @teknium1 made their first contribution in #2938
- @ggcr made their first contribution in #2940
- @aliasaria made their first contribution in #2937
- @cauwulixuan made their first contribution in #2936
- @staoxiao made their first contribution in #2926
- @zhouzaida made their first contribution in #2968
- @dheeraj-326 made their first contribution in #2972
- @bjwswang made their first contribution in #2974
- @MonkeyLeeT made their first contribution in #3008
Full Changelog: v0.2.35...v0.2.36
Release v0.2.35
What's Changed
- add dolphin by @infwinston in #2794
- Fix tiny typo by @bofenghuang in #2805
- Add instructions for evaluating on MT bench using vLLM by @iojw in #2770
- fix missing op | for py3.8 by @dumpmemory in #2800
- Add SOLAR-10.7b Instruct Model by @BabyChouSr in #2826
- Update README.md by @eltociear in #2852
- fix: 'compeletion' typo by @congchan in #2847
- Add Tunnelmole as an open source alternative to ngrok and include usage instructions by @robbie-cahill in #2846
- Add support for CatPPT by @rishiraj in #2840
- Add functionality to ping AI2 InferD endpoints for tulu 2 by @natolambert in #2832
- add download models from www.modelscope.cn by @liuyhwangyh in #2830
- Fix conv_template of chinese alpaca 2 by @zollty in #2812
- add bagel model adapter by @jondurbin in #2814
- add root_path argument to gradio web server. by @stephanbertl in #2807
- Import accelerate locally to avoid it as a strong dependency by @chiragjn in #2820
- Replace dict merge with unpacking for compatibility of 3.8 in vLLM worker by @rudeigerc in #2824
- Format code by @merrymercy in #2854
- Openai API migrate by @andy-yang-1 in #2765
- Add new models (Perplexity, gemini) & Separate GPT versions by @merrymercy in #2856
- Clean error messages by @merrymercy in #2857
- Update docs by @Ying1123 in #2858
- Modify doc description by @zhangsibo1129 in #2859
- Fix the problem of not using the decoding method corresponding to the base model in peft mode by @Jingsong-Yan in #2865
- update a new SOTA model on MT-Bench which reaches an 8.8 score by @xiechengmude in #2864
- NPU needs to be initialized when starting a new process by @jq460494839 in #2843
- Fix the problem with "vllm + chatglm3" (#2845) by @yaofeng in #2876
- Update token spacing for mistral conversation.py by @thavens in #2872
- check if hm in models before deleting to avoid errors by @joshua-ne in #2870
- Add TinyLlama by @Gk-rohan in #2889
- Fix bug that model doesn't automatically switch peft adapter by @Jingsong-Yan in #2884
- Update web server commands by @merrymercy in #2869
- fix the tokenize process and prompt template of chatglm3 by @WHDY in #2883
- Add Notus support by @gabrielmbmb in #2813
- feat: support anthropic api with api_dict by @congchan in #2879
- Update model_adapter.py by @thavens in #2895
- leaderboard code update by @infwinston in #2867
- fix: change order of SEQUENCE_LENGTH_KEYS by @congchan in #2925
- fix baichuan:apply_prompt_template call args error by @Force1ess in #2921
- Fix a typo in openai_api_server.py by @jklj077 in #2905
- feat: use variables OPENAI_MODEL_LIST by @congchan in #2907
- Add TenyxChat-7B-v1 model by @sarath-shekkizhar in #2901
- add support for iei yuan2.0 (https://huggingface.co/IEITYuan) by @wangpengfei1013 in #2919
- nous-hermes-2-mixtral-dpo by @152334H in #2922
- Bump the version to 0.2.35 by @merrymercy in #2927
New Contributors
- @dumpmemory made their first contribution in #2800
- @robbie-cahill made their first contribution in #2846
- @rishiraj made their first contribution in #2840
- @natolambert made their first contribution in #2832
- @liuyhwangyh made their first contribution in #2830
- @stephanbertl made their first contribution in #2807
- @chiragjn made their first contribution in #2820
- @rudeigerc made their first contribution in #2824
- @jq460494839 made their first contribution in #2843
- @yaofeng made their first contribution in #2876
- @thavens made their first contribution in #2872
- @joshua-ne made their first contribution in #2870
- @WHDY made their first contribution in #2883
- @gabrielmbmb made their first contribution in #2813
- @jklj077 made their first contribution in #2905
- @sarath-shekkizhar made their first contribution in #2901
- @wangpengfei1013 made their first contribution in #2919
Full Changelog: v0.2.34...v0.2.35
Release v0.2.34
What's Changed
- fix tokenizer.pad_token attribute error by @wangshuai09 in #2710
- support stable-vicuna model by @hi-jin in #2696
- Exllama cache 8bit by @mjkaye in #2719
- Add Yi support by @infwinston in #2723
- Add Hermes 2.5 [fixed] by @152334H in #2725
- Fix Hermes2Adapter by @lewtun in #2727
- Fix YiAdapter by @Jingsong-Yan in #2730
- add trust_remote_code argument by @wangshuai09 in #2715
- Add revision arg to MT Bench answer generation by @lewtun in #2728
- Fix MPS backend 'index out of range' error by @suquark in #2737
- add starling support by @infwinston in #2738
- Add deepseek chat by @BabyChouSr in #2760
- a convenient script for spinning up the API with Model Workers by @ckgresla in #2790
- Prevent returning partial stop string in vllm worker by @pandada8 in #2780
- Update UI and new models by @infwinston in #2762
- Support MetaMath by @iojw in #2748
- Use common logging code in the OpenAI API server by @geekoftheweek in #2758
- Show how to turn on experiment tracking for fine-tuning by @morganmcg1 in #2742
- Support xDAN-L1-Chat Model by @xiechengmude in #2732
- Update the version to 0.2.34 by @merrymercy in #2793
New Contributors
- @mjkaye made their first contribution in #2719
- @152334H made their first contribution in #2725
- @Jingsong-Yan made their first contribution in #2730
- @ckgresla made their first contribution in #2790
- @pandada8 made their first contribution in #2780
- @iojw made their first contribution in #2748
- @geekoftheweek made their first contribution in #2758
- @morganmcg1 made their first contribution in #2742
- @xiechengmude made their first contribution in #2732
Full Changelog: v0.2.33...v0.2.34
Release v0.2.33
What's Changed
- fix: Fix for OpenOrcaAdapter to return correct conversation template by @vjsrinath in #2613
- Make fastchat.serve.model_worker to take debug argument by @uinone in #2628
- openchat 3.5 model support by @imoneoi in #2638
- xFastTransformer framework support by @a3213105 in #2615
- feat: support custom models vllm serving by @congchan in #2635
- kill only fastchat process by @scenaristeur in #2641
- Use conv.update_last_message api in mt-bench answer generation by @merrymercy in #2647
- Improve Azure OpenAI interface by @infwinston in #2651
- Add required_temp support in jsonl format to support flexible temperature setting for gen_api_answer by @CodingWithTim in #2653
- Pin openai version < 1 by @infwinston in #2658
- Remove exclude_unset parameter by @snapshotpl in #2654
- Revert "Remove exclude_unset parameter" by @merrymercy in #2666
- added support for CodeGeex(2) by @peterwilli in #2645
- add chatglm3 conv template support in conversation.py by @ZeyuTeng96 in #2622
- UI and model change by @infwinston in #2672
- train_flant5: fix typo by @Force1ess in #2673
- Fix gpt template by @infwinston in #2674
- Update README.md by @merrymercy in #2679
- feat: support template's stop_str as list by @congchan in #2678
- Update exllama_v2.md by @jm23jeffmorgan in #2680
- save model under deepspeed by @MrZhengXin in #2689
- Adding SSL support for model workers and huggingface worker by @lnguyen in #2687
- Check the max_new_tokens <= 0 in openai api server by @zeyugao in #2688
- Add Microsoft/Orca-2-7b and update model support docs by @BabyChouSr in #2714
- fix tokenizer of chatglm2 by @wangshuai09 in #2711
- Template for using Deepseek code models by @AmaleshV in #2705
- add support for Chinese-LLaMA-Alpaca by @zollty in #2700
- Make --load-8bit flag work with weights in safetensors format by @xuguodong1999 in #2698
- Format code and minor bug fix by @merrymercy in #2716
- Bump version to v0.2.33 by @merrymercy in #2717
New Contributors
- @vjsrinath made their first contribution in #2613
- @uinone made their first contribution in #2628
- @a3213105 made their first contribution in #2615
- @scenaristeur made their first contribution in #2641
- @snapshotpl made their first contribution in #2654
- @peterwilli made their first contribution in #2645
- @ZeyuTeng96 made their first contribution in #2622
- @Force1ess made their first contribution in #2673
- @jm23jeffmorgan made their first contribution in #2680
- @MrZhengXin made their first contribution in #2689
- @lnguyen made their first contribution in #2687
- @wangshuai09 made their first contribution in #2711
- @AmaleshV made their first contribution in #2705
- @zollty made their first contribution in #2700
- @xuguodong1999 made their first contribution in #2698
Full Changelog: v0.2.32...v0.2.33
Release v0.2.32
What's Changed
- Fix for single turn dataset by @toslunar in #2509
- replace os.getenv with os.path.expanduser because the first one doesn… by @khalil-Hennara in #2515
- Fix arena by @merrymercy in #2522
- Update Dockerfile by @dubaoquan404 in #2524
- add Llama2ChangAdapter by @lcw99 in #2510
- Add ExllamaV2 Inference Framework Support. by @leonxia1018 in #2455
- Improve docs by @merrymercy in #2534
- Fix warnings for new gradio versions by @merrymercy in #2538
- Improve chat templates by @merrymercy in #2539
- Add Zephyr 7B Alpha by @lewtun in #2535
- Improve Support for Mistral-Instruct by @Steve-Tech in #2547
- correct max_tokens by context_length instead of raising an exception by @liunux4odoo in #2544
- Revert "Improve Support for Mistral-Instruct" by @merrymercy in #2552
- Fix Mistral template by @normster in #2529
- Add additional Informations from the vllm worker by @SebastianBodza in #2550
- Make FastChat work with LMSYS-Chat-1M Code by @CodingWithTim in #2551
- Create tags attribute to fix MarkupError in rich CLI by @Steve-Tech in #2553
- move BaseModelWorker outside serve.model_worker to make it independent by @liunux4odoo in #2531
- Misc style and bug fixes by @merrymercy in #2559
- Fix README.md by @infwinston in #2561
- release v0.2.31 by @merrymercy in #2563
- resolves #2542 modify dockerfile to upgrade cuda to 12.2.0 and pydantic 1.10.13 by @alexdelapaz in #2565
- Add airoboros_v3 chat template (llama-2 format) by @jondurbin in #2564
- Add Xwin-LM V0.1, V0.2 support by @REIGN12 in #2566
- Fixed model_worker generate_gate may blocked main thread (#2540) by @lvxuan263 in #2562
- feat: add claude-v2 by @congchan in #2571
- Update vigogne template by @bofenghuang in #2580
- Fix issue #2568: --device mps led to TypeError: forward() got an unexpected keyword argument 'padding_mask'. by @Phil-U-U in #2579
- Add Mistral-7B-OpenOrca conversation_template by @waynespa in #2585
- docs: bit misspell comments model adapter default template name conversation by @guspan-tanadi in #2594
- Update Mistral template by @Gk-rohan in #2581
- Update README.md (vicuna-v1.3 -> vicuna-1.5) by @infwinston in #2592
- Update README.md to highlight chatbot arena by @infwinston in #2596
- Add Lemur model by @ugolotti in #2584
- add trust_remote_code=True in BaseModelAdapter by @edisonwd in #2583
- Openai interface add use beam search and best of 2 by @leiwen83 in #2442
- Update qwen and add pygmalion by @Trangle in #2607
- feat: Support model AquilaChat2 by @fangyinc in #2616
- Added settings vllm by @SebastianBodza in #2599
- [Logprobs] Support logprobs=1 by @comaniac in #2612
New Contributors
- @toslunar made their first contribution in #2509
- @khalil-Hennara made their first contribution in #2515
- @dubaoquan404 made their first contribution in #2524
- @leonxia1018 made their first contribution in #2455
- @lewtun made their first contribution in #2535
- @normster made their first contribution in #2529
- @SebastianBodza made their first contribution in #2550
- @alexdelapaz made their first contribution in #2565
- @REIGN12 made their first contribution in #2566
- @lvxuan263 made their first contribution in #2562
- @Phil-U-U made their first contribution in #2579
- @waynespa made their first contribution in #2585
- @guspan-tanadi made their first contribution in #2594
- @Gk-rohan made their first contribution in #2581
- @ugolotti made their first contribution in #2584
- @edisonwd made their first contribution in #2583
- @fangyinc made their first contribution in #2616
- @comaniac made their first contribution in #2612
Full Changelog: v0.2.30...v0.2.32
Release v0.2.30
What's Changed
- Support new models
- Bug fixes
New Contributors
- @wangxiyuan made their first contribution in #2404
- @wangzhen263 made their first contribution in #2402
- @karshPrime made their first contribution in #2406
- @obitolyz made their first contribution in #2408
- @Somezak1 made their first contribution in #2431
- @hi-jin made their first contribution in #2434
- @zhangsibo1129 made their first contribution in #2422
- @tobiabir made their first contribution in #2418
- @Btlmd made their first contribution in #2384
- @brandonbiggs made their first contribution in #2448
- @dongxiaolong made their first contribution in #2463
- @shuishu made their first contribution in #2482
- @asaiacai made their first contribution in #2469
- @hnyls2002 made their first contribution in #2456
- @enochlev made their first contribution in #2499
- @AlpinDale made their first contribution in #2500
- @lerela made their first contribution in #2483
Full Changelog: v0.2.28...v0.2.30
Release v0.2.28
What's Changed
- Multiple UI updates, performance improvements and bug fixes
- New model support (Spicyboros + airoboros 2.2, VMware's OpenLLaMa OpenInstruct)
- Add sponsors (Kaggle, MBZUAI, AnyScale, and Huggingface)
New Contributors
- @nicobasile made their first contribution in #2278
- @zeyugao made their first contribution in #2297
- @fan-chao made their first contribution in #2290
- @leiwen83 made their first contribution in #2273
- @siddartha-RE made their first contribution in #2263
- @renatz made their first contribution in #2296
- @so2liu made their first contribution in #2225
- @epec254 made their first contribution in #2306
- @woshiyyya made their first contribution in #2326
- @vaxilicaihouxian made their first contribution in #2328
- @nathanstitt made their first contribution in #2337
Full Changelog: v0.2.25...v0.2.28
Release v0.2.25
What's Changed
- Support new models (Qwen, WizardCoder, Llama2-Chinese, BAAI/AquilaChat-7B, OpenOrca, BGE)
- Improve performance and usability. Fix bugs.
- Reduce dependency by making some required packages optional
New Contributors
- @azshue made their first contribution in #2169
- @liunux4odoo made their first contribution in #2147
- @tmm1 made their first contribution in #2126
- @shibing624 made their first contribution in #2138
- @Tomorrowxxy made their first contribution in #2185
- @Extremys made their first contribution in #2166
- @gesanqiu made their first contribution in #2192
- @alongLFB made their first contribution in #2202
- @congchan made their first contribution in #2194
- @Rayrtfr made their first contribution in #2218
- @bofenghuang made their first contribution in #2236
- @Cyrilvallez made their first contribution in #2239
- @persistz made their first contribution in #2248
- @LeiZhou-97 made their first contribution in #2247
Full Changelog: v0.2.23...v0.2.25
Release v0.2.22
- Released Vicuna v1.5 based on Llama 2 with 4K and 16K context lengths. Download weights
- Released Chatbot Arena Conversations, a dataset containing 33k conversations with human preferences. Download it here.
- Serving
- Add a multi-model worker that can host multiple models on a single GPU and share base weights for PEFT models. #1866 #1905
- AWQ 4-bit quantization support. #2103
- Support more models (Llama 2, Claude 2, ChatGLM 2, StarChat, Baichuan-13B, InternLM, airoboros, PEFT adapters).
- Better support for AMD GPUs, Intel XPUs. #1954 #2052
- Training
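The multi-model worker above (#1866, #1905) hosts several models behind a single worker process, sharing base weights across PEFT adapters. A minimal launch sketch; the model paths and names are assumptions, not part of the release notes:

```shell
# Host two models on a single GPU with one worker process.
# Repeat --model-path/--model-names per model; paths and names here are examples.
python3 -m fastchat.serve.multi_model_worker \
    --model-path lmsys/vicuna-7b-v1.5 --model-names vicuna-7b-v1.5 \
    --model-path lmsys/longchat-7b-16k --model-names longchat-7b-16k
```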
Release v0.2.18
- Release MT-bench code and data
- Release new models
- Support more models (Falcon, Salesforce/xgen, Salesforce/codet5p-6b, Robin-7B/13B/33B, Baichuan-7B)
- Integrate vLLM worker for continuous batching and high-throughput serving. See doc.
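The vLLM worker above slots into the standard FastChat serving stack. A minimal sketch of a local deployment, with each process run in its own terminal; the model path and port are assumptions:

```shell
# Launch the controller, a vLLM worker, and the OpenAI-compatible API server.
# The model path and port below are example values.
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```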