
# 👨‍💻 Awesome Code LLM

An awesome and curated list of the best code LLMs for research.


## 🔆 How to Contribute

Contributions are welcome! If you have any resources, tools, papers, or insights related to Code LLMs, feel free to submit a pull request. Let's work together to make this project better!


## 🧵 Table of Contents

- 🔆 How to Contribute
- 🚀 Top Code LLMs
- 💡 Evaluation Toolkit
- 🚀 Awesome Code LLMs Leaderboard
- 📚 Awesome Code LLMs Papers
- 🙌 Contributors
- Cite as
- Acknowledgement
- Star History

## 🚀 Top Code LLMs

Sorted by HumanEval pass@1; a short refresher on how pass@k is computed follows the table.

| Rank | Model | Params | HumanEval | MBPP | Source |
|---|---|---|---|---|---|
| 1 | o1-mini-2024-09-12 | - | 97.6 | 93.9 | paper |
| 2 | o1-preview-2024-09-12 | - | 95.1 | 93.4 | paper |
| 3 | Qwen2.5-Coder-32B-Instruct | 32B | 92.7 | 90.2 | github |
| 4 | Claude-3.5-Sonnet-20241022 | - | 92.1 | 91.0 | paper |
| 5 | GPT-4o-2024-08-06 | - | 92.1 | 86.8 | paper |
| 6 | Qwen2.5-Coder-14B-Instruct | 14B | 89.6 | 86.2 | github |
| 7 | Claude-3.5-Sonnet-20240620 | - | 89.0 | 87.6 | paper |
| 8 | Qwen2.5-Coder-7B-Instruct | 7B | 88.4 | 83.5 | github |
| 9 | GPT-4o-mini-2024-07-18 | - | 87.8 | 86.0 | paper |
| 10 | DS-Coder-V2-Instruct | 21/236B | 85.4 | 89.4 | github |
| 11 | Qwen2.5-Coder-3B-Instruct | 3B | 84.1 | 73.6 | github |
| 12 | CodeQwen1.5-7B-Chat | 7B | 83.5 | 70.6 | github |
| 13 | DS-Coder-V2-Lite-Instruct | 2.4/16B | 81.1 | 82.8 | github |
| 14 | DeepSeek-Coder-33B-Instruct | 33B | 79.3 | 70.0 | github |
| 15 | DeepSeek-Coder-6.7B-Instruct | 6.7B | 78.6 | 65.4 | github |
| 16 | GPT-3.5-Turbo | - | 76.2 | 70.8 | github |
| 17 | CodeLlama-70B-Instruct | 70B | 72.0 | 77.8 | paper |
| 18 | Qwen2.5-Coder-1.5B-Instruct | 1.5B | 70.7 | 69.2 | github |
| 19 | StarCoder2-15B-Instruct-v0.1 | 15B | 67.7 | 78.0 | paper |
| 20 | Qwen2.5-Coder-0.5B-Instruct | 0.5B | 61.6 | 52.4 | github |
| 21 | Pangu-Coder2 | 15B | 61.6 | - | paper |
| 22 | WizardCoder-15B | 15B | 57.3 | 51.8 | paper |
| 23 | CodeQwen1.5-7B | 7B | 51.8 | 61.8 | github |
| 24 | CodeLlama-34B-Instruct | 34B | 48.2 | 61.1 | paper |
| 25 | Code-Davinci-002 | - | 47.0 | - | paper |
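
HumanEval and MBPP scores are pass@k numbers: the probability that at least one of k sampled completions for a problem passes its unit tests. As a refresher, here is a minimal sketch of the unbiased pass@k estimator introduced in "Evaluating Large Language Models Trained on Code" (listed in the papers below); the numbers in the usage example are illustrative, not taken from the table above.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n -- total completions sampled per problem
    c -- completions that passed all unit tests
    k -- evaluation budget, with k <= n
    """
    if n - c < k:
        # Fewer failing completions than the budget: every size-k
        # draw must contain at least one passing completion.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative only: 200 samples per problem, 37 of them passing.
print(pass_at_k(n=200, c=37, k=1))  # 0.185
```

For k = 1 the estimator reduces to c / n, so pass@1 is simply the fraction of sampled completions that pass.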


## 💡 Evaluation Toolkit

- bigcode-evaluation-harness: a framework for the evaluation of autoregressive code generation language models.
- code-eval: a framework for the evaluation of autoregressive code generation language models on HumanEval.
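
Both toolkits implement the same core loop: concatenate the benchmark prompt with a model completion, execute the result against the benchmark's unit tests in an isolated process, and record pass/fail. Below is a deliberately minimal sketch of that loop; the toy problem, its tests, and all function names are invented for illustration. Real harnesses add proper sandboxing, batching, and resource limits, so never execute untrusted model output like this outside a sandbox.

```python
import multiprocessing as mp

# Toy HumanEval-style problem (invented for this example, not taken
# from any real benchmark): a prompt plus hidden unit tests.
PROBLEM = {
    "prompt": "def add(a, b):\n",
    "tests": "assert add(1, 2) == 3\nassert add(-1, 1) == 0\n",
}

def _run(code, queue):
    try:
        # Real harnesses sandbox this call; exec'ing untrusted model
        # output directly is unsafe.
        exec(code, {})
        queue.put(True)
    except Exception:
        queue.put(False)

def passes(completion, problem, timeout=3.0):
    """Return True iff prompt + completion passes the problem's tests."""
    code = problem["prompt"] + completion + "\n" + problem["tests"]
    queue = mp.Queue()
    proc = mp.Process(target=_run, args=(code, queue))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()  # infinite loops and hangs count as failures
        proc.join()
        return False
    return not queue.empty() and queue.get()

if __name__ == "__main__":
    print(passes("    return a + b", PROBLEM))  # True
```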


## 🚀 Awesome Code LLMs Leaderboard

| Leaderboard | Description |
|---|---|
| Evalperf Leaderboard | Evaluates LLMs for efficient code generation. |
| Aider Code Editing Leaderboard | Measures an LLM's coding ability, including whether it can write new code that integrates into existing code. |
| BigCodeBench Leaderboard | Evaluates LLMs on practical and challenging programming tasks. |
| LiveCodeBench Leaderboard | Holistic and contamination-free evaluation of LLMs for code. |
| Big Code Models Leaderboard | Compares base multilingual code generation models on HumanEval and MultiPL-E. |
| BIRD Leaderboard | BIRD contains 12,751 unique question-SQL pairs and 95 large databases with a total size of 33.4 GB, covering more than 37 professional domains such as blockchain, hockey, healthcare, and education. |
| CanAiCode Leaderboard | - |
| Coding LLMs Leaderboard | - |
| CRUXEval Leaderboard | CRUXEval is a benchmark complementary to HumanEval and MBPP that measures code reasoning, understanding, and execution capabilities. |
| EvalPlus Leaderboard | EvalPlus evaluates AI coders with rigorous tests. |
| InfiBench Leaderboard | A comprehensive benchmark for code LLMs, evaluating their ability to answer free-form, real-world questions in the code domain. |
| InterCode Leaderboard | A benchmark for interactive coding: given a natural-language request, an agent interacts with a software system (e.g., a database or terminal) through code to resolve the issue. |
| Program Synthesis Models Leaderboard | Ranks open-source code models by capability and market adoption, using a leadership-quadrant graph to help researchers identify the best open-source model. |
| Spider Leaderboard | Spider is a large-scale, complex, cross-domain semantic-parsing and text-to-SQL dataset annotated by 11 Yale students; the challenge's goal is to develop natural-language interfaces to cross-domain databases. |


## 📚 Awesome Code LLMs Papers

### 🌊 Awesome Code Pre-Training Papers

| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models | Preprint | 2024.11 | Github | HF |
| Qwen2.5-Coder Technical Report | Preprint | 2024.09 | Github | HF |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Preprint | 2024.06 | Github | HF |
| StarCoder 2 and The Stack v2: The Next Generation | Preprint | 2024.02 | Github | HF |
| DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | Preprint | 2024.01 | Github | HF |
| Code Llama: Open Foundation Models for Code | Preprint | 2023.08 | Github | HF |
| Textbooks Are All You Need | Preprint | 2023.06 | - | HF |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | Preprint | 2023.05 | Github | HF |
| StarCoder: may the source be with you! | Preprint | 2023.05 | Github | HF |
| CodeGen2: Lessons for Training LLMs on Programming and Natural Languages | ICLR'23 | 2023.05 | Github | HF |
| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X | Preprint | 2023.03 | Github | HF |
| SantaCoder: don't reach for the stars! | Preprint | 2023.01 | - | HF |
| CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | ICLR'23 | 2022.03 | Github | HF |
| Evaluating Large Language Models Trained on Code | Preprint | 2021.07 | Github | - |


### 🐳 Awesome Code Instruction-Tuning Papers

| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| Magicoder: Source Code Is All You Need | ICML'24 | 2023.12 | Github | HF |
| OctoPack: Instruction Tuning Code Large Language Models | ICLR'24 | 2023.08 | Github | HF |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Preprint | 2023.07 | Github | HF |
| Code Alpaca: An Instruction-following LLaMA Model trained on code generation instructions | Preprint | 2023.xx | Github | HF |


### 🐬 Awesome Code Alignment Papers

| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models | Preprint | 2024.06 | - | - |
| PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | Preprint | 2023.07 | - | - |
| RLTF: Reinforcement Learning from Unit Test Feedback | Preprint | 2023.07 | Github | - |
| Execution-based Code Generation using Deep Reinforcement Learning | TMLR'23 | 2023.01 | Github | - |
| CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | NeurIPS'22 | 2022.07 | Github | - |


πŸ‹ Awesome Code Prompting Papers

| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Preprint | 2024.10 | Github | - |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | ACL'24 | 2024.02 | Github | - |
| SelfEvolve: A Code Evolution Framework via Large Language Models | Preprint | 2023.06 | - | - |
| Demystifying GPT Self-Repair for Code Generation | ICLR'24 | 2023.06 | Github | - |
| Teaching Large Language Models to Self-Debug | ICLR'24 | 2023.06 | - | - |
| LEVER: Learning to Verify Language-to-Code Generation with Execution | ICML'23 | 2023.02 | Github | - |
| Coder Reviewer Reranking for Code Generation | ICML'23 | 2022.11 | Github | - |
| CodeT: Code Generation with Generated Tests | ICLR'23 | 2022.07 | Github | - |


πŸ™ Awesome Code Benchmark & Evaluation Papers

| Dataset | Title | Venue | Date | Code | Resources |
|---|---|---|---|---|---|
| GitChameleon | GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models | Preprint | 2024.11 | Github | - |
| Evalperf | Evaluating Language Models for Efficient Code Generation | COLM'24 | 2024.08 | Github | HF |
| LiveCodeBench | LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | Preprint | 2024.03 | Github | HF |
| DevBench | DevBench: A Comprehensive Benchmark for Software Development | Preprint | 2024.03 | Github | - |
| SWE-bench | SWE-bench: Can Language Models Resolve Real-World GitHub Issues? | ICLR'24 | 2024.03 | Github | HF |
| CrossCodeEval | CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | NeurIPS'23 | 2023.11 | Github | - |
| RepoCoder | Repository-Level Code Completion Through Iterative Retrieval and Generation | EMNLP'23 | 2023.10 | Github | - |
| LongCoder | LongCoder: A Long-Range Pre-trained Language Model for Code Completion | ICML'23 | 2023.10 | Github | - |
| - | Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation | Preprint | 2023.08 | - | - |
| BioCoder | BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models | ISMB'24 | 2023.08 | Github | - |
| RepoBench | RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems | ICLR'24 | 2023.06 | Github | HF |
| Evalplus | Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | NeurIPS'23 | 2023.05 | Github | HF |
| Coeditor | Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing | ICLR'24 | 2023.05 | Github | - |
| DS-1000 | DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation | ICML'23 | 2022.11 | Github | HF |
| MultiPL-E | MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Preprint | 2022.08 | Github | HF |
| MBPP | Program Synthesis with Large Language Models | Preprint | 2021.08 | Github | HF |
| APPS | Measuring Coding Challenge Competence With APPS | NeurIPS'21 | 2021.05 | Github | HF |
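
Many of the benchmarks above are mirrored on the Hugging Face Hub, which makes them easy to inspect before wiring up a full harness. A minimal sketch, assuming the datasets library and the "mbpp" dataset id with its usual text/code/test_list fields (verify both against the Hub, since ids and schemas can change):

```python
from datasets import load_dataset

# Pull the MBPP test split from the Hugging Face Hub.
mbpp = load_dataset("mbpp", split="test")

example = mbpp[0]
print(example["text"])       # natural-language task description
print(example["code"])       # reference solution
print(example["test_list"])  # assert statements used for checking
```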


## 🙌 Contributors

This is an actively maintained repository and your contributions are always welcome! If you have any questions about this opinionated list, do not hesitate to contact me at [email protected].


## Cite as

@software{awesome-code-llm,
  author = {Hui, Binyuan and Zhang, Lei},
  title = {An awesome and curated list of best code-LLM for research},
  howpublished = {\url{https://github.com/huybery/Awesome-Code-LLM}},
  year = {2023},
}


## Acknowledgement

This project is inspired by Awesome-LLM.


## Star History

Star History Chart

⬆ Back to ToC
