This repository collects papers and code in the field of AI. It is organized into the following parts:
├─ NLP/
│  ├─ Word2Vec/
│  ├─ Seq2Seq/
│  ├─ Pretraining/
│  ├─ Large Language Model/
│  ├─ LLM Application/
│  │  ├─ AI Agent/
│  │  ├─ Academic/
│  │  ├─ Code/
│  │  ├─ Financial Application/
│  │  ├─ Information Retrieval/
│  │  ├─ Math/
│  │  ├─ Medicine and Law/
│  │  ├─ Recommend System/
│  │  └─ Tool Learning/
│  ├─ LLM Technique/
│  │  ├─ Alignment/
│  │  ├─ Context Length/
│  │  ├─ Corpus/
│  │  ├─ Evaluation/
│  │  ├─ Hallucination/
│  │  ├─ Inference/
│  │  ├─ MoE/
│  │  ├─ PEFT/
│  │  ├─ Prompt Learning/
│  │  ├─ RAG/
│  │  └─ Reasoning and Planning/
│  ├─ LLM Theory/
│  └─ Chinese Model/
├─ CV/
│  ├─ CV Application/
│  ├─ Contrastive Learning/
│  ├─ Foundation Model/
│  ├─ Generative Model (GAN and VAE)/
│  ├─ Image Editing/
│  ├─ Object Detection/
│  ├─ Semantic Segmentation/
│  └─ Video/
├─ Multimodal/
│  ├─ Audio/
│  ├─ BLIP/
│  ├─ CLIP/
│  ├─ Diffusion Model/
│  ├─ Multimodal LLM/
│  ├─ Text2Image/
│  ├─ Text2Video/
│  └─ Survey/
├─ Reinforcement Learning/
├─ GNN/
└─ Transformer Architecture/
- Efficient Estimation of Word Representations in Vector Space, Mikolov et al., arxiv 2013. [paper]
- Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., NIPS 2013. [paper]
- Distributed Representations of Sentences and Documents, Le and Mikolov, ICML 2014. [paper]
- Word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method, Goldberg and Levy, arxiv 2014. [paper]
- word2vec Parameter Learning Explained, Rong, arxiv 2014. [paper]
- GloVe: Global Vectors for Word Representation, Pennington et al., EMNLP 2014. [paper][code]
- fastText: Bag of Tricks for Efficient Text Classification, Joulin et al., arxiv 2016. [paper][code]
- ELMo: Deep Contextualized Word Representations, Peters et al., NAACL 2018. [paper]
- BPE: Neural Machine Translation of Rare Words with Subword Units, Sennrich et al., ACL 2016. [paper][code]
- Byte-Level BPE: Neural Machine Translation with Byte-Level Subwords, Wang et al., arxiv 2019. [paper][code]
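The negative-sampling objective that Goldberg and Levy analyze in their note above fits in a few lines; a minimal illustrative sketch (vector sizes and names here are made up for demonstration):

```python
import numpy as np

def sgns_loss(v_w, v_c, neg_cs):
    """Skip-gram negative-sampling loss for one (word, context) pair.

    v_w:    center-word vector
    v_c:    observed context vector (positive sample)
    neg_cs: sampled negative context vectors (noise words)
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss = -np.log(sigmoid(v_w @ v_c))          # pull the observed pair together
    for v_n in neg_cs:
        loss -= np.log(sigmoid(-(v_w @ v_n)))   # push sampled noise words apart
    return loss

rng = np.random.default_rng(0)
d = 50  # assumed embedding dimension
print(sgns_loss(rng.normal(size=d), rng.normal(size=d),
                [rng.normal(size=d) for _ in range(5)]))
```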
- Generating Sequences With Recurrent Neural Networks, Graves, arxiv 2013. [paper]
- Sequence to Sequence Learning with Neural Networks, Sutskever et al., NeurIPS 2014. [paper]
- Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., ICLR 2015. [paper][code]
- On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, Cho et al., arxiv 2014. [paper]
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., arxiv 2014. [paper]
- [fairseq][fairseq2][pytorch-seq2seq]
- Attention Is All You Need, Vaswani et al., NIPS 2017. [paper][code]
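For reference, the scaled dot-product attention at the core of this paper, softmax(QK^T / sqrt(d_k)) V, as a minimal PyTorch sketch (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 8, 16, 64)  # (batch, heads, seq, d_k)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```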
- GPT: Improving language understanding by generative pre-training, Radford et al., preprint 2018. [paper][code]
- GPT-2: Language Models are Unsupervised Multitask Learners, Radford et al., OpenAI blog 2019. [paper][code][llm.c]
- GPT-3: Language Models are Few-Shot Learners, Brown et al., NeurIPS 2020. [paper][code][nanoGPT][build-nanogpt][gpt-fast][modded-nanogpt][nanotron]
- InstructGPT: Training language models to follow instructions with human feedback, Ouyang et al., NeurIPS 2022. [paper][MOSS-RLHF]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., NAACL 2019 Best Paper. [paper][code][BERT-pytorch][bert4torch][bert4keras]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach, Liu et al., arxiv 2019. [paper][code][Chinese-BERT-wwm]
- What Does BERT Look At: An Analysis of BERT's Attention, Clark et al., arxiv 2019. [paper][code]
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention, He et al., ICLR 2021. [paper][code]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, Sanh et al., arxiv 2019. [paper][code][albert_pytorch]
- BERT Rediscovers the Classical NLP Pipeline, Tenney et al., arxiv 2019. [paper][code]
- How to Fine-Tune BERT for Text Classification?, Sun et al., arxiv 2019. [paper][code]
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?, Eldan and Li, arxiv 2023. [paper][dataset][phi-3][SmolLM][Computational Bottlenecks of Training Small-scale Large Language Models][SLMs-Survey]
- [LLM101n][EurekaLabsAI][llm-course][intro-llm][llm-cookbook][hugging-llm][generative-ai-for-beginners][awesome-generative-ai-guide][LLMs-from-scratch][llm-action][llms_idx][tiny-universe][AISystem]
- [cs230-code-examples][victoresque/pytorch-template][songquanpeng/pytorch-template][Academic-project-page-template][WritingAIPaper]
- [tokenizer_summary][minbpe][tokenizers][tiktoken][SentencePiece][Cosmos-Tokenizer]
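A quick tokenizer sanity check with tiktoken (one of the libraries linked above), assuming the package is installed:

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # BPE vocabulary used by GPT-4-era models
ids = enc.encode("Byte pair encoding splits rare words into subwords.")
print(ids[:8])          # first few token ids
print(enc.decode(ids))  # round-trips to the original string
```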
- A Survey of Large Language Models, Zhao et al., arxiv 2023. [paper][code][LLMBox][LLMBook-zh][LLMsPracticalGuide]
- Efficient Large Language Models: A Survey, Wan et al., arxiv 2023. [paper][code]
- Challenges and Applications of Large Language Models, Kaddour et al., arxiv 2023. [paper]
- A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT, Zhou et al., arxiv 2023. [paper]
- From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape, McIntosh et al., arxiv 2023. [paper][AGI-survey]
- A Survey of Resource-efficient LLM and Multimodal Foundation Models, Xu et al., arxiv 2024. [paper][code]
- Large Language Models: A Survey, Minaee et al., arxiv 2024. [paper]
- Anthropic: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al., arxiv 2022. [paper][code]
- Anthropic: Constitutional AI: Harmlessness from AI Feedback, Bai et al., arxiv 2022. [paper][code]
- Anthropic: Model Card and Evaluations for Claude Models, Anthropic, 2023. [paper]
- Anthropic: The Claude 3 Model Family: Opus, Sonnet, Haiku, Anthropic, 2024. [paper][Claude 3.5]
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model, BigScience Workshop, arxiv 2022. [paper][code][model]
- OPT: Open Pre-trained Transformer Language Models, Zhang et al., arxiv 2022. [paper][code]
- Chinchilla: Training Compute-Optimal Large Language Models, Hoffmann et al., arxiv 2022. [paper]
- Gopher: Scaling Language Models: Methods, Analysis & Insights from Training Gopher, Rae et al., arxiv 2021. [paper]
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model, Black et al., arxiv 2022. [paper][code]
- Gemini: A Family of Highly Capable Multimodal Models, Gemini Team, Google, arxiv 2023. [paper][Gemini 1.0][Gemini 1.5][Unofficial Implementation][MiniGemini]
- Gemma: Open Models Based on Gemini Research and Technology, Google DeepMind, 2024. [paper][code][google-deepmind/gemma][gemma.cpp][model][paligemma][gemma-cookbook]
- Gemma 2: Improving Open Language Models at a Practical Size, Google Team, 2024. [paper][blog][Advancing Responsible AI with Gemma][Gemma Scope][ShieldGemma][Gemma-2-9B-Chinese-Chat]
- GPT-4 Technical Report, OpenAI, arxiv 2023. [blog][paper]
- GPT-4V(ision) System Card, OpenAI, OpenAI blog 2023. [paper][GPT-4o][GPT-4o System Card]
- Sparks of Artificial General Intelligence: Early experiments with GPT-4, Bubeck et al., arxiv 2023. [paper]
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision), Yang et al., arxiv 2023. [paper][guidance]
- LaMDA: Language Models for Dialog Applications, Thoppilan et al., arxiv 2022. [paper][LaMDA-rlhf-pytorch]
- LLaMA: Open and Efficient Foundation Language Models, Touvron et al., arxiv 2023. [paper][code][llama.cpp][ollama][llamafile]
- Llama 2: Open Foundation and Fine-Tuned Chat Models, Touvron et al., arxiv 2023. [paper][code][llama2.c][lit-llama][litgpt]
- The Llama 3 Herd of Models, Llama Team, AI @ Meta, 2024. [blog][paper][code][llama-models][llama-recipes][LLM Adaptation][llama3-from-scratch][nano-llama31][minimind][felafax]
- Llama 3.2: Revolutionizing edge AI and vision with open, customizable models, 2024. [blog][model][llama-stack][llama-stack-apps][lingua][llama-assistant][minimind-v]
- TinyLlama: An Open-Source Small Language Model, Zhang et al., arxiv 2024. [paper][code][LiteLlama][MobiLlama][Steel-LLM]
- Stanford Alpaca: An Instruction-following LLaMA Model, Taori et al., Stanford blog 2023. [paper][code][Alpaca-Lora][OpenAlpaca]
- Mistral 7B, Jiang et al., arxiv 2023. [paper][code][model][mistral-finetune]
- OLMo: Accelerating the Science of Language Models, Groeneveld et al., ACL 2024. [paper][code][olmo2][Dolma Dataset][Molmo and PixMo][Pangea]
- TÜLU 3: Pushing Frontiers in Open Language Model Post-Training, Lambert et al., arxiv 2024. [paper][code]
- Minerva: Solving Quantitative Reasoning Problems with Language Models, Lewkowycz et al., arxiv 2022. [paper]
- PaLM: Scaling Language Modeling with Pathways, Chowdhery et al., arxiv 2022. [paper][PaLM-pytorch][PaLM-rlhf-pytorch][PaLM]
- PaLM 2 Technical Report, Anil et al., arxiv 2023. [paper]
- PaLM-E: An Embodied Multimodal Language Model, Driess et al., arxiv 2023. [paper][code]
- T5: Exploring the limits of transfer learning with a unified text-to-text transformer, Raffel et al., Journal of Machine Learning Research 2020. [paper][code][t5-pytorch][t5-pegasus-pytorch]
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, Lewis et al., ACL 2020. [paper][code]
- FLAN: Finetuned Language Models Are Zero-Shot Learners, Wei et al., ICLR 2022. [paper][code]
- Scaling Flan: Scaling Instruction-Finetuned Language Models, Chung et al., arxiv 2022. [paper][model]
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, Dai et al., ACL 2019. [paper][code]
- XLNet: Generalized Autoregressive Pretraining for Language Understanding, Yang et al., NeurIPS 2019. [paper][code]
- WebGPT: Browser-assisted question-answering with human feedback, Nakano et al., arxiv 2021. [paper][MS-MARCO-Web-Search]
- Open Release of Grok-1, xAI, 2024. [blog][code][model][modelscope][hpcai-tech/grok-1][dbrx][Command R+][snowflake-arctic]
- A Watermark for Large Language Models, Kirchenbauer et al., arxiv 2023. [paper][code][MarkLLM][Awesome-LLM-Watermark]
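A simplified sketch of the soft green-list scheme from Kirchenbauer et al. above (the gamma/delta values and the seeding are illustrative; the paper hashes the previous token to seed the vocabulary partition, and detection then z-tests the green-token count):

```python
import torch

def watermarked_logits(logits, prev_token_id, gamma=0.5, delta=2.0):
    """Bias next-token logits toward a pseudo-random 'green list' (sketch).

    Seeds an RNG with the previous token id, marks a random gamma-fraction of
    the vocabulary green, and adds delta to green logits before sampling.
    """
    vocab_size = logits.size(-1)
    g = torch.Generator().manual_seed(int(prev_token_id))    # stand-in for the paper's hash
    green_mask = torch.rand(vocab_size, generator=g) < gamma
    return logits + delta * green_mask

logits = torch.randn(50257)                     # assumed vocabulary size
biased = watermarked_logits(logits, prev_token_id=42)
```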
- SynthID-Text: Scalable watermarking for identifying large language model outputs, Dathathri et al., Nature 2024. [paper][code][watermark-anything]
- SeqXGPT: Sentence-Level AI-Generated Text Detection, Wang et al., EMNLP 2023. [paper][code][llm-detect-ai][detect-gpt][fast-detect-gpt]
- AlpaGasus: Training A Better Alpaca with Fewer Data, Chen et al., ICLR 2024. [paper][code]
- AutoMix: Automatically Mixing Language Models, Madaan et al., arxiv 2023. [paper][code]
- ChipNeMo: Domain-Adapted LLMs for Chip Design, Liu et al., arxiv 2023. [paper][semikong][circuit_training]
- GAIA: A Benchmark for General AI Assistants, Mialon et al., ICLR 2024. [paper][code]
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al., NeurIPS 2023. [paper][code]
- MemGPT: Towards LLMs as Operating Systems, Packer et al., arxiv 2023. [paper][code]
- UFO: A UI-Focused Agent for Windows OS Interaction, Zhang et al., arxiv 2024. [paper][code][OSWorld][Large Action Models]
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement, Wu et al., ICLR 2024. [paper][code][OS-Atlas][WindowsAgentArena]
- AIOS: LLM Agent Operating System, Mei et al., arxiv 2024. [paper][code]
- DB-GPT: Empowering Database Interactions with Private Large Language Models, Xue et al., arxiv 2023. [paper][code][DocsGPT][privateGPT][localGPT]
- OpenChat: Advancing Open-source Language Models with Mixed-Quality Data, Wang et al., ICLR 2024. [paper][code]
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement, Zheng et al., arxiv 2024. [paper][code][code-interpreter][open-interpreter]
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4, Mukherjee et al., arxiv 2023. [paper]
- PDFTriage: Question Answering over Long, Structured Documents, Saad-Falcon et al., arxiv 2023. [paper][code]
- Prompt2Model: Generating Deployable Models from Natural Language Instructions, Viswanathan et al., arxiv 2023. [paper][code]
- Shepherd: A Critic for Language Model Generation, Wang et al., arxiv 2023. [paper][code]
- Alpaca: A Strong, Replicable Instruction-Following Model, Taori et al., Stanford Blog 2023. [paper][code]
- Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality, Chiang et al., 2023. [blog]
- WizardLM: Empowering Large Language Models to Follow Complex Instructions, Xu et al., ICLR 2024. [paper][code]
- WebCPM: Interactive Web Search for Chinese Long-form Question Answering, Qin et al., ACL 2023. [paper][code]
- WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences, Liu et al., KDD 2023. [paper][code][AutoWebGLM][WebRL][AutoCrawler][gpt-crawler][webllama][gpt-researcher][skyvern][Scrapegraph-ai][crawl4ai][crawlee-python][Agent-E][CyberScraper-2077][browser-use]
- LLM4Decompile: Decompiling Binary Code with Large Language Models, Tan et al., arxiv 2024. [paper][code]
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases, Liu et al., ICML 2024. [paper][code][Awesome-LLMs-on-device]
- MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices, Chu et al., arxiv 2023. [paper][code][MobileVLM V2][BlueLM-V-3B]
- The Oscars of AI Theater: A Survey on Role-Playing with Language Models, Chen et al., arxiv 2024. [paper][code][RPBench-Auto][Hermes 3 Technical Report]
- Apple Intelligence Foundation Language Models, Gunter et al., arxiv 2024. [blog][paper]
- Controllable Text Generation for Large Language Models: A Survey, Liang et al., arxiv 2024. [paper][code][guidance][outlines][instructor]
- [ray][academy][dask][TaskingAI][gpt4all][ollama][llama.cpp][dify][mindsdb][bisheng][phidata][guidance][outlines][jsonformer][fabric][mem0][taipy]
- [chatgpt-on-wechat][LLM-As-Chatbot][HuixiangDou][Streamer-Sales][Tianji][metahuman-stream][aiavatarkit][ai-getting-started][chatnio][VideoChat]
- LLM Powered Autonomous Agents, Lilian Weng, 2023. [blog][LLMAgentPapers][LLM-Agents-Papers][awesome-language-agents][Awesome-Papers-Autonomous-Agent]
- A Survey on Large Language Model based Autonomous Agents, Wang et al., arxiv 2023. [paper][code][LLM-Agent-Paper-Digest]
- The Rise and Potential of Large Language Model Based Agents: A Survey, Xi et al., arxiv 2023. [paper][code]
- Agent AI: Surveying the Horizons of Multimodal Interaction, Durante et al., arxiv 2024. [paper]
- Position Paper: Agent AI Towards a Holistic Intelligence, Huang et al., arxiv 2024. [paper]
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis, Sun et al., arxiv 2024. [paper][code][homepage]
- AgentBench: Evaluating LLMs as Agents, Liu et al., ICLR 2024. [paper][code][VisualAgentBench][OSWorld][AgentGym][Agent-S][Agent-as-a-Judge]
- Agents: An Open-source Framework for Autonomous Language Agents, Zhou et al., arxiv 2023. [paper][code]
- AutoAgents: A Framework for Automatic Agent Generation, Chen et al., arxiv 2023. [paper][code]
- AgentTuning: Enabling Generalized Agent Abilities for LLMs, Zeng et al., ACL 2024. [paper][code]
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors, Chen et al., ICLR 2024. [paper][code]
- AppAgent: Multimodal Agents as Smartphone Users, Zhang et al., arxiv 2023. [paper][code][digirl][Android-Lab]
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception, Wang et al., arxiv 2024. [paper][code][Mobile-Agent-v2][LiMAC]
- OmniParser for Pure Vision Based GUI Agent, Lu et al., arxiv 2024. [paper][code][Agent-S][The Dawn of GUI Agent][ShowUI][TinyClick][Large Language Model-Brained GUI Agents: A Survey][Aguvis]
- Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security, Li et al., arxiv 2024. [paper][code]
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, Wu et al., arxiv 2023. [paper][code][AG2][RD-Agent][TinyTroupe]
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society, Li et al., NeurIPS 2023. [paper][code][crab][oasis]
- ChatDev: Communicative Agents for Software Development, Qian et al., ACL 2024. [paper][code][gpt-pilot][Scaling Large-Language-Model-based Multi-Agent Collaboration][ProactiveAgent]
- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework, Hong et al., ICLR 2024 Oral. [paper][code]
- ProAgent: From Robotic Process Automation to Agentic Process Automation, Ye et al., arxiv 2023. [paper][code]
- RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation, Luo et al., arxiv 2024. [paper][code]
- Generative Agents: Interactive Simulacra of Human Behavior, Park et al., arxiv 2023. [paper][code][genagents][GPTeam]
- CogAgent: A Visual Language Model for GUI Agents, Hong et al., CVPR 2024. [paper][code]
- OpenAgents: An Open Platform for Language Agents in the Wild, Xie et al., arxiv 2023. [paper][code]
- TaskWeaver: A Code-First Agent Framework, Qiao et al., arxiv 2023. [paper][code]
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge, Fan et al., NeurIPS 2022 Outstanding Paper. [paper][code]
- Voyager: An Open-Ended Embodied Agent with Large Language Models, Wang et al., arxiv 2023. [paper][code][WebVoyager][OpenWebVoyager][PAE]
- Eureka: Human-Level Reward Design via Coding Large Language Models, Ma et al., ICLR 2024. [paper][code][DrEureka]
- LEGENT: Open Platform for Embodied Agents, Cheng et al., ACL 2024. [paper][code]
- Mind2Web: Towards a Generalist Agent for the Web, Deng et al., NeurIPS 2023. [paper][code][AutoWebGLM]
- WebArena: A Realistic Web Environment for Building Autonomous Agents, Zhou et al., ICLR 2024. [paper][code][visualwebarena][agent-workflow-memory][WindowsAgentArena]
- SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded, Zheng et al., arxiv 2024. [paper][code][WebDreamer]
- Learning to Model the World with Language, Lin et al., ICML 2024. [paper][code]
- Cradle: Empowering Foundation Agents Towards General Computer Control, Tan et al., arxiv 2024. [paper][code]
- AgentScope: A Flexible yet Robust Multi-Agent Platform, Gao et al., arxiv 2024. [paper][code][modelscope-agent]
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments, Xi et al., arxiv 2024. [paper][code]
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence, Chen et al., arxiv 2024. [paper][code][iAgents]
- CLASI: Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent, ByteDance Research, 2024. [paper][translation-agent]
- Automated Design of Agentic Systems, Hu et al., arxiv 2024. [paper][code][agent-zero][AgentK][AFlow: Automating Agentic Workflow Generation]
- Foundation Models in Robotics: Applications, Challenges, and the Future, Firoozi et al., arxiv 2023. [paper][code][Awesome-Implicit-NeRF-Robotics]
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI, Liu et al., arxiv 2024. [paper][code]
- RT-1: Robotics Transformer for Real-World Control at Scale, Brohan et al., arxiv 2022. [paper][code][IRASim]
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, Brohan et al., arxiv 2023. [paper][Unofficial Implementation][RT-H: Action Hierarchies Using Language]
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models, Open X-Embodiment Collaboration, arxiv 2023. [paper][code]
- Shaping the future of advanced robotics, Google DeepMind 2024. [blog]
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, Wang et al., ICML 2024. [paper][code]
- Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation, Wu et al., ICLR 2024. [paper][code][Moto]
- RL-GPT: Integrating Reinforcement Learning and Code-as-policy, Liu et al., arxiv 2024. [paper]
- Genie: Generative Interactive Environments, Bruce et al., ICML 2024 Best Paper. [paper][GameNGen][GameGen-O][GameGen-X][Unbounded][open-oasis][DIAMOND]
- Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, Fu et al., arxiv 2024. [paper][code][Hardware Code][Learning Code][UMI][humanplus][TeleVision][Surgical Robot Transformer][lifelike-agility-and-play][ReKep][Open_Duck_Mini][Learning Visual Parkour from Generated Images]
- RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation, Liu et al., arxiv 2024. [paper][code]
- Octo: An Open-Source Generalist Robot Policy, Ghosh et al., arxiv 2024. [paper][code][BodyTransformer][crossformer]
- GRUtopia: Dream General Robots in a City at Scale, Wang et al., arxiv 2024. [paper][code]
- HPT: Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers, Wang et al., NeurIPS 2024 Spotlight. [paper][code][GenSim]
- MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents, Yue et al., arxiv 2024. [paper]
- [LeRobot][Genesis][DORA][awesome-ai-agents][IsaacLab][Awesome-Robotics-3D][AimRT][agibot_x1_train][unitree_IL_lerobot][unitree_rl_gym]
- XAgent: An Autonomous Agent for Complex Task Solving, XAgent Team, 2023. [blog][code]
- [crewAI][phidata][PraisonAI][llama_deploy][gpt-computer-assistant][agentic_patterns]
- [translation-agent][agent-zero][AgentK][Twitter Personality][RD-Agent][TinyTroupe]
- Pangu Weather: Accurate medium-range global weather forecasting with 3D neural networks, Bi et al., Nature 2023. [paper][code][arxiv]
- Skilful nowcasting of extreme precipitation with NowcastNet, Zhang et al., Nature 2023. [paper][code][graphcast][OpenCastKit][GenCast]
- Galactica: A Large Language Model for Science, Taylor et al., arxiv 2022. [paper][code]
- K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization, Deng et al., arxiv 2023. [paper][code][pdf_parser]
- GeoGalactica: A Scientific Large Language Model in Geoscience, Lin et al., arxiv 2024. [paper][code][sciparser]
- EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities, Li et al., ACL 2024. [paper][code][Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives]
- Scientific Large Language Models: A Survey on Biological & Chemical Domains, Zhang et al., arxiv 2024. [paper][code][sciknoweval]
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning, Zhang et al., arxiv 2024. [paper][code]
- ChemLLM: A Chemical Large Language Model, Zhang et al., arxiv 2024. [paper][model]
- LangCell: Language-Cell Pre-training for Cell Identity Understanding, Zhao et al., ICML 2024. [paper][code][scFoundation]
- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers, Pramanick et al., arxiv 2024. [paper][code]
- STORM: Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models, Shao et al., NAACL 2024. [paper][code][Co-STORM EMNLP 2024][WikiChat][kiroku]
- Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis, Yu et al., arxiv 2024. [paper][code][model]
- AutoSurvey: Large Language Models Can Automatically Write Surveys, Wang et al., NeurIPS 2024. [paper][code]
- OpenResearcher: Unleashing AI for Accelerated Scientific Research, Zheng et al., arxiv 2024. [paper][code][Paper Copilot][SciAgentsDiscovery][paper-qa][GraphReasoning]
- OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs, Asai et al., arxiv 2024. [paper][code][research-rabbit]
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, Lu et al., arxiv 2024. [paper][code][Social_Science][SocialAgent][game_theory]
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers, Si et al., arxiv 2024. [paper][code]
- CoI-Agent: Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents, Li et al., arxiv 2024. [paper][code]
- [Awesome-Scientific-Language-Models][gpt_academic][ChatPaper][scispacy][awesome-ai4s][xVal]
- Neural code generation, CMU 2024 Spring. [link]
- Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code, Zhang et al., arxiv 2023. [paper][Awesome-Code-LLM][MFTCoder][Awesome-Code-LLM][CodeFuse-muAgent]
- Source Code Data Augmentation for Deep Learning: A Survey, Zhuo et al., arxiv 2023. [paper][code]
- Codex: Evaluating Large Language Models Trained on Code, Chen et al., arxiv 2021. [paper][human-eval][CriticGPT][On scalable oversight with weak LLMs judging strong LLMs]
- Code Llama: Open Foundation Models for Code, Rozière et al., arxiv 2023. [paper][code][model][llamacoder]
- AlphaCode: Competition-Level Code Generation with AlphaCode, Li et al., arxiv 2022. [paper][dataset][AlphaCode2_Tech_Report]
- CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X, Zheng et al., KDD 2023. [paper][code][CodeGeeX2][CodeGeeX4]
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis, Nijkamp et al., ICLR 2022. [paper][code]
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages, Nijkamp et al., ICLR 2023. [paper][code]
- CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules, Le et al., arxiv 2023. [paper][code]
- StarCoder: may the source be with you, Li et al., arxiv 2023. [paper][code][bigcode-project][model]
- StarCoder 2 and The Stack v2: The Next Generation, Lozhkov et al., 2024. [paper][code][starcoder.cpp]
- SelfCodeAlign: Self-Alignment for Code Generation, Wei et al., NeurIPS 2024. [paper][code]
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct, Luo et al., ICLR 2024. [paper][code]
- Magicoder: Source Code Is All You Need, Wei et al., arxiv 2023. [paper][code]
- Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, Ridnik et al., arxiv 2024. [paper][code][pr-agent][cover-agent]
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence, Guo et al., arxiv 2024. [paper][code]
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence, Zhu et al., CoRR 2024. [paper][code][DeepSeek-V2.5]
- Qwen2.5-Coder Technical Report, Hui et al., arxiv 2024. [paper][code][CodeArena]
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models, Huang et al., arxiv 2024. [paper][code][dataset][opc_data_filtering]
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents, Yang et al., arxiv 2024. [paper]
- Design2Code: How Far Are We From Automating Front-End Engineering?, Si et al., arxiv 2024. [paper][code]
- AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct, Lei et al., arxiv 2024. [paper][code]
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, Yang et al., arxiv 2024. [paper][code][swe-bench-technical-report][CodeR][Lingma-SWE-GPT]
- Agentless: Demystifying LLM-based Software Engineering Agents, Xia et al., arxiv 2024. [paper][code]
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions, Zhuo et al., arxiv 2024. [paper][code][LiveCodeBench][evalplus]
- OpenDevin: An Open Platform for AI Software Developers as Generalist Agents, Wang et al., arxiv 2024. [paper][code]
- Planning In Natural Language Improves LLM Search For Code Generation, Wang et al., arxiv 2024. [paper][SRA-MCTS]
- Large Language Model-Based Agents for Software Engineering: A Survey, Liu et al., arxiv 2024. [paper][code]
- HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale, Phan et al., arxiv 2024. [paper][code][Seeker][AutoKaggle][Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level]
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code, Zhang et al., arxiv 2024. [paper]
- FullStack Bench: Evaluating LLMs as Full Stack Coders, Liu et al., arxiv 2024. [paper][code][SandboxFusion]
- o1-Coder: an o1 Replication for Coding, Zhang et al., arxiv 2024. [paper][code]
- [OpenDevin][devika][auto-code-rover][developer][aider][claude-engineer][SuperCoder][AIDE][vulnhuntr]
- [screenshot-to-code][vanna][NL2SQL_Handbook][TAG-Bench][Spider2]
- DocLLM: A layout-aware generative language model for multimodal document understanding, Wang et al., arxiv 2024. [paper]
- DocGraphLM: Documental Graph Language Model for Information Extraction, Wang et al., arxiv 2023. [paper]
- FinBERT: A Pretrained Language Model for Financial Communications, Yang et al., arxiv 2020. [paper][Wiley paper][code][finBERT][valuesimplex/FinBERT]
- FinGPT: Open-Source Financial Large Language Models, Yang et al., IJCAI 2023. [paper][code]
- FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models, Yang et al., arxiv 2024. [paper][code]
- FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets, Wang et al., arxiv 2023. [paper][code]
- Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models, Zhang et al., arxiv 2023. [paper][code]
- FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance, Liu et al., arxiv 2020. [paper][code]
- FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning, Liu et al., NeurIPS 2022. [paper][code]
- DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning, Chen et al., arxiv 2023. [paper][code]
- A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist, Zhang et al., arxiv 2024. [paper]
- XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters, Zhang et al., arxiv 2023. [paper][code]
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications, Xie et al., arxiv 2024. [paper][code]
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data, Jiang et al., arxiv 2023. [paper][code]
- Large Language Model for Table Processing: A Survey, Lu et al., arxiv 2024. [paper][llm-table-survey][table-transformer][Awesome-Tabular-LLMs][Awesome-LLM-Tabular][Table-LLaVA][tablegpt-agent]
- rLLM: Relational Table Learning with LLMs, Li et al., arxiv 2024. [paper][code]
- Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow, Zhang et al., arxiv 2023. [paper][code]
- Data Interpreter: An LLM Agent For Data Science, Hong et al., arxiv 2024. [paper][code]
- AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework, Li et al., COLING 2024. [paper][code]
- LLMFactor: Extracting Profitable Factors through Prompts for Explainable Stock Movement Prediction, Wang et al., arxiv 2024. [paper][MIGA]
- A Survey of Large Language Models in Finance (FinLLMs), Lee et al., arxiv 2024. [paper][code][Revolutionizing Finance with LLMs: An Overview of Applications and Insights]
- A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges, Nie et al., arxiv 2024. [paper][financial-datasets][LLMs-in-Finance]
- PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods, Wang et al., arxiv 2024. [paper][code][Stockagent]
- Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset, Zhu et al., ACL 2024. [paper][code][Golden-Touchstone][financebench][OmniEval]
- [gpt-investor][FinGLM][agentUniverse][gs-quant][stockbot-on-groq][Real-Time-Stock-Market-Prediction-using-Ensemble-DL-and-Rainbow-DQN][openbb-agents][ai-hedge-fund]
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, Khattab et al., SIGIR 2020. [paper][simbert][roformer-sim]
- ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction, Santhanam et al., NAACL 2022. [paper][code][RAGatouille][A Reproducibility Study of PLAID][Jina-ColBERT-v2]
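The late-interaction (MaxSim) relevance score that ColBERT and ColBERTv2 compute can be sketched directly; the embeddings below are random stand-ins for encoder outputs:

```python
import torch
import torch.nn.functional as F

def colbert_score(q_emb, d_emb):
    """Late interaction: for each query token, take the max similarity
    against any document token, then sum over query tokens (MaxSim).

    q_emb: (num_q_tokens, dim), d_emb: (num_d_tokens, dim); rows L2-normalized.
    """
    sim = q_emb @ d_emb.T                  # (num_q_tokens, num_d_tokens)
    return sim.max(dim=1).values.sum()

q = F.normalize(torch.randn(32, 128), dim=-1)   # assumed query token embeddings
d = F.normalize(torch.randn(180, 128), dim=-1)  # assumed document token embeddings
print(colbert_score(q, d))
```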
- ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval, Louis et al., arxiv 2024. [paper][code][model]
- NCI: A Neural Corpus Indexer for Document Retrieval, Wang et al., NeurIPS 2022 Outstanding Paper. [paper][code][DSI-transformers][GDR EACL 2024 Oral]
- HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels, Gao et al., ACL 2023. [paper][code]
- Query2doc: Query Expansion with Large Language Models, Wang et al., EMNLP 2023. [paper][Query Expansion by Prompting Large Language Models]
- RankGPT: Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents, Sun et al., EMNLP 2023 Outstanding Paper. [paper][code]
- Large Language Models for Information Retrieval: A Survey, Zhu et al., arxiv 2023. [paper][code][YuLan-IR][A Survey of Conversational Search]
- Large Language Models for Generative Information Extraction: A Survey, Xu et al., arxiv 2023. [paper][code][UIE][NERRE][uie_pytorch]
- LLaRA: Making Large Language Models A Better Foundation For Dense Retrieval, Li et al., arxiv 2023. [paper][code]
- UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models, Li et al., AAAI 2024. [paper]
- INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning, Zhu et al., ACL 2024. [paper][code][ChatRetriever]
- GenIR: From Matching to Generation: A Survey on Generative Information Retrieval, Li et al., arxiv 2024. [paper][code]
- D2LLM: Decomposed and Distilled Large Language Models for Semantic Search, Liao et al., ACL 2024. [paper][code]
- BM25S: Orders of magnitude faster lexical search via eager sparse scoring, Xing Han Lù, arxiv 2024. [paper][code][rank_bm25][pyserini]
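For comparison with the learned retrievers above, the classic Okapi BM25 score that BM25S accelerates, as a plain-Python sketch (k1 and b are the usual defaults; the IDF variant follows the common Lucene-style formulation):

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.5, b=0.75):
    """Okapi BM25 for one document.

    doc_tf: term -> frequency in this document; df: term -> document frequency.
    """
    score = 0.0
    for t in query_terms:
        if t not in doc_tf:
            continue
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        tf = doc_tf[t]
        norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)  # length normalization
        score += idf * tf * (k1 + 1) / norm
    return score

# Toy usage with made-up statistics:
print(bm25_score(["neural", "retrieval"], {"neural": 2, "retrieval": 1},
                 doc_len=120, avg_doc_len=100,
                 df={"neural": 50, "retrieval": 20}, n_docs=10_000))
```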
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, Chen et al., arxiv 2024. [paper][code][Search Engines in an AI Era]
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines, Zhang et al., arxiv 2024. [paper][code][Smart Multi-Modal Search][M3DocRAG][Visualized BGE][OmniSearch][StreamRAG]
- SIGIR-AP 2023 Tutorial: Recent Advances in Generative Information Retrieval. [link]
- SIGIR 2024 Tutorial: Large Language Model Powered Agents for Information Retrieval. [link]
- [search_with_lepton][LLocalSearch][FreeAskInternet][storm][searxng][Perplexica][rag-search][sensei][azure-search-openai-demo]
- ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving, Gou et al., ICLR 2024. [paper][code]
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models, Yu et al., ICLR 2024. [paper][code][MathCoder]
- MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models, Lu et al., ICLR 2024 Oral. [paper][code][MathBench][OlympiadBench]
- InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning, Ying et al., arxiv 2024. [paper][code]
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, Shao et al., arxiv 2024. [paper][code][DeepSeek-Prover-V1.5]
- Common 7B Language Models Already Possess Strong Math Capabilities, Li et al., arxiv 2024. [paper][code]
- ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline, Xu et al., arxiv 2024. [paper][code]
- AlphaMath Almost Zero: Process Supervision without Process, Chen et al., arxiv 2024. [paper][code]
- JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models, Zhou et al., NeurIPS 2024. [paper][code]
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B, Zhang et al., arxiv 2024. [paper][code][LLaMA-Berry]
- Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models, Shi et al., arxiv 2024. [paper][code]
- We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?, Qiao et al., arxiv 2024. [paper][code]
- MAVIS: Mathematical Visual Instruction Tuning, Zhang et al., arxiv 2024. [paper][code]
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement, Yang et al., arxiv 2024. [paper][code][Qwen2.5-Math-Demo][ProcessBench][SuperCorrect-llm]
- R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models, Deng et al., arxiv 2024. [paper][code][alphageometry][MathCritique]
- AI Mathematical Olympiad - Progress Prize 1, Kaggle Competition 2024. [Numina 1st Place Solution][project-numina/aimo-progress-prize][How NuminaMath Won the 1st AIMO Progress Prize][NuminaMath-7B-TIR][AI achieves silver-medal standard solving International Mathematical Olympiad problems]
- A Survey of Large Language Models in Medicine: Progress, Application, and Challenge, Zhou et al., arxiv 2023. [paper][code][LLM-for-Healthcare][GMAI-MMBench]
- A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law, Chen et al., arxiv 2024. [paper][code]
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine, Wu et al., arxiv 2024. [paper][code][MMedLM]
- HuatuoGPT, towards Taming Language Model to Be a Doctor, Zhang et al., arxiv 2023. [paper][code][HuatuoGPT-II][Medical_NLP][Zhongjing][MedicalGPT][huatuogpt-vision][Chain-of-Diagnosis][BianCang]
- Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model, Cui et al., arxiv 2023. [paper][code][HK-O1aw]
- DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services, Yue et al., arxiv 2023. [paper][code]
- DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation, Bao et al., arxiv 2023. [paper][code]
- BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT, Chen et al., arxiv 2023. [paper][code][SoulChat2.0][smile]
- MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning, Tang et al., arxiv 2023. [paper][code]
- MEDITRON-70B: Scaling Medical Pretraining for Large Language Models, Chen et al., arxiv 2023. [paper][meditron]
- Med-PaLM: Large language models encode clinical knowledge, Singhal et al., Nature 2023. [paper][Unofficial Implementation]
- Capabilities of Gemini Models in Medicine, Saab et al., arxiv 2024. [paper]
- AMIE: Towards Conversational Diagnostic AI, Tu et al., arxiv 2024. [paper][AMIE-pytorch]
- Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People, Wang et al., arxiv 2024. [paper][code][Medical_NLP]
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator, Fan et al., COLING 2025. [paper][code][Agent Hospital][MentalArena][MING]
- AI-Press: A Multi-Agent News Generating and Feedback Simulation System Powered by Large Language Models, Liu et al., COLING 2025. [paper][code]
- AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents, Chen et al., arxiv 2024. [paper][code]
- On Domain-Specific Post-Training for Multimodal Large Language Models, Cheng et al., ICLR 2024. [paper][model]
- [openfold][alphafold3-pytorch][Protenix][AlphaFold3][Ligo-Biosciences/AlphaFold3][LucaOne][esm][AlphaPPImd][visual-med-alpaca][chai-lab][evo]
- DIN: Deep Interest Network for Click-Through Rate Prediction, Zhou et al., KDD 2018. [paper][code][DIEN][x-deeplearning]
- MMoE: Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts, Ma et al., KDD 2018. [paper][DeepCTR-Torch][pytorch-mmoe]
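A minimal sketch of the MMoE layer from Ma et al. above, with assumed sizes: shared experts plus one softmax gate and one tower per task, so each task learns its own mixture of the shared experts.

```python
import torch
import torch.nn as nn

class MMoE(nn.Module):
    """Multi-gate Mixture-of-Experts (illustrative sketch, not the paper's code)."""
    def __init__(self, in_dim, expert_dim, n_experts=4, n_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU())
             for _ in range(n_experts)])
        self.gates = nn.ModuleList(
            [nn.Linear(in_dim, n_experts) for _ in range(n_tasks)])
        self.towers = nn.ModuleList(
            [nn.Linear(expert_dim, 1) for _ in range(n_tasks)])

    def forward(self, x):
        e = torch.stack([exp(x) for exp in self.experts], dim=1)  # (B, E, H)
        outs = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)      # (B, E, 1) per-task gate
            outs.append(tower((w * e).sum(dim=1)))                # weighted expert mix
        return outs

y_ctr, y_cvr = MMoE(in_dim=64, expert_dim=32)(torch.randn(8, 64))
```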
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5), Geng et al., arxiv 2022. [paper][unofficial code][OpenP5]
- Recommender Systems with Generative Retrieval, Rajput et al., NeurIPS 2023. [paper][Methodologies for Improving Modern Industrial Recommender Systems]
- Unifying Large Language Models and Knowledge Graphs: A Roadmap, Pan et al., arxiv 2023. [paper]
- YuLan-Rec: User Behavior Simulation with Large Language Model based Agents, Wang et al., arxiv 2023. [paper][code][Scaling Law of Large Sequential Recommendation Models]
- SSLRec: A Self-Supervised Learning Framework for Recommendation, Ren et al., WSDM 2024 Oral. [paper][code][Awesome-SSLRec-Papers]
- RLMRec: Representation Learning with Large Language Models for Recommendation, Ren et al., WWW 2024. [paper][code]
- LLMRec: Large Language Models with Graph Augmentation for Recommendation, Wei et al., WSDM 2024 Oral. [paper][code][EasyRec]
- XRec: Large Language Models for Explainable Recommendation, Ma et al., arxiv 2024. [paper][code][SelfGNN]
- Agent4Rec: On Generative Agents in Recommendation, Zhang et al., arxiv 2023. [paper][code]
- LLM-KERec: Breaking the Barrier: Utilizing Large Language Models for Industrial Recommendation Systems through an Inferential Knowledge Graph, Zhao et al., arxiv 2024. [paper]
- Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations, Zhai et al., ICML 2024. [paper][code][Transformers4Rec]
- Wukong: Towards a Scaling Law for Large-Scale Recommendation, Zhang et al., ICML 2024. [paper][unofficial code]
- RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems, Lian et al., arxiv 2024. [paper][code]
- Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application, Jia et al., arxiv 2024. [paper][NAR4Rec][HoME][Long-Sequence Recommendation Models Need Decoupled Embeddings]
- NoteLLM-2: Multimodal Large Representation Models for Recommendation, Zhang et al., NeurIPS 2024. [paper][NoteLLM]
- HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling, Chen et al., arxiv 2024. [paper][code]
- STAR: A Simple Training-free Approach for Recommendations using Large Language Models, Lee et al., arxiv 2024. [paper]
- [recommenders][Source code for Twitter's Recommendation Algorithm][Awesome-RSPapers][RecBole][RecSysDatasets][LLM4Rec-Awesome-Papers][Awesome-LLM-for-RecSys][Awesome-LLM4RS-Papers][ReChorus]
- [fun-rec][RecommenderSystem][AI-RecommenderSystem][RecSysPapers][Algorithm-Practice-in-Industry][AlgoNotes]
- Tool Learning with Foundation Models, Qin et al., arxiv 2023. [paper][code]
- Tool Learning with Large Language Models: A Survey, Qu et al., arxiv 2024. [paper][code]
- Toolformer: Language Models Can Teach Themselves to Use Tools, Schick et al., arxiv 2023. [paper][toolformer-pytorch][conceptofmind/toolformer][xrsrke/toolformer][Graph_Toolformer]
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, Qin et al., ICLR 2024 Spotlight. [paper][code][StableToolBench]
- Gorilla: Large Language Model Connected with Massive APIs, Patil et al., arxiv 2023. [paper][code]
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al., NeurIPS 2023. [paper][code]
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction, Yang et al., arxiv 2023. [paper][code]
- RestGPT: Connecting Large Language Models with Real-World RESTful APIs, Song et al., arxiv 2023. [paper][code]
- LLMCompiler: An LLM Compiler for Parallel Function Calling, Kim et al., ICML 2024. [paper][code]
- Large Language Models as Tool Makers, Cai et al., arxiv 2023. [paper][code]
- ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases, Tang et al., arxiv 2023. [paper][code][ToolQA][toolbench]
- ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search, Zhuang et al., arxiv 2023. [paper][code]
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models, Lu et al., NeurIPS 2023. [paper][code]
- ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios, Ye et al., arxiv 2024. [paper][code]
- AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls, Du et al., arxiv 2024. [paper][code]
- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error, Wang et al., arxiv 2024. [paper][code]
- What Are Tools Anyway? A Survey from the Language Model Perspective, Wang et al., arxiv 2024. [paper]
- ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities, Lu et al., arxiv 2024. [paper][code][API-Bank]
- Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval, Chen et al., arxiv 2024. [paper]
- ToolACE: Winning the Points of LLM Function Calling, Liu et al., arxiv 2024. [paper][ToolGen]
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking, Lin et al., arxiv 2024. [paper][code]
- How to Train Really Large Models on Many GPUs, Lilian Weng, 2021. [blog]
- Training great LLMs entirely from ground zero in the wilderness as a startup, Yi Tay, 2024. [blog][What happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives][New LLM Pre-training and Post-training Paradigms]
- Understanding LLMs: A Comprehensive Overview from Training to Inference, Liu et al., arxiv 2024. [paper]
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, Shoeybi et al., arxiv 2019. [paper][code][GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism][Parameter Server OSDI 2014][megatron sequence parallelism][Scaling Language Model Training to a Trillion Parameters Using Megatron]
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Rajbhandari et al., arxiv 2019. [paper][DeepSpeed][FSDP][pytorch-fsdp]
- Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training, Li et al., ICPP 2023. [paper][code]
- MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs, Jiang et al., NSDI 2024. [paper][veScale][blog][Parameter Server OSDI 2014][ps-lite][ByteCheckpoint][HybridFlow]
- A Theory on Adam Instability in Large-Scale Machine Learning, Molybog et al., arxiv 2023. [paper]
- Loss Spike in Training Neural Networks, Zhang et al., arxiv 2023. [paper]
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling, Biderman et al., arxiv 2023. [paper][code]
- Continual Pre-Training of Large Language Models: How to (re)warm your model?, Gupta et al., arxiv 2023. [paper]
- FLM-101B: An Open LLM and How to Train It with $100K Budget, Li et al., arxiv 2023. [paper][model][Tele-FLM]
- Instruction Tuning with GPT-4, Peng et al., arxiv 2023. [paper][code]
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines, Khattab et al., arxiv 2023. [paper][code][textgrad][appl][okhat/blog]
- Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training, Feng et al., ICML 2024. [paper][code][Natural-language-RL]
- OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning, Ye et al., arxiv 2024. [paper][code]
- Arcee's MergeKit: A Toolkit for Merging Large Language Models, Goddard et al., EMNLP 2024. [paper][code][DistillKit][A Survey on Collaborative Strategies in the Era of Large Language Models][FuseAI]
- A Survey on Self-Evolution of Large Language Models, Tao et al., arxiv 2024. [paper][code]
- Adam-mini: Use Fewer Learning Rates To Gain More, Zhang et al., arxiv 2024. [paper][code]
- RouteLLM: Learning to Route LLMs with Preference Data, Ong et al., arxiv 2024. [paper][code][RouterDC]
- Instruction Pre-Training: Language Models are Supervised Multitask Learners, Cheng et al., arxiv 2024. [paper][code]
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training, Jaghouar et al., arxiv 2024. [paper][code][Prime][DiLoCo][DisTrO]
- JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models, Jin et al., arxiv 2024. [paper][code][jailbreak_llms][llm-attacks]
- LLM-Pruner: On the Structural Pruning of Large Language Models, Ma et al., NeurIPS 2023. [paper][code][Awesome-Efficient-LLM]
- LLM Pruning and Distillation in Practice: The Minitron Approach, Sreenivas et al., arxiv 2024. [paper][code][distillm][llm_distillation_playbook]
- Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning, An et al., arxiv 2024. [paper][code][Parameter Server OSDI 2014][ps-lite]
- [wandb][aim][tensorboardX][nvitop]
- AI Alignment: A Comprehensive Survey, Ji et al., arxiv 2023. [paper][PKU-Alignment][webpage]
- Large Language Model Alignment: A Survey, Shen et al., arxiv 2023. [paper]
- Aligning Large Language Models with Human: A Survey, Wang et al., arxiv 2023. [paper][code]
- A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More, Wang et al., arxiv 2024. [paper]
- Towards a Unified View of Preference Learning for Large Language Models: A Survey, Gao et al., arxiv 2024. [paper][code]
- Self-Instruct: Aligning Language Models with Self-Generated Instructions, Wang et al., ACL 2023. [paper][code][open-instruct][Multi-modal-Self-instruct][evol-instruct][MMEvol][Automatic Instruction Evolving for Large Language Models]
- Self-Alignment with Instruction Backtranslation, Li et al., ICLR 2024. [paper][unofficial implementation]
- What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning, Liu et al., ICLR 2024. [paper][code][From Quantity to Quality NAACL'24][Reformatted Alignment][MAmmoTH2: Scaling Instructions from the Web]
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing, Xu et al., arxiv 2024. [paper][code]
- RLHF: [hf blog][OpenAI blog][alignment blog][awesome-RLHF]
- Secrets of RLHF in Large Language Models. [MOSS-RLHF][Part I][Part II]
- Safe RLHF: Safe Reinforcement Learning from Human Feedback, Dai et al., ICLR 2024 Spotlight. [paper][code][align-anything][Safe-Policy-Optimization]
- The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization, Huang et al., arxiv 2024. [paper][code][blog][trl][trlx]
- RLHF Workflow: From Reward Modeling to Online RLHF, Dong et al., arxiv 2024. [paper][code]
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework, Hu et al., arxiv 2024. [paper][code]
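The clipped PPO policy objective these RLHF pipelines optimize, as a minimal sketch (tensor names are placeholders; advantages would come from a value head plus a reward model in practice):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss over per-token action log-probs."""
    ratio = (logp_new - logp_old).exp()           # importance ratio pi_new / pi_old
    unclipped = ratio * advantages
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # pessimistic (clipped) bound

loss = ppo_clip_loss(torch.randn(32), torch.randn(32), torch.randn(32))
```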
- LIMA: Less Is More for Alignment, Zhou et al., NeurIPS 2023. [paper]
- DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafailov et al., NeurIPS 2023 Runner-up Award. [paper][Unofficial Implementation][trl][dpo_trainer]
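The DPO loss from Rafailov et al. fits in a few lines; a sketch assuming the summed log-probabilities of each response under the policy and the frozen reference model are already computed:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Logistic loss on the difference of implicit rewards beta*log(pi/pi_ref)
    between the chosen and rejected responses."""
    chosen_rewards = beta * (pi_chosen - ref_chosen)
    rejected_rewards = beta * (pi_rejected - ref_rejected)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

loss = dpo_loss(torch.randn(16), torch.randn(16), torch.randn(16), torch.randn(16))
```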
- BPO: Black-Box Prompt Optimization: Aligning Large Language Models without Model Training, Cheng et al., arxiv 2023. [paper][code]
- KTO: Model Alignment as Prospect Theoretic Optimization, Ethayarajh et al., arxiv 2024. [paper][code]
- ORPO: Monolithic Preference Optimization without Reference Model, Hong et al., EMNLP 2024. [paper][code][GRPO]
- TDPO: Token-level Direct Preference Optimization, Zeng et al., arxiv 2024. [paper][code][Step-DPO][FineGrainedRLHF][MCTS-DPO][Critical Tokens Matter]
- SimPO: Simple Preference Optimization with a Reference-Free Reward, Meng et al., arxiv 2024. [paper][code]
- Constitutional AI: Harmlessness from AI Feedback, Bai et al., arxiv 2022. [paper][code]
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback, Lee et al., arxiv 2023. [paper][code][awesome-RLAIF]
- Direct Language Model Alignment from Online AI Feedback, Guo et al., arxiv 2024. [paper]
- ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models, Li et al., ICML 2024. [paper][code][policy_optimization]
- Zephyr: Direct Distillation of LM Alignment, Tunstall et al., arxiv 2023. [paper][code]
- Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision, Burns et al., arxiv 2023. [paper][code][weak-to-strong-deception][Evolving Alignment via Asymmetric Self-Play]
- SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, Chen et al., arxiv 2024. [paper][code][unofficial implementation]
- SPPO: Self-Play Preference Optimization for Language Model Alignment, Wu et al., arxiv 2024. [paper][code][A Survey on Self-play Methods in Reinforcement Learning]
- CALM: LLM Augmented LLMs: Expanding Capabilities through Composition, Bansal et al., arxiv 2024. [paper][CALM-pytorch]
- Self-Rewarding Language Models, Yuan et al., arxiv 2024. [paper][unofficial implementation][Meta-Rewarding Language Models][Self-Taught Evaluators]
- Anthropic: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, Hubinger et al., arxiv 2024. [paper]
- LongAlign: A Recipe for Long Context Alignment of Large Language Models, Bai et al., arxiv 2024. [paper][code]
- Aligner: Efficient Alignment by Learning to Correct, Ji et al., NeurIPS 2024 Oral. [paper][code]
- A Survey on Knowledge Distillation of Large Language Models, Xu et al., arxiv 2024. [paper][code]
- NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment, Shen et al., arxiv 2024. [paper][code][Nemotron-4 340B Technical Report][Mistral NeMo][SparseLLM][MaskLLM][HelpSteer2-Preference]
- Xwin-LM: Strong and Scalable Alignment Practice for LLMs, Ni et al., arxiv 2024. [paper][code]
- Towards Scalable Automated Alignment of LLMs: A Survey, Cao et al., arxiv 2024. [paper][code]
- Putting RL back in RLHF, Huang and Ahmadian, 2024. [blog]
- Prover-Verifier Games improve legibility of language model outputs, Kirchner et al., 2024. [blog][paper]
- Rule Based Rewards for Language Model Safety, Mu et al., OpenAI 2024. [blog][paper][code]
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning, Zhao et al., arxiv 2024. [paper][code][prompt2model]
- ALiBi: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, Press et al., ICLR 2022. [paper][code]
- Positional Interpolation: Extending Context Window of Large Language Models via Positional Interpolation, Chen et al., arxiv 2023. [paper]
- Scaling Transformer to 1M tokens and beyond with RMT, Bulatov et al., AAAI 2024. [paper][code][LM-RMT]
- RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text, Zhou et al., arxiv 2023. [paper][code]
- LongNet: Scaling Transformers to 1,000,000,000 Tokens, Ding et al., arxiv 2023. [paper][code][unofficial code]
- Focused Transformer: Contrastive Training for Context Scaling, Tworkowski et al., NeurIPS 2023. [paper][code]
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models, Chen et al., ICLR 2024 Oral. [paper][code]
- StreamingLLM: Efficient Streaming Language Models with Attention Sinks, Xiao et al., ICLR 2024. [paper][code][SwiftInfer][SwiftInfer blog]
- YaRN: Efficient Context Window Extension of Large Language Models, Peng et al., ICLR 2024. [paper][code][LM-Infinite]
- Ring Attention with Blockwise Transformers for Near-Infinite Context, Liu et al., ICLR 2024. [paper][code][ring-attention-pytorch][local-attention][tree_attention]
- LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression, Jiang et al., ACL 2024. [paper][code]
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, Ding et al., arxiv 2024. [paper][code]
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, Jin et al., arxiv 2024. [paper][code]
- The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey, Pawar et al., arxiv 2024. [paper][Awesome-LLM-Long-Context-Modeling]
- Data Engineering for Scaling Language Models to 128K Context, Fu et al., arxiv 2024. [paper][code]
- CEPE: Long-Context Language Modeling with Parallel Context Encoding, Yen et al., ACL 2024. [paper][code]
- Training-Free Long-Context Scaling of Large Language Models, An et al., ICML 2024. [paper][code]
- InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory, Xiao et al., NeurIPS 2024. [paper][code]
- Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models, Song et al., arxiv 2024. [paper][code][LLMTest_NeedleInAHaystack][RULER][LooGLE][LongBench][google-deepmind/loft]
- Infini-Transformer: Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, Munkhdalai et al., arxiv 2024. [paper][infini-transformer-pytorch][InfiniTransformer][infini-mini-transformer][megalodon]
- Extending Llama-3's Context Ten-Fold Overnight, Zhang et al., arxiv 2024. [paper][code][activation_beacon]
- Make Your LLM Fully Utilize the Context, An et al., arxiv 2024. [paper][code]
- CoPE: Contextual Position Encoding: Learning to Count What's Important, Golovneva et al., arxiv 2024. [paper][rope_cope]
- Scaling Granite Code Models to 128K Context, Stallone et al., arxiv 2024. [paper][code][granite-3.1-language-models]
- Generalizing an LLM from 8k to 1M Context using Qwen-Agent, Qwen Team, 2024. [blog]
- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs, Bai et al., arxiv 2024. [paper][code][LongCite][LongReward]
- A failed experiment: Infini-Attention, and why we should keep trying, HuggingFace Blog, 2024. [blog][Magic Blog]
- Why Does the Effective Context Length of LLMs Fall Short, An et al., arxiv 2024. [paper][code][rotary-embedding-torch]
- How to Train Long-Context Language Models (Effectively), Gao et al., arxiv 2024. [paper][code]
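
The Positional Interpolation entry above comes down to a one-line change to RoPE: divide positions by the extension factor so an extended sequence reuses the position range seen during training. A minimal sketch, assuming illustrative dimensions and scale factor (not taken from any particular model):

```python
import torch

def rope_frequencies(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies, one per pair of channels.
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def rope_angles(seq_len: int, dim: int, scale: float = 1.0) -> torch.Tensor:
    # Position Interpolation: divide positions by `scale`, squeezing the
    # extended context back into the position range seen during training.
    positions = torch.arange(seq_len).float() / scale
    return torch.outer(positions, rope_frequencies(dim))

# Toy example: a model trained on 2k positions, extended 4x to 8k.
angles = rope_angles(seq_len=8192, dim=128, scale=4.0)
print(angles.shape)  # torch.Size([8192, 64])
```
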
- Thinking about High-Quality Human Data, Lilian Weng, 2024. [blog]
- C4: Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus, Dodge et al., arxiv 2021. [paper][dataset][bookcorpus][the-pile] (see the toy filter sketch after this list)
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset, Laurençon et al., NeurIPS 2023. [paper][code][dataset]
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only, Penedo et al., arxiv 2023. [paper][dataset]
- Data-Juicer: A One-Stop Data Processing System for Large Language Models, Chen et al., arxiv 2023. [paper][code]
- UltraChat: Enhancing Chat Language Models by Scaling High-quality Instructional Conversations, Ding et al., EMNLP 2023. [paper][code][ultrachat]
- UltraFeedback: Boosting Language Models with High-quality Feedback, Cui et al., ICML 2024. [paper][code][UltraInteract_sft]
- What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning, Liu et al., ICLR 2024. [paper][code]
- WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset, Qiu et al., arxiv 2024. [paper][dataset][CCI3.0-HQ][LabelLLM][labelU][MinerU][PDF-Extract-Kit]
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research, Soldaini et al., ACL 2024. [paper][code][OLMo]
- Datasets for Large Language Models: A Comprehensive Survey, Liu et al., arxiv 2024. [paper][Awesome-LLMs-Datasets]
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows, Patel et al., arxiv 2024. [paper][code]
- Large Language Models for Data Annotation: A Survey, Tan et al., arxiv 2024. [paper][code]
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance, Ye et al., arxiv 2024. [paper][code]
- COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning, Bai et al., arxiv 2024. [paper][dataset]
- Best Practices and Lessons Learned on Synthetic Data for Language Models, Liu et al., arxiv 2024. [paper][A Survey on Data Synthesis and Augmentation for Large Language Models]
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale, HuggingFace, 2024. [paper][blogpost][fineweb][fineweb-edu]
- DataComp: In search of the next generation of multimodal datasets, Gadre et al., arxiv 2023. [paper][code]
- DataComp-LM: In search of the next generation of training sets for language models, Li et al., arxiv 2024. [paper][code][apple/DCLM-7B-8k][data-agora]
- Scaling Synthetic Data Creation with 1,000,000,000 Personas, Chan et al., arxiv 2024. [paper][code]
- Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale, Zhou et al., arxiv 2024. [paper][code]
- MinerU: An Open-Source Solution for Precise Document Content Extraction, Wang et al., arxiv 2024. [paper][code][PDF-Extract-Kit][DocLayout-YOLO][OmniDocBench][Document Parsing Unveiled]
- Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models, Lai et al., arxiv 2024. [paper][BLIP]
- [RedPajama-Data][xland-minigrid-datasets][OmniCorpus][dclm][Infinity-Instruct][MNBVC][LMSYS-Chat-1M]
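
To make the flavor of these cleaning pipelines concrete, here is a toy line filter in the spirit of the heuristics described for C4; the thresholds and rules are illustrative, not the paper's full rule set:

```python
def c4_style_filter(line: str) -> bool:
    # A few heuristics in the spirit of the C4 cleaning rules: keep lines
    # that look like prose (minimum length, terminal punctuation) and drop
    # boilerplate ("lorem ipsum") and code-like lines (curly braces).
    line = line.strip()
    if len(line.split()) < 5:
        return False
    if not line.endswith((".", "!", "?", '"')):
        return False
    if "lorem ipsum" in line.lower() or "{" in line:
        return False
    return True

page = "First sentence of a web page.\nmenu home login\nfunction() { return 0; }"
print([l for l in page.splitlines() if c4_style_filter(l)])
```
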
- [evaluation-guidebook][Awesome-LLM-Eval][LLM-eval-survey][llm_benchmarks][Awesome-LLMs-Evaluation-Papers]
- MMLU: Measuring Massive Multitask Language Understanding, Hendrycks et al., ICLR 2021. [paper][code]
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks, Wang et al., EMNLP 2022. [paper][code]
- HELM: Holistic Evaluation of Language Models, Liang et al., arxiv 2022. [paper][code]
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, Zheng et al., arxiv 2023. [paper][code] (see the judge-prompt sketch after this list)
- SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark, Xu et al., arxiv 2023. [paper][code][SuperCLUE-RAG]
- C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models, Huang et al., NeurIPS 2023. [paper][code][chinese-llm-benchmark]
- CMMLU: Measuring massive multitask language understanding in Chinese, Li et al., arxiv 2023. [paper][code]
- CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark, Zhang et al., arxiv 2024. [paper][code]
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference, Chiang et al., ICML 2024. [paper][demo][Challenges in Trustworthy Human Evaluation of Chatbots]
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models, Kim et al., EMNLP 2024. [paper][code][prometheus][prometheus-vision]
- LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models, Zhang et al., arxiv 2024. [paper][code]
- MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark, Yue et al., arxiv 2024. [paper][code]
- Law of the Weakest Link: Cross Capabilities of Large Language Models, Zhong et al., arxiv 2024. [paper][code]
- MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering, Chan et al., arxiv 2024. [paper][code][swarm]
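
A minimal pairwise LLM-as-a-judge setup in the spirit of MT-Bench; the template wording here is illustrative and `judge_model` is a hypothetical callable standing in for any chat model:

```python
# The template wording is illustrative; `judge_model` is a hypothetical
# callable standing in for any chat model.
PAIRWISE_JUDGE = """You are an impartial judge. Compare the two responses to
the user question below and answer with exactly "A", "B", or "tie".

[Question]
{question}

[Response A]
{answer_a}

[Response B]
{answer_b}

Verdict:"""

def judge(judge_model, question: str, answer_a: str, answer_b: str) -> str:
    prompt = PAIRWISE_JUDGE.format(question=question, answer_a=answer_a,
                                   answer_b=answer_b)
    return judge_model(prompt).strip()

fake_judge = lambda prompt: " A "  # stand-in model that always prefers A
print(judge(fake_judge, "What is 2 + 2?", "4", "5"))  # -> "A"
```
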
- Extrinsic Hallucinations in LLMs, Lilian Weng, 2024. [blog]
- Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models, Zhang et al., arxiv 2023. [paper][code]
- A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, Huang et al., arxiv 2023. [paper][code][Awesome-MLLM-Hallucination]
- The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models, Li et al., arxiv 2024. [paper][code]
- FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios, Chern et al., arxiv 2023. [paper][code][OlympicArena][FActScore]
- Chain-of-Verification Reduces Hallucination in Large Language Models, Dhuliawala et al., arxiv 2023. [paper][code]
- HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models, Guan et al., CVPR 2024. [paper][code]
- Woodpecker: Hallucination Correction for Multimodal Large Language Models, Yin et al., arxiv 2023. [paper][code]
- OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation, Huang et al., CVPR 2024 Highlight. [paper][code]
- TrustLLM: Trustworthiness in Large Language Models, Sun et al., arxiv 2024. [paper][code]
- SAFE: Long-form factuality in large language models, Wei et al., arxiv 2024. [paper][code]
- RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models, Hu et al., arxiv 2024. [paper][code][HaluAgent][LLMsKnow]
- Detecting hallucinations in large language models using semantic entropy, Farquhar et al., Nature 2024. [paper][semantic_uncertainty][long_hallucinations][Semantic Uncertainty ICLR 2023][Lynx-hallucination-detection] (see the entropy sketch after this list)
- A Survey on the Honesty of Large Language Models, Li et al., arxiv 2024. [paper][code]
- LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations, Orgad et al., arxiv 2024. [paper][code]
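
The semantic-entropy idea from Farquhar et al. reduces to: sample several answers, cluster them by meaning, and compute the entropy of the cluster distribution, with high entropy flagging likely confabulation. A toy sketch where exact-match clustering stands in for the paper's bidirectional-entailment clustering:

```python
import math
from collections import Counter

def semantic_entropy(answers, cluster_fn):
    # Cluster sampled answers by meaning, then take the entropy over the
    # probability mass assigned to each cluster.
    clusters = Counter(cluster_fn(a) for a in answers)
    n = sum(clusters.values())
    return -sum((c / n) * math.log(c / n) for c in clusters.values())

# Toy clustering: case/punctuation-insensitive exact match stands in for
# the bidirectional-entailment clustering used in the paper.
samples = ["Paris", "paris", "Lyon", "Paris."]
print(semantic_entropy(samples, lambda a: a.strip(". ").lower()))
```
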
- How to make LLMs go fast, 2023. [blog]
- A Visual Guide to Quantization, 2024. [blog]
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, Miao et al., arxiv 2023. [paper][Awesome-Quantization-Papers][awesome-model-quantization][qllm-eval]
- Full Stack Optimization of Transformer Inference: a Survey, Kim et al., arxiv 2023. [paper]
- A Survey on Efficient Inference for Large Language Models, Zhou et al., arxiv 2024. [paper]
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale, Dettmers et al., NeurIPS 2022. [paper][code] (see the absmax sketch after this list)
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers, Liu et al., arxiv 2023. [paper][code]
- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models, Shao et al., ICLR 2024 Spotlight. [paper][code][EfficientQAT][smoothquant][ABQ-LLM][VPTQ][ppq]
- BitNet: Scaling 1-bit Transformers for Large Language Models, Wang et al., arxiv 2023. [paper][code][microsoft/BitNet][unofficial implementation][BitNet-Transformers][BitNet b1.58][BitNet a4.8][T-MAC][BitBLAS][BiLLM]
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, Frantar et al., ICLR 2023. [paper][code][AutoGPTQ][llmc]
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models, Frantar et al., arxiv 2023. [paper][code]
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, Lin et al., arxiv 2023. [paper][code][AutoAWQ][qserve]
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory, Alizadeh et al., arxiv 2023. [paper][air_llm]
- LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models, Jiang et al., EMNLP 2023. [paper][code]
- FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU, Sheng et al., ICML 2023. [paper][code]
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU, Song et al., arxiv 2023. [paper][code][llama.cpp][airllm][PowerInfer-2]
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, Dao et al., NeurIPS 2022. [paper][code]
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning, Tri Dao, ICLR 2024. [paper][code][xformers][SageAttention]
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision, Shah et al., arxiv 2024. [paper][code]
- vllm: Efficient Memory Management for Large Language Model Serving with PagedAttention, Kwon et al., arxiv 2023. [paper][code][FastChat][Nanoflow][ollama]
- SGLang: Fast and Expressive LLM Inference with RadixAttention and SGLang, Zheng et al., Stanford blog 2024. [blog][paper][code]
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads, Cai et al., ICML 2024. [paper][code]
- EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty, Li et al., ICML 2024. [paper][code][LLMSpeculativeSampling][Sequoia][HASS]
- ReDrafter: Recurrent Drafter for Fast Speculative Decoding in Large Language Models, Cheng et al., arxiv 2024. [blog][paper][code]
- Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding, Xia et al., arxiv 2024. [paper][code][Spec-Bench]
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding, Liu et al., arxiv 2024. [paper][[code]][Ouroboros]
- CLLMs: Consistency Large Language Models, Kou et al., ICML 2024. [paper][code][LookaheadDecoding][Lookahead]
- MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention, Jiang et al., arxiv 2024. [paper][code]
- Sarathi-Serve: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve, Agrawal et al., OSDI 2024. [paper][code][ORCA OSDI 2022][continuous batching blog]
- DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving, Zhong et al., OSDI 2024. [paper][code]
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference, Gim et al., ICML 2024. [paper][code]
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention, Brandon et al., arxiv 2024. [paper][YOCO]
- Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving, Qin et al., arxiv 2024. [paper][code][ktransformers]
- DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads, Xiao et al., arxiv 2024. [paper][code][Star-Attention]
- XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models, Dong et al., arxiv 2024. [paper][code][mlc-llm]
- [TensorRT-LLM][FasterTransformer][TritonServer][GenerativeAIExamples][TensorRT-Model-Optimizer][TensorRT][OpenVINO]
- [text-generation-inference][quantization][optimum-quanto][huggingface-inference-toolkit][torchao]
- [OpenLLM][mlc-llm][ollama][open-webui][torchchat]
- [ggml][exllamav2][llama.cpp][gpt-fast][lightllm][fastllm][CTranslate2][ipex-llm][rtp-llm][KsanaLLM][ppl.nn][ZhiLight][WeChat-TFCC]
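
As a concrete anchor for the weight-quantization papers above (LLM.int8(), GPTQ, AWQ and friends all refine this baseline), here is plain per-row absmax INT8 quantization; it is a sketch of the simplest scheme, not any one paper's method:

```python
import torch

def absmax_quantize(w: torch.Tensor):
    # Per-row absmax INT8 quantization: scale each output channel so its
    # largest magnitude maps to 127, round, and keep the scale for dequant.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4, 8)
q, s = absmax_quantize(w)
print((w - dequantize(q, s)).abs().max())  # small round-off error
```
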
- Mixture of Experts Explained, Sanseviero et al., Hugging Face Blog 2023. [blog]
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, Shazeer et al., arxiv 2017. [paper][Re-Implementation] (see the routing sketch after this list)
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Lepikhin et al., arxiv 2020. [paper][mixture-of-experts]
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts, Gale et al., arxiv 2022. [paper][code]
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models, Shen et al., arxiv 2023. [paper][[code]]
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, Fedus et al., arxiv 2021. [paper][code]
- Fast Inference of Mixture-of-Experts Language Models with Offloading, Eliseev and Mazur, arxiv 2023. [paper][code]
- Mixtral-8×7B: Mixtral of Experts, Jiang et al., arxiv 2023. [paper][code][megablocks-public][model][blog][Chinese-Mixtral-8x7B][Chinese-Mixtral]
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models, Dai et al., ACL 2024. [paper][code]
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, DeepSeek-AI, arxiv 2024. [paper][code][DeepSeek-V2.5]
- Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models, Wang et al., ACL 2024. [paper][code][Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts]
- Evolutionary Optimization of Model Merging Recipes, Akiba et al., arxiv 2024. [paper][code]
- A Closer Look into Mixture-of-Experts in Large Language Models, Lo et al., arxiv 2024. [paper][code]
- A Survey on Mixture of Experts, Cai et al., arxiv 2024. [paper][code]
- HMoE: Heterogeneous Mixture of Experts for Language Modeling, Wang et al., arxiv 2024. [paper][Configurable Foundation Models: Building LLMs from a Modular Perspective]
- OLMoE: Open Mixture-of-Experts Language Models, Muennighoff et al., ICLR 2025. [paper][code]
- Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent, Sun et al., arxiv 2024. [paper][code]
- [llama-moe][Aurora][OpenMoE][makeMoE][PEER-pytorch][GRIN-MoE][MoE-plus-plus][MoH]
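
The routing step shared by most of the MoE papers above fits in a few lines: score each token against every expert, keep the top-k, and renormalize the kept gate values. A minimal top-2 sketch with illustrative shapes; real routers add load-balancing losses and capacity limits:

```python
import torch
import torch.nn.functional as F

def top2_route(x: torch.Tensor, router_weights: torch.Tensor):
    # Switch/Mixtral-style routing: score tokens against experts, keep the
    # top-2 experts per token, and renormalize their gate values.
    logits = x @ router_weights              # [tokens, n_experts]
    gates, experts = logits.topk(2, dim=-1)  # per-token top-2 scores and ids
    gates = F.softmax(gates, dim=-1)         # gates for each token sum to 1
    return gates, experts

x = torch.randn(5, 16)                       # 5 tokens, hidden size 16
w = torch.randn(16, 8)                       # router for 8 experts
gates, experts = top2_route(x, w)
print(experts[0], gates[0].sum())            # two expert ids, gate mass 1.0
```
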
- [PEFT][trl][accelerate][LLaMA-Factory][LMFlow][unsloth][xtuner][MFTCoder][llm-foundry][ms-swift][Liger-Kernel][autotrain-advanced]
- LoRA: Low-Rank Adaptation of Large Language Models, Hu et al., ICLR 2022. [paper][code][LoRA From Scratch][lora][dora][MoRA][ziplora-pytorch][alpaca-lora] (see the LoRA sketch after this list)
- QLoRA: Efficient Finetuning of Quantized LLMs, Dettmers et al., NeurIPS 2023 Oral. [paper][code][bitsandbytes][unsloth][ir-qlora][fsdp_qlora]
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters, Sheng et al., arxiv 2023. [paper][code][AdaLoRA][LoRAMoE][lorahub][O-LoRA][qa-lora]
- LoRA-GA: Low-Rank Adaptation with Gradient Approximation, Wang et al., arxiv 2024. [paper][code][LoRA-Pro blog][dora]
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, Zhao et al., arxiv 2024. [paper][code][Q-GaLore][WeLore][Fira]
- Prefix-Tuning: Optimizing Continuous Prompts for Generation, Li et al., ACL 2021. [paper][code]
- Adapter: Parameter-Efficient Transfer Learning for NLP, Houlsby et al., ICML 2019. [paper][code][unify-parameter-efficient-tuning]
- Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning, Poth et al., EMNLP 2023. [paper][code][A Survey on LoRA of Large Language Models]
- LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models, Hu et al., EMNLP 2023. [paper][code]
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention, Zhang et al., ICLR 2024. [paper][code]
- LLaMA Pro: Progressive LLaMA with Block Expansion, Wu et al., arxiv 2024. [paper][code]
- P-Tuning: GPT Understands, Too, Liu et al., arxiv 2021. [paper][code]
- P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, Liu et al., ACL 2022. [paper][code][pet][PrefixTuning]
- Towards a Unified View of Parameter-Efficient Transfer Learning, He et al., ICLR 2022. [paper][code]
- Parameter-efficient fine-tuning of large-scale pre-trained language models, Ding et al., Nature Machine Intelligence 2023. [paper][code]
- Mixed Precision Training, Micikevicius et al., ICLR 2018. [paper]
- 8-bit Optimizers via Block-wise Quantization, Dettmers et al., ICLR 2022. [paper][code]
- FP8-LM: Training FP8 Large Language Models, Peng et al., arxiv 2023. [paper][code]
- NEFTune: Noisy Embeddings Improve Instruction Finetuning, Jain et al., ICLR 2024. [paper][code][NoisyTune][transformer_arithmetic]
- Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey, Han et al., arxiv 2024. [paper]
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models, Diao et al., NAACL 2024. [paper][code]
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models, Zheng et al., ACL 2024. [paper][code]
- ReFT: Representation Finetuning for Language Models, Wu et al., arxiv 2024. [paper][code]
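
For reference, the core of LoRA is small enough to sketch directly: freeze the base weight and learn a low-rank update `B @ A` scaled by `alpha / r`. A minimal version; zero-initializing `B` follows the paper's convention, while other details (dropout, merging) are omitted:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base layer plus a trainable low-rank update B @ A, scaled by
    # alpha / r, in the spirit of the LoRA paper.
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze the base weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 1024 = 2 * 64 * 8, versus 4160 in the full layer
```
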
- OpenPrompt: An Open-source Framework for Prompt-learning, Ding et al., arxiv 2021. [paper][code]
- Learning to Generate Prompts for Dialogue Generation through Reinforcement Learning, Su et al., arxiv 2022. [paper]
- Large Language Models Are Human-Level Prompt Engineers, Zhou et al., ICLR 2023. [paper][code]
- Large Language Models as Optimizers, Yang et al., arxiv 2023. [paper][code]
- Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4, Bsharat et al., arxiv 2023. [paper][code]
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding, Suzgun and Kalai, arxiv 2024. [paper][code][docs]
- AutoPrompt: Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases, Levi et al., arxiv 2024. [paper][code][automatic_prompt_engineer][appl][sammo][prompt-poet][ell]
- LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language, Wang et al., arxiv 2024. [paper][code]
- The Prompt Report: A Systematic Survey of Prompting Techniques, Schulhoff et al., arxiv 2024. [paper][code][A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks][A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications]
- [PromptPapers][OpenAI Docs][ChatGPT Prompt Engineering for Developers][Prompt Engineering Guide][k12promptguide][gpt-prompt-engineer][awesome-chatgpt-prompts][awesome-chatgpt-prompts-zh][Prompt_Engineering]
- The Power of Scale for Parameter-Efficient Prompt Tuning, Lester et al., EMNLP 2021. [paper][code][soft-prompt-tuning][Prompt-Tuning]
- A Survey on In-context Learning, Dong et al., EMNLP 2024. [paper][code] (see the prompt-assembly sketch after this list)
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work, Min et al., EMNLP 2022. [paper][code]
- Larger language models do in-context learning differently, Wei et al., arxiv 2023. [paper]
- PAL: Program-aided Language Models, Gao et al., ICML 2023. [paper][code]
- A Comprehensive Survey on Instruction Following, Lou et al., arxiv 2023. [paper][code]
- RLHF: Deep reinforcement learning from human preferences, Christiano et al., NIPS 2017. [paper]
- RLHF: Fine-Tuning Language Models from Human Preferences, Ziegler et al., arxiv 2019. [paper][code][lm-human-preference-details]
- RLHF: Learning to summarize from human feedback, Stiennon et al., NeurIPS 2020. [paper][code]
- RLHF: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al., arxiv 2022. [paper][code]
- Finetuned Language Models Are Zero-Shot Learners, Wei et al., ICLR 2022. [paper]
- Instruction Tuning for Large Language Models: A Survey, Zhang et al., arxiv 2023. [paper][code]
- What learning algorithm is in-context learning? Investigations with linear models, Akyürek et al., ICLR 2023. [paper]
- Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers, Dai et al., arxiv 2022. [paper][code]
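
Mechanically, the in-context learning studied in these papers is prompt assembly: concatenate labeled demonstrations before the query. A minimal builder; the demonstrations and template are placeholders:

```python
# Demonstrations and template are illustrative placeholders.
DEMOS = [
    ("Translate to French: cheese", "fromage"),
    ("Translate to French: bread", "pain"),
]

def build_icl_prompt(demos, query: str) -> str:
    # Few-shot in-context learning: prepend Q/A demonstrations to the query.
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    return f"{shots}\n\nQ: {query}\nA:"

print(build_icl_prompt(DEMOS, "Translate to French: water"))
```
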
- Retrieval-Augmented Generation for Large Language Models: A Survey, Gao et al., arxiv 2023. [paper][code][Modular RAG]
- Retrieval-Augmented Generation for AI-Generated Content: A Survey, Zhao et al., arxiv 2024. [paper][code]
- A Survey on Retrieval-Augmented Text Generation for Large Language Models, Huang et al., arxiv 2024. [paper][Retrieval-Augmented Generation for Natural Language Processing: A Survey][A Survey on RAG Meeting LLMs][A Comprehensive Survey of Retrieval-Augmented Generation]
- RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, Hu et al., arxiv 2024. [paper][code]
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis et al., NeurIPS 2020. [paper][code][model][docs][FAISS] (see the retrieve-then-generate sketch after this list)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, Asai et al., ICLR 2024 Oral. [paper][code][CRAG][Golden-Retriever]
- Dense Passage Retrieval for Open-Domain Question Answering, Karpukhin et al., EMNLP 2020. [paper][code]
- Internet-Augmented Dialogue Generation, Komeili et al., arxiv 2021. [paper]
- RETRO: Improving language models by retrieving from trillions of tokens, Borgeaud et al., arxiv 2021. [paper][RETRO-pytorch]
- FLARE: Active Retrieval Augmented Generation, Jiang et al., EMNLP 2023. [paper][code]
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation, Vu et al., arxiv 2023. [paper][code]
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models, Yu et al., EMNLP 2024. [paper]
- Learning to Filter Context for Retrieval-Augmented Generation, Wang et al., arxiv 2023. [paper][code]
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval, Sarthi et al., ICLR 2024. [paper][code][tree2retriever][TrustRAG]
- When Large Language Models Meet Vector Databases: A Survey, Jing et al., arxiv 2024. [paper][A Comprehensive Survey on How to Make your LLMs use External Data More Wisely]
- RAFT: Adapting Language Model to Domain Specific RAG, Zhang et al., arxiv 2024. [paper][code]
- RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback, Liu et al., arxiv 2024. [paper][code]
- RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation, Chan et al., arxiv 2024. [paper][code][Adaptive-RAG][Advanced RAG 11: Query Classification and Refinement]
- Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers, Sawarkar et al., arxiv 2024. [paper][code][infinity]
- FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research, Jin et al., arxiv 2024. [paper][code][FlashRAG-Paddle][Auto-RAG]
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models, Gutiérrez et al., NeurIPS 2024. [paper][code]
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization, Edge et al., arxiv 2024. [paper][code][GraphRAG-Local-UI][nano-graphrag][fast-graphrag][graph-rag][llm-graph-builder][Triplex][knowledge_graph_maker][itext2kg][KG_RAG]
- LightRAG: Simple and Fast Retrieval-Augmented Generation, Guo et al., arxiv 2024. [paper][code]
- Graph Retrieval-Augmented Generation: A Survey, Peng et al., arxiv 2024. [paper]
- Searching for Best Practices in Retrieval-Augmented Generation, Wang et al., arxiv 2024. [paper][code][Seven Failure Points When Engineering a Retrieval Augmented Generation System][Improving Retrieval Performance in RAG Pipelines with Hybrid Search][15 Advanced RAG Techniques from Pre-Retrieval to Generation]
- Self-Reasoning: Improving Retrieval Augmented Language Model with Self-Reasoning, Xia et al., arxiv 2024. [paper]
- RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation, Fleischer et al., arxiv 2024. [paper][code][fastRAG][rag-retrieval-study]
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework, Zhu et al., arxiv 2024. [paper][ragas][RAGChecker][rageval][CORAL]
- A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning, Yuan et al., arxiv 2024. [paper][code][ind_kdd_2024/][KDD2024-WhoIsWho-Top3]
- MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery, Qian et al., arxiv 2024. [paper][code][mem0][Memary][MemoryScope][memoripy]
- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems, Tan et al., arxiv 2024. [paper][code]
- Introducing Contextual Retrieval, Anthropic, 2024. [blog]
- ColPali: Efficient Document Retrieval with Vision Language Models, Faysse et al., arxiv 2024. [paper][code][M3DocRAG][Visualized BGE][OmniSearch][StreamRAG]
- ACL 2023 Tutorial: Retrieval-based Language Models and Applications, Asai et al., ACL 2023. [link]
- [Advanced RAG Techniques: an Illustrated Overview][Chinese Version][RAG_Techniques][Controllable-RAG-Agent][GenAI_Agents][bRAG-langchain][GenAI-Showcase]
- [LlamaIndex][llama_deploy][A Cheat Sheet and Some Recipes For Building Advanced RAG][Fine-Tuning Embeddings for RAG with Synthetic Data]
- [ragas]
- Browse the web with GPT-4V and Vimium [vimGPT]
- [QAnything][ragflow][fastRAG][anything-llm][FastGPT][mem0][Memary]
- [trt-llm-rag-windows][history_rag][gpt-crawler][R2R][rag-notebook-to-microservices][MaxKB][Verba][cognita][quivr][kotaemon][RAGMeUp]
- [RAG-Retrieval][FlashRank][rank_bm25][PGRAG][CRUD_RAG][PlanRAG][DPA-RAG][FollowRAG][LongRAG][structured-rag][RAGLab][autogluon-rag][VARAG][PAI-RAG][Meta-Chunking][chonkie][RagVL][KAG][AutoRAG][RetroLLM]
- [PDF-Extract-Kit][MinerU][colpali][localGPT-Vision][mPLUG-DocOwl]
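
Stripped of the refinements above, a RAG pipeline is embed, retrieve, then prompt. A self-contained toy version with a hashing bag-of-words embedder standing in for a real embedding model; everything here is illustrative:

```python
import re
import numpy as np

def embed(texts):
    # Toy hashing bag-of-words embedder; a real pipeline would use one of
    # the embedding models listed in the Information Retrieval section.
    vecs = np.zeros((len(texts), 256))
    for i, t in enumerate(texts):
        for tok in re.findall(r"\w+", t.lower()):
            vecs[i, hash(tok) % 256] += 1.0
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9)

docs = ["RAG retrieves passages before generation.",
        "LoRA adds low-rank adapters to frozen weights."]
query = "how does retrieval-augmented generation work?"
scores = embed(docs) @ embed([query])[0]          # cosine similarity
context = docs[int(np.argmax(scores))]            # top-1 retrieval
prompt = f"Answer using the context.\nContext: {context}\nQuestion: {query}\nAnswer:"
print(prompt)  # this prompt would then be sent to any LLM of choice
```
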
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models, Thakur et al., NeurIPS 2021. [paper][code]
- MTEB: Massive Text Embedding Benchmark, Muennighoff et al., arxiv 2022. [paper][code][leaderboard]
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Reimers et al., EMNLP 2019. [paper][code][model][vec2text]
- SimCSE: Simple Contrastive Learning of Sentence Embeddings, Gao et al., EMNLP 2021. [paper][code][AnglE ACL 2024] (see the in-batch negatives sketch after this list)
- OpenAI: Text and Code Embeddings by Contrastive Pre-Training, Neelakantan et al., arxiv 2022. [paper][blog]
- MRL: Matryoshka Representation Learning, Kusupati et al., NeurIPS 2022. [paper][code]
- BGE: C-Pack: Packaged Resources To Advance General Chinese Embedding, Xiao et al., SIGIR 2024. [paper][code][bge reranker][FlagEmbedding]
- LLM-Embedder: Retrieve Anything To Augment Large Language Models, Zhang et al., arxiv 2023. [paper][code][ACL 2024][llm_reranker][FlagEmbedding]
- BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation, Chen et al., ACL 2024. [paper][code][FlagEmbedding][blog]
- VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval, Zhou et al., arxiv 2024. [paper][code]
- BGE-ICL: Making Text Embedders Few-Shot Learners, Li et al., arxiv 2024. [paper][code]
- [m3e-base][acge_text_embedding][xiaobu-embedding-v2][stella_en_1.5B_v5][Conan-embedding-v1]
- Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents, Günther et al., arxiv 2023. [paper][jina-embeddings-v2][jina-reranker-v2][pe_rank][Jina CLIP][jina-embeddings-v3]
- GTE: Towards General Text Embeddings with Multi-stage Contrastive Learning, Li et al., arxiv 2023. [paper][model][gte-Qwen2-7B-instruct][gte-large-en-v1.5]
- [CohereV3]
- One Embedder, Any Task: Instruction-Finetuned Text Embeddings, Su et al., ACL 2023. [paper][code]
- E5: Improving Text Embeddings with Large Language Models, Wang et al., ACL 2024. [paper][code][model][llm2vec]
- Nomic Embed: Training a Reproducible Long Context Text Embedder, Nussbaum et al., Nomic AI 2024. [paper][code]
- GritLM: Generative Representational Instruction Tuning, Muennighoff et al., arxiv 2024. [paper][code][OneGen]
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders, BehnamGhader et al., arxiv 2024. [paper][code]
- NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models, Lee et al., arxiv 2024. [paper][model]
- PE-Rank: Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models, Liu et al., arxiv 2024. [paper][code]
- MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs, Lin et al., arxiv 2024. [paper][modeling_nvmmembed.py][magiclens][E5-V][Visualized BGE]
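
Most of the embedding models above are trained with an in-batch-negatives contrastive objective of the kind popularized by SimCSE and DPR: the i-th query should score highest against the i-th passage. A minimal sketch with an illustrative temperature and dimensions:

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q, d, temperature: float = 0.05):
    # In-batch negatives: the i-th query should match the i-th document,
    # with every other document in the batch serving as a negative.
    q = F.normalize(q, dim=-1)
    d = F.normalize(d, dim=-1)
    logits = q @ d.T / temperature           # [batch, batch] similarity
    labels = torch.arange(q.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, labels)

q = torch.randn(8, 128)
d = q + 0.1 * torch.randn(8, 128)            # paired positives for the toy
print(in_batch_contrastive_loss(q, d).item())
```
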
- Few-Shot-CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al., NeurIPS 2022. [paper][chain-of-thought-hub]
- Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al., ICLR 2023. [paper] (see the majority-vote sketch after this list)
- Zero-Shot-CoT: Large Language Models are Zero-Shot Reasoners, Kojima et al., NeurIPS 2022. [paper][code]
- Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models, Zhang et al., ICLR 2023. [paper][code]
- Multimodal Chain-of-Thought Reasoning in Language Models, Zhang et al., arxiv 2023. [paper][code]
- Fine-tune-CoT: Large Language Models Are Reasoning Teachers, Ho et al., ACL 2023. [paper][code]
- The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning, Kim et al., EMNLP 2023. [paper][code]
- Chain-of-Thought Reasoning Without Prompting, Wang et al., arxiv 2024. [paper]
- ReAct: Synergizing Reasoning and Acting in Language Models, Yao et al., ICLR 2023. [paper][code]
- MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, Yang et al., arxiv 2023. [paper][code][AutoAct]
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Yao et al., NeurIPS 2023. [paper][code][Plug in and Play Implementation][tree-of-thought-prompting]
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models, Besta et al., arxiv 2023. [paper][code]
- Cumulative Reasoning with Large Language Models, Zhang et al., arxiv 2023. [paper][code][On the Diagram of Thought]
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, Sel et al., arxiv 2023. [paper][unofficial code]
- Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation, Ding et al., arxiv 2023. [paper][code]
- Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models, Ye et al., arxiv 2024. [paper][code]
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, Zhou et al., ICLR 2023. [paper]
- DEPS: Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents, Wang et al., arxiv 2023. [paper][code]
- RAP: Reasoning with Language Model is Planning with World Model, Hao et al., EMNLP 2023. [paper][code][LLM Reasoners COLM 2024]
- LEMA: Learning From Mistakes Makes LLM Better Reasoner, An et al., arxiv 2023. [paper][code]
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, Chen et al., TMLR 2023. [paper][code]
- Chain of Code: Reasoning with a Language Model-Augmented Code Emulator, Li et al., arxiv 2023. [paper][[code]]
- The Impact of Reasoning Step Length on Large Language Models, Jin et al., arxiv 2024. [paper][code]
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, Wang et al., ACL 2023. [paper][code][maestro]
- Improving Factuality and Reasoning in Language Models through Multiagent Debate, Du et al., ICML 2024. [paper][code][Multi-Agents-Debate]
- Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al., NeurIPS 2024. [paper][code][MCT Self-Refine][SelFee]
- Reflexion: Language Agents with Verbal Reinforcement Learning, Shinn et al., NeurIPS 2023. [paper][code]
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, Gou et al., ICLR 2024. [paper][code]
- LATS: Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models, Zhou et al., ICML 2024. [paper][code]
- Self-Discover: Large Language Models Self-Compose Reasoning Structures, Zhou et al., NeurIPS 2024. [paper][unofficial implementation][SELF-DISCOVER]
- RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation, Wang et al., arxiv 2024. [paper][code]
- KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents, Zhu et al., arxiv 2024. [paper][code][KnowLM][KnowPAT]
- Advancing LLM Reasoning Generalists with Preference Trees, Yuan et al., arxiv 2024. [paper][code]
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, Yang et al., arxiv 2024. [paper][code][SymbCoT]
- ReST-EM: Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models, Singh et al., arxiv 2023. [paper][unofficial code]
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent, Aksitov et al., arxiv 2023. [paper][[code]]
- Searchformer: Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping, Lehnert et al., COLM 2024. [paper][code][Dualformer]
- How Far Are We from Intelligent Visual Deductive Reasoning?, Zhang et al., arxiv 2024. [paper][code]
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, Lee et al., arxiv 2024. [paper][code]
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning, Kim et al., arxiv 2024. [paper][code]
- Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning, Wang et al., arxiv 2024. [paper][code]
- QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback-based Self-Correction, Huang et al., ACL 2024. [paper][code]
- Internal Consistency and Self-Feedback in Large Language Models: A Survey, Liang et al., arxiv 2024. [paper][code]
- Prover-Verifier Games improve legibility of language model outputs, Kirchner et al., 2024. [blog][paper]
- Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning, Wang et al., ACL 2024. [paper][code]
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search, Zhang et al., NeurIPS 2024. [paper][code][llm-mcts]
- GenRM: Generative Verifiers: Reward Modeling as Next-Token Prediction, Zhang et al., arxiv 2024. [paper][CriticGPT][Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning][Free Process Rewards without Process Labels]
- rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers, Qi et al., arxiv 2024. [paper][code][Orca 2][STaR][Quiet-STaR]
- OpenAI o1: Learning to Reason with LLMs, OpenAI, 2024. [blog][OpenAI o1 System Card][Agent Q][Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters][search-and-learn][Let's Verify Step by Step][Thinking LLMs: General Instruction Following with Thought Generation][Awesome-LLM-Strawberry]
- O1 Replication Journey: A Strategic Progress Report -- Part 1, Qin et al., arxiv 2024. [paper][code][O1 Replication Journey -- Part 2][LLaMA-O1][Marco-o1][qwq-32b-preview]
- ReFT: Reasoning with Reinforced Fine-Tuning, Luong et al., ACL 2024. [paper][code][VinePPO]
- LLaVA-o1: Let Vision Language Models Reason Step-by-Step, Xu et al., arxiv 2024. [paper][code][internvl2.0_mpo][Insight-V][VisVM]
- Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems, Min et al., arxiv 2024. [paper][code][Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search]
- [llm-reasoners][g1][Open-O1][show-me][OpenR]
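
Self-consistency, cited near the top of this list, is simple to state in code: sample several chain-of-thought completions and majority-vote the final answers. A toy sketch with a random stand-in for the sampled model call (hypothetical, not a real API):

```python
import random
from collections import Counter

def self_consistency(sample_fn, question: str, n: int = 10) -> str:
    # Sample several chain-of-thought completions and majority-vote the
    # final answers, as in self-consistency decoding.
    answers = [sample_fn(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stand-in for a sampled LLM call, not a real API.
noisy_model = lambda q: random.choice(["42", "42", "42", "41"])
print(self_consistency(noisy_model, "What is 6 * 7?"))  # almost always "42"
```
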
- Scaling Laws for Neural Language Models, Kaplan et al., arxiv 2020. [paper][unofficial code] (see the log-log fit sketch after this list)
- Emergent Abilities of Large Language Models, Wei et al., TMLR 2022. [paper]
- Chinchilla: Training Compute-Optimal Large Language Models, Hoffmann et al., NeurIPS 2022. [paper]
- Scaling Laws for Autoregressive Generative Modeling, Henighan et al., arxiv 2020. [paper]
- Are Emergent Abilities of Large Language Models a Mirage?, Schaeffer et al., NeurIPS 2023 Outstanding Paper. [paper]
- Understanding Emergent Abilities of Language Models from the Loss Perspective, Du et al., arxiv 2024. [paper][Predicting Emergent Capabilities by Finetuning]
- S2A: System 2 Attention (is something you might need too), Weston et al., arxiv 2023. [paper][Distilling System 2 into System 1][system-2-research]
- Memory3: Language Modeling with Explicit Memory, Yang et al., arxiv 2024. [paper]
- Scaling Laws for Downstream Task Performance of Large Language Models, Isik et al., arxiv 2024. [paper][Establishing Task Scaling Laws via Compute-Efficient Model Ladders]
- Scalable Pre-training of Large Autoregressive Image Models, El-Nouby et al., arxiv 2024. [paper][code]
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method, Zhang et al., ICLR 2024. [paper]
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws, Allen-Zhu et al., arxiv 2024. [paper]
- Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process, Ye et al., arxiv 2024. [paper][project page]
- Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems, Ye et al., arxiv 2024. [paper]
- Language Modeling Is Compression, Delétang et al., arxiv 2023. [paper]
- Language Models Represent Space and Time, Gurnee and Tegmark, ICLR 2024. [paper][code][The Geometry of Concepts: Sparse Autoencoder Feature Structure]
- The Platonic Representation Hypothesis, Huh et al., arxiv 2024. [paper][code]
- Observational Scaling Laws and the Predictability of Language Model Performance, Ruan et al., arxiv 2024. [paper][code]
- Scaling Laws for Precision, Kumar et al., arxiv 2024. [paper]
- Language models can explain neurons in language models, OpenAI, 2023. [blog][code][transformer-debugger]
- Scaling and evaluating sparse autoencoders, Gao et al., arxiv 2024. [OpenAI Blog][paper][code][sae-auto-interp][multimodal-sae][Language-Model-SAEs]
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, Anthropic, 2023. [blog]
- Mapping the Mind of a Large Language Model, Anthropic, 2024. [blog]
- Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era, Wu et al., arxiv 2024. [paper][code]
- LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models, Tufanov et al., arxiv 2024. [paper][code]
- Transformer Explainer: Interactive Learning of Text-Generative Models, Cho et al., arxiv 2024. [paper][code][demo]
- What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation, Singh et al., ICML 2024 Spotlight. [paper][code]
- [Transformer Circuits Thread][colah's blog][Transformer Interpretability][Awesome-Interpretability-in-Large-Language-Models][TransformerLens][inseq]
- ROME: Locating and Editing Factual Associations in GPT, Meng et al., NeurIPS 2022. [paper][code][FastEdit]
- Editing Large Language Models: Problems, Methods, and Opportunities, Yao et al., EMNLP 2023. [paper][code][Knowledge Mechanisms in Large Language Models: A Survey and Perspective]
- A Comprehensive Study of Knowledge Editing for Large Language Models, Zhang et al., arxiv 2024. [paper][code]
- [awesome-llm-interpretability][Awesome-LLM-Interpretability]
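
At their core, the scaling-law fits in Kaplan et al. and successors are linear regressions in log-log space. A toy fit on synthetic points following a power law; the data and exponent are made up for illustration:

```python
import numpy as np

# Toy scaling-law fit: loss ~ a * N**b with b < 0, recovered by linear
# regression in log-log space. The data is synthetic and the exponent
# -0.076 is chosen for illustration only.
N = np.array([1e7, 1e8, 1e9, 1e10])           # parameter counts
L = 2.0 * N ** -0.076                         # synthetic losses
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
print(f"fitted exponent ~ {slope:.3f}, prefactor ~ {np.exp(intercept):.2f}")
```
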
- [Awesome-Chinese-LLM][awesome-LLMs-In-China][awesome-LLM-resourses]
- GLM: General Language Model Pretraining with Autoregressive Blank Infilling, Du et al., ACL 2022. [paper][code][UniLM]
- GLM-130B: An Open Bilingual Pre-trained Model, Zeng et al., ICLR 2023. [paper][code]
- CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models, Zhou et al., EMNLP 2024. [paper][code]
- ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools, Zeng et al., arxiv 2024. [paper][ChatGLM-6B][ChatGLM2-6B][ChatGLM3][GLM-4][modeling_chatglm.py][AgentTuning][AlignBench][GLM-Edge]
- Baichuan 2: Open Large-scale Language Models, Yang et al., arxiv 2023. [paper][code][BaichuanSEED][Baichuan Alignment Technical Report][KV Shifting Attention Enhances Language Modeling]
- Baichuan-Omni Technical Report, Li et al., arxiv 2024. [paper][code]
- Qwen Technical Report, Bai et al., arxiv 2023. [paper][code]
- Qwen2 Technical Report, Yang et al., arxiv 2024. [paper][code][Qwen-Agent][AutoIF][modeling_qwen2.py]
- Yi: Open Foundation Models by 01.AI, Young et al., arxiv 2024. [paper][code][Yi-1.5]
- InternLM2 Technical Report, Cai et al., arxiv 2024. [paper][code][HuixiangDou]
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, Bi et al., arxiv 2024. [paper][DeepSeek-LLM][DeepSeek-V2][DeepSeek-Coder]
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies, Hu et al., arxiv 2024. [paper][code][MiniCPM-V]
- TeleChat Technical Report, Wang et al., arxiv 2024. [paper][code][TeleChat2][Tele-FLM Technical Report][Tele-FLM][Tele-FLM-1T]
- Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca, Cui et al., arxiv 2023. [paper][code][Chinese-LLaMA-Alpaca-2][Chinese-LLaMA-Alpaca-3][baby-llama2-chinese]
- Rethinking Optimization and Architecture for Tiny Language Models, Tang et al., arxiv 2024. [paper][code]
- YuLan: An Open-source Large Language Model, Zhu et al., arxiv 2024. [paper][code][Yulan-GARDEN]
- Towards Effective and Efficient Continual Pre-training of Large Language Models, Chen et al., arxiv 2024. [paper][code]
- [MOSS][MOSS-RLHF]
- [Skywork][Skywork-MoE][Orion][BELLE][Yuan-2.0][Yuan2.0-M32][Fengshenbang-LM][Index-1.9B][Aquila2]
- [LlamaFamily/Llama-Chinese][LinkSoul-AI/Chinese-Llama-2-7b][llama3-Chinese-chat][phi3-Chinese][LLM-Chinese][Llama3-Chinese-Chat][llama3-chinese]
- [Firefly][GPT2-chitchat]
- Alpaca-CoT: An Empirical Study of Instruction-tuning Large Language Models in Chinese, Si et al., arxiv 2023. [paper][code]
- Safety Assessment of Chinese Large Language Models, Sun et al., arxiv 2023. [paper][code][PurpleLlama]
- CS231n: Deep Learning for Computer Vision [link]
- AlexNet: ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012. [paper]
- VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan et al., ICLR 2015. [paper]
- GoogLeNet: Going Deeper with Convolutions, Szegedy et al., CVPR 2015. [paper]
- ResNet: Deep Residual Learning for Image Recognition, He et al., CVPR 2016 Best Paper. [paper][code]
- DenseNet: Densely Connected Convolutional Networks, Huang et al., CVPR 2017 Oral. [paper][code]
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Tan et al., ICML 2019. [paper][code][EfficientNet-PyTorch][noisystudent]
- BYOL: Bootstrap your own latent: A new approach to self-supervised Learning, Grill et al., arxiv 2020. [paper][code][byol-pytorch][simsiam]
- ConvNeXt: A ConvNet for the 2020s, Liu et al., CVPR 2022. [paper][code][ConvNeXt-V2]
- MoCo: Momentum Contrast for Unsupervised Visual Representation Learning, He et al., CVPR 2020. [paper][code]
- SimCLR: A Simple Framework for Contrastive Learning of Visual Representations, Chen et al., PMLR 2020. [paper][code]
- CoCa: Contrastive Captioners are Image-Text Foundation Models, Yu et al., arxiv 2022. [paper][CoCa-pytorch][multimodal]
- DINOv2: Learning Robust Visual Features without Supervision, Oquab et al., arxiv 2023. [paper][code]
- FeatUp: A Model-Agnostic Framework for Features at Any Resolution, Fu et al., ICLR 2024. [paper][code]
- InfoNCE Loss: Representation Learning with Contrastive Predictive Coding, Oord et al., arxiv 2018. [paper][unofficial code]
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, Mildenhall et al., ECCV 2020. [paper][code][nerf-pytorch][NeRF-Factory][LERF][LangSplat]
- GFP-GAN: Towards Real-World Blind Face Restoration with Generative Facial Prior, Wang et al., CVPR 2021. [paper][code][Real-ESRGAN][DreamClear]
- CodeFormer: Towards Robust Blind Face Restoration with Codebook Lookup Transformer, Zhou et al., NeurIPS 2022. [paper][code][APISR][EvTexture][video2x][PMRF]
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers, Li et al., ECCV 2022. [paper][code][occupancy_networks][VoxFormer][TPVFormer][GeMap]
- UniAD: Planning-oriented Autonomous Driving, Hu et al., CVPR 2023 Best Paper. [paper][code]
- MagicDrive: Street View Generation with Diverse 3D Geometry Control, Gao et al., ICLR 2024. [paper][code][MagicDrive3D][MagicDriveDiT][DiffusionDrive]
- Nougat: Neural Optical Understanding for Academic Documents, Blecher et al., arxiv 2023. [paper][code][marker][MixTeX-Latex-OCR][kosmos-2.5][gptpdf][MegaParse][omniparse][llama_parse][PDF-Extract-Kit][docling][ViTLP][markitdown]
- FaceChain: A Playground for Identity-Preserving Portrait Generation, Liu et al., arxiv 2023. [paper][code]
- MGIE: Guiding Instruction-based Image Editing via Multimodal Large Language Models, Fu et al., ICLR 2024 Spotlight. [paper][code]
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding, Li et al., CVPR 2024. [paper][code][AnyDoor]
- InstantID: Zero-shot Identity-Preserving Generation in Seconds, Wang et al., arxiv 2024. [paper][code][InstantStyle][ID-Animator][ConsistentID][PuLID][ComfyUI-InstantID][StableAnimator][MV-Adapter]
- ReplaceAnything as you want: Ultra-high quality content replacement, [link][OutfitAnyone][IDM-VTON][IMAGDressing][CatVTON]
- LayerDiffusion: Transparent Image Layer Diffusion using Latent Transparency, Zhang et al., arxiv 2024. [paper][code][sd-forge-layerdiffusion][LayerDiffuse_DiffusersCLI][IC-Light][Paints-UNDO]
- Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image, Wu et al., arxiv 2024. [paper][code][MeshAnything][MeshAnythingV2][InstantMesh][prolificdreamer][Metric3D][ReconX][DimensionX][LLaMa-Mesh]
- SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement, Boss et al., arxiv 2024. [paper][code][ViewCrafter][3DTopia-XL][TRELLIS][See3D]
- Sapiens: Foundation for Human Vision Models, Khirodkar et al., ECCV 2024 Oral. [paper][code]
- General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model, Wei et al., arxiv 2024. [paper][code][PaddleOCR][EasyOCR][llm_aided_ocr][surya][Umi-OCR]
- [deepfakes/faceswap][DeepFaceLab][DeepFaceLive][deepface][Deep-Live-Cam][DeepFakeDefenders][HivisionIDPhotos][insightface]
- [IOPaint][SPADE][PowerPaint]
- [MuseV][ToonCrafter]
- ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Dosovitskiy et al., ICLR 2021. [paper][code][vit-pytorch][efficientvit][EfficientFormer][ViT-Adapter] (see the patchify sketch after this list)
- ViT-Adapter: Vision Transformer Adapter for Dense Predictions, Chen et al., ICLR 2023 Spotlight. [paper][code]
- Vision Transformers Need Registers, Darcet et al., ICLR 2024 Outstanding Paper. [paper]
- DeiT: Training data-efficient image transformers & distillation through attention, Touvron et al., ICML 2021. [paper][code]
- ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision, Kim et al., ICML 2021. [paper][code]
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, Liu et al., ICCV 2021. [paper][code]
- MAE: Masked Autoencoders Are Scalable Vision Learners, He et al., CVPR 2022. [paper][code][FLIP][LVMAE-pytorch]
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks, Xiao et al., CVPR 2024 Oral. [paper][model][Inference code][Florence-VL]
- LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models, Bai et al., arxiv 2023. [paper][code]
- GLEE: General Object Foundation Model for Images and Videos at Scale, Wu et al., CVPR 2024 Highlight. [paper][code]
- Tokenize Anything via Prompting, Pan et al., arxiv 2023. [paper][code]
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, Zhu et al., ICML 2024. [paper][code][VMamba][mambaout][MLLA]
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone, Hatamizadeh and Kautz, arxiv 2024. [paper][code]
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, Yang et al., arxiv 2024. [paper][code][Depth-Anything-V2][PromptDA][ml-depth-pro][DepthCrafter][rollingdepth]
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models, Guo et al., arxiv 2024. [paper][code]
- TiTok: An Image is Worth 32 Tokens for Reconstruction and Generation, Yu et al., NeurIPS 2024. [paper][code][titok-pytorch][Randomized Autoregressive Visual Generation][Cosmos-Tokenizer]
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning, Shang et al., arxiv 2024. [paper][code]
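
The patchify step that starts every ViT-family model above is a single strided convolution. A minimal sketch using the standard ViT-Base numbers (16x16 patches, 768-dimensional tokens):

```python
import torch
import torch.nn as nn

# ViT-style patchify: split an image into 16x16 patches and linearly embed
# each patch, so a 224x224 image becomes 196 tokens of width 768.
patchify = nn.Conv2d(3, 768, kernel_size=16, stride=16)
img = torch.randn(1, 3, 224, 224)
tokens = patchify(img).flatten(2).transpose(1, 2)
print(tokens.shape)  # torch.Size([1, 196, 768])
```
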
- GAN: Generative Adversarial Networks, Goodfellow et al., NeurIPS 2014. [paper][code][Pytorch-GAN]
- StyleGAN3: Alias-Free Generative Adversarial Networks, Karras et al., NeurIPS 2021. [paper][code][StyleFeatureEditor]
- GigaGAN: Scaling up GANs for Text-to-Image Synthesis, Kang et al., CVPR 2023 Highlight. [paper][project repo][gigagan-pytorch]
- [pytorch-CycleGAN-and-pix2pix][img2img-turbo]
- VAE: Auto-Encoding Variational Bayes, Kingma et al., arxiv 2013. [paper][code][Pytorch-VAE][VAE blog]
- VQ-VAE: Neural Discrete Representation Learning, Oord et al., NIPS 2017. [paper][code][vector-quantize-pytorch] (see the codebook-lookup sketch after this list)
- VQ-VAE-2: Generating Diverse High-Fidelity Images with VQ-VAE-2, Razavi et al., arxiv 2019. [paper][code]
- VQGAN: Taming Transformers for High-Resolution Image Synthesis, Esser et al., CVPR 2021. [paper][code][FQGAN]
- VAR: Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction, Tian et al., NeurIPS 2024 Best Paper. [paper][code][Infinity]
- LlamaGen: Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation, Sun et al., arxiv 2024. [paper][code][causalfusion][DnD-Transformer][chameleon][Emu3]
- MAR: Autoregressive Image Generation without Vector Quantization, Li et al., arxiv 2024. [paper][code][autoregressive-diffusion-pytorch]
- Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation, Luo et al., arxiv 2024. [paper][code][magvit2-pytorch]
- OmniGen: Unified Image Generation, Xiao et al., arxiv 2024. [paper][code]
- HART: Efficient Visual Generation with Hybrid Autoregressive Transformer, Tang et al., arxiv 2024. [paper][code]
- Autoregressive Models in Vision: A Survey, Xiong et al., arxiv 2024. [paper][code]
- IBQ: Taming Scalable Visual Tokenizer for Autoregressive Image Generation, Shi et al., arxiv 2024. [paper][code][Cosmos-Tokenizer][TokenFlow][Divot][VidTok]
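
The codebook lookup at the heart of VQ-VAE, VQGAN, and the visual tokenizers above is a nearest-neighbor search. A minimal sketch; the straight-through gradient estimator and codebook updates are omitted:

```python
import torch

def vector_quantize(z: torch.Tensor, codebook: torch.Tensor):
    # VQ-VAE-style quantization: snap each latent vector to its nearest
    # codebook entry (gradient tricks omitted in this sketch).
    d = torch.cdist(z, codebook)   # [n, K] pairwise distances
    idx = d.argmin(dim=1)          # nearest code id per latent
    return codebook[idx], idx

z = torch.randn(10, 64)            # 10 latents of width 64
codebook = torch.randn(512, 64)    # 512-entry codebook
zq, idx = vector_quantize(z, codebook)
print(zq.shape, idx[:5])           # quantized latents and their code ids
```
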
- InstructPix2Pix: Learning to Follow Image Editing Instructions, Brooks et al., CVPR 2023 Highlight. [paper][code]
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold, Pan et al., SIGGRAPH 2023. [paper][code]
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing, Shi et al., arxiv 2023. [paper][code]
- DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models, Mou et al., ICLR 2024 Spotlight. [paper][code]
- DragAnything: Motion Control for Anything using Entity Representation, Wu et al., ECCV 2024. [paper][code][Framer][SG-I2V]
- LEDITS++: Limitless Image Editing using Text-to-Image Models, Brack et al., arxiv 2023. [paper][code][demo]
- Diffusion Model-Based Image Editing: A Survey, Huang et al., arxiv 2024. [paper][code]
- PromptFix: You Prompt and We Fix the Photo, Yu et al., NeurIPS 2024. [paper][code]
- MimicBrush: Zero-shot Image Editing with Reference Imitation, Chen et al., arxiv 2024. [paper][code][EchoMimic]
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models, Shuai et al., arxiv 2024. [paper][code]
- Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models, Atzmon et al., arxiv 2024. [paper]
- MagicQuill: An Intelligent Interactive Image Editing System, Liu et al., arxiv 2024. [paper][code]
- BrushEdit: All-In-One Image Inpainting and Editing, Li et al., arxiv 2024. [paper][code]
- [EditAnything][ComfyUI-UltraEdit-ZHO][libcom][Awesome-Image-Composition][RF-Solver-Edit]
- DETR: End-to-End Object Detection with Transformers, Carion et al., arxiv 2020. [paper][code][detrex] (see the IoU sketch after this list)
- Focus-DETR: Less is More: Focus Attention for Efficient DETR, Zheng et al., arxiv 2023. [paper][code]
- U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection, Qin et al., arxiv 2020. [paper][code]
- YOLO: You Only Look Once: Unified, Real-Time Object Detection, Redmon et al., arxiv 2015. [paper]
- YOLOX: Exceeding YOLO Series in 2021, Ge et al., arxiv 2021. [paper][code]
- Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism, Wang et al., arxiv 2023. [paper][code]
- Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, Liu et al., ECCV 2024. [paper][code][DINO-X][OV-DINO][OmDet][groundingLMM]
- YOLO-World: Real-Time Open-Vocabulary Object Detection, Cheng et al., CVPR 2024. [paper][code]
- YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, Wang et al., arxiv 2024. [paper][code]
- T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy, Jiang et al., arxiv 2024. [paper][code][ChatRex]
- YOLOv10: Real-Time End-to-End Object Detection, Wang et al., arxiv 2024. [paper][code]
- D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement, Peng et al., arxiv 2024. [paper][code]
- [detectron2][yolov5][mmdetection][mmdetection3d][detrex][ultralytics][AlphaPose]
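
Underneath all of the detectors above sits the same matching score: intersection-over-union between boxes, used in NMS, label assignment, and evaluation. A minimal sketch for axis-aligned `(x1, y1, x2, y2)` boxes:

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes in
    # (x1, y1, x2, y2) format.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.142857...
```
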
- U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al., MICCAI 2015. [paper][Pytorch-UNet][xLSTM-UNet-Pytorch]
- Segment Anything, Kirillov et al., ICCV 2023. [paper][code][SAM-Adapter-PyTorch][EditAnything][SegmentAnything3D]
- SAM 2: Segment Anything in Images and Videos, Ravi et al., SIGGRAPH 2024. [blog][paper][code][SAM2Long]
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything, Xiong et al., CVPR 2024. [paper][code][FastSAM][RobustSAM][MobileSAM]
- Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks, Ren et al., arxiv 2024. [paper][code][Grounded-SAM-2]
- LISA: Reasoning Segmentation via Large Language Model, Lai et al., arxiv 2023. [paper][code][VideoLISA]
- Track Anything: Segment Anything Meets Videos, Yang et al., arxiv 2023. [paper][code][Caption-Anything][Tracking-Anything-with-DEVA]
- SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory, Yang et al., arxiv 2024. [paper][code][EfficientTAM]
- OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding, Zhang et al., arxiv 2024. [paper][code]
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training, Tong et al., NeurIPS 2022 Spotlight. [paper][code]
- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts, Zhao et al., arxiv 2024. [paper][code]
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation, Wang et al., arxiv 2024. [paper]
- VideoMamba: State Space Model for Efficient Video Understanding, Li et al., ECCV 2024. [paper][code]
- VideoChat: Chat-Centric Video Understanding, Li et al., CVPR 2024 Highlight. [paper][code]
- Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models, Maaz et al., ACL 2024. [paper][code][Video-LLaMA][MovieChat][Chat-UniVi]
- MVBench: A Comprehensive Multi-modal Video Understanding Benchmark, Li et al., CVPR 2024 Highlight. [paper][code][PhyGenBench]
- OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer, Zhang et al., EMNLP 2024. [paper][code]
- MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions, Ju et al., arxiv 2024. [paper][code]
- MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling, Men et al., arxiv 2024. [paper][code][MIMO-pytorch][StableV2V]
- Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding, Shu et al., arxiv 2024. [paper][code][LongVU][VisionZip]
- ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy, Vishniakov et al., arxiv 2023. [paper][code]
- Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey, Xin et al., arxiv 2024. [paper][code]
-
Whisper: Robust Speech Recognition via Large-Scale Weak Supervision, Radford et al., arxiv 2022. [paper][code][whisper.cpp][faster-whisper][WhisperFusion][whisper-diarization]
-
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio, Bain et al., arxiv 2023. [paper][code]
-
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling,Gandhi et al., arxiv 2023. [paper][code]
-
Speculative Decoding for 2x Faster Whisper Inference, Sanchit Gandhi, HuggingFace Blog 2023. [blog][paper]
-
VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers, Wang et al., arxiv 2023. [paper][code]
-
VALL-E-X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling, Zhang et al., arxiv 2023. [paper][code]
-
Seamless: Multilingual Expressive and Streaming Speech Translation, Seamless Communication et al., arxiv 2023. [paper][code][audiocraft]
-
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation, Seamless Communication et al., arxiv 2023. [paper][code]
-
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models, Li et al., NeurIPS 2023. [paper][code]
-
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit, Zhang et al., arxiv 2023. [paper][code][FoleyCrafter][vta-ldm]
-
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech, Kim et al., ICML 2021. [paper][code][Bert-VITS2][so-vits-svc-fork][GPT-SoVITS][VITS-fast-fine-tuning]
-
OpenVoice: Versatile Instant Voice Cloning, Qin et al., arxiv 2023. [paper][code][MockingBird][clone-voice][Real-Time-Voice-Cloning]
-
Spirit LM: Interleaved Spoken and Written Language Model, Nguyen et al., arxiv 2024. [paper][code]
-
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models, Ju et al., arxiv 2024. [paper][e2-tts-pytorch]
-
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild, Peng et al., arxiv 2024. [paper][code]
-
WavLLM: Towards Robust and Adaptive Speech Large Language Model, Hu et al., arxiv 2024. [paper][code]
-
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation, Xu et al., arxiv 2024. [paper][code][hallo2][champ][PersonaTalk][JoyVASA][memo][EDTalk]
-
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning, Zhang et al., ACL 2024. [paper][code][LLaMA-Omni][SpeechGPT]
-
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs, Tongyi Speech Team, arxiv 2024. [paper][code][CosyVoice][CosyVoice 2]
-
Qwen2-Audio Technical Report, Chu et al., arxiv 2024. [blog][paper][code][Qwen-Audio][GLM-4-Voice]
-
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling, Ji et al., arxiv 2024. [paper][code]
-
Language Model Can Listen While Speaking, Ma et al., arxiv 2024. [paper][demo]
-
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming, Xie et al., arxiv 2024. [paper][code][mini-omni2][moshi][LLaMA-Omni]
-
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching, Chen et al., arxiv 2024. [paper][code][FireRedTTS][Seed-TTS]
-
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis, Liao et al., arxiv 2024. [paper][code]
-
Github Repositories
-
[coqui-ai/TTS][suno-ai/bark][ChatTTS][WhisperSpeech][MeloTTS][parler-tts][fish-speech][MARS5-TTS][metavoice-src][OuteTTS]
-
[stable-audio-tools][Qwen-Audio][pyannote-audio][ims-toucan][AudioLCM][speech-to-speech][ichigo][TEN-Agent]
-
[FunASR][FunClip][FunAudioLLM][TeleSpeech-ASR][EmotiVoice][wenet]
-
[SadTalker][Wav2Lip][video-retalking][SadTalker-Video-Lip-Sync][AniPortrait][GeneFacePlusPlus][V-Express][MuseTalk][EchoMimic][echomimic_v2][MimicTalk][Real3DPortrait][MiniMates][Linly-Talker]
-
[speech-trident][AudioNotes][pyvideotrans][outspeed][VideoChat][MMAudio]
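
As referenced in the Whisper entry above, a minimal transcription sketch, assuming `openai-whisper` is installed (`pip install -U openai-whisper`); the model size and file name are illustrative:

```python
# Minimal transcription sketch with openai/whisper (assumption: installed).
import whisper

model = whisper.load_model("base")        # tiny/base/small/medium/large
result = model.transcribe("audio.mp3")    # language is auto-detected
print(result["text"])

# Segment-level timestamps come back alongside the full transcript.
for seg in result["segments"]:
    print(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text']}")
```
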
- ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation, Li et al., NeurIPS 2021. [paper][code]
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, Li et al., ICML 2022. [paper][code][laion-coco]
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, Li et al., ICML 2023. [paper][code] (a captioning sketch closes this subsection)
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning, Dai et al., arxiv 2023. [paper][code]
- X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning, Panagopoulou et al., arxiv 2023. [paper][code]
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models, Xue et al., arxiv 2024. [paper][code]
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations, Qin et al., arxiv 2024. [paper][code]
- xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs, Ryoo et al., arxiv 2024. [paper]
- LAVIS: A Library for Language-Vision Intelligence, Li et al., arxiv 2022. [paper][code]
- VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts, Bao et al., NeurIPS 2022. [paper][code]
- BEiT: BERT Pre-Training of Image Transformers, Bao et al., ICLR 2022 Oral presentation. [paper][code]
- BeiT-V3: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, Wang et al., CVPR 2023. [paper][code]
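
As referenced in the BLIP-2 entry above, a minimal captioning sketch using the BLIP-2 classes in Hugging Face `transformers` (assumed installed); the checkpoint id and image path are illustrative:

```python
# Minimal image-captioning sketch with transformers' BLIP-2 classes.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
caption_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(caption_ids, skip_special_tokens=True)[0])
```
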
- CLIP: Learning Transferable Visual Models From Natural Language Supervision, Radford et al., ICML 2021. [paper][code][open_clip][clip-as-service][SigLIP][EVA][DIVA][Clip-Forge] (a zero-shot classification sketch closes this subsection)
- DALL-E2: Hierarchical Text-Conditional Image Generation with CLIP Latents, Ramesh et al., arxiv 2022. [paper][code]
- GLIPv2: Unifying Localization and Vision-Language Understanding, Zhang et al., NeurIPS 2022. [paper][code][GLIGEN]
- SigLIP: Sigmoid Loss for Language Image Pre-Training, Zhai et al., arxiv 2023. [paper][siglip]
- EVA-CLIP: Improved Training Techniques for CLIP at Scale, Sun et al., arxiv 2023. [paper][code][EVA-CLIP-18B]
- Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese, Yang et al., arxiv 2022. [paper][code]
- MetaCLIP: Demystifying CLIP Data, Xu et al., ICLR 2024 Spotlight. [paper][code]
- Alpha-CLIP: A CLIP Model Focusing on Wherever You Want, Sun et al., arxiv 2023. [paper][code][Bootstrap3D]
- MMVP: Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs, Tong et al., arxiv 2024. [paper][code]
- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training, Vasu et al., CVPR 2024. [paper][code]
- Long-CLIP: Unlocking the Long-Text Capability of CLIP, Zhang et al., ECCV 2024. [paper][code][Inf-CLIP]
- CLOC: Contrastive Localized Language-Image Pre-Training, Chen et al., arxiv 2024. [paper]
- LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation, Huang et al., arxiv 2024. [paper][code]
- SuperClass: Classification Done Right for Vision-Language Pre-Training, Huang et al., NeurIPS 2024. [paper][code]
- AIM-v2: Multimodal Autoregressive Pre-training of Large Vision Encoders, Fini et al., arxiv 2024. [paper][code]
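
As referenced in the CLIP entry above, a minimal zero-shot classification sketch, assuming the `openai/CLIP` package (`pip install git+https://github.com/openai/CLIP.git`); the image path and label set are illustrative:

```python
# Minimal zero-shot classification sketch with openai/CLIP.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("image.jpg")).unsqueeze(0).to(device)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)   # cosine sims scaled by temperature
    probs = logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```
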
-
Tutorial on Diffusion Models for Imaging and Vision, Stanley H. Chan, arxiv 2024. [paper][diffusion-models-class]
-
Denoising Diffusion Probabilistic Models, Ho et al., NeurIPS 2020. [paper][code][Pytorch Implementation][RDDM] (a training-step sketch closes this subsection)
-
Improved Denoising Diffusion Probabilistic Models, Nichol and Dhariwal, ICML 2021. [paper][code]
-
Diffusion Models Beat GANs on Image Synthesis, Dhariwal and Nichol, NeurIPS 2021. [paper][code]
-
Classifier-Free Diffusion Guidance, Ho and Salimans, NeurIPS 2021. [paper][code]
-
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, Nichol et al., arxiv 2021. [paper][code]
-
DALL-E2: Hierarchical Text-Conditional Image Generation with CLIP Latents, Ramesh et al., arxiv 2022. [paper][code][dalle-mini]
-
Stable-Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models, Rombach et al., CVPR 2022. [paper][code][CompVis/stable-diffusion][Stability-AI/stablediffusion][ml-stable-diffusion][cleandift]
-
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis, Podell et al., arxiv 2023. [paper][code][SDXL-Lightning]
-
Introducing Stable Cascade, Stability AI, 2024. [link][code][model]
-
SDXL-Turbo: Adversarial Diffusion Distillation, Sauer et al., arxiv 2023. [paper][code]
-
LCM: Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference, Luo et al., arxiv 2023. [paper][code][Hyper-SD][DMD2][ddim]
-
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module, Luo et al., arxiv 2023. [paper][code][diffusion-forcing][InstaFlow]
-
Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, Esser et al., ICML 2024 Best Paper. [paper][model][mmdit]
-
SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation, Sauer et al., arxiv 2024. [paper][SD3.5]
-
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation, Kodaira et al., arxiv 2023. [paper][code]
-
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models, Marjit et al., arxiv 2024. [paper][code]
-
Video Diffusion Models, Ho et al., arxiv 2022. [paper][code]
-
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets, Blattmann et al., arxiv 2023. [paper][code][Stable Video 4D][VideoCrafter][Video-Infinity]
-
Consistency Models, Song et al., arxiv 2023. [paper][code][Consistency Decoder]
-
sCM: Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models, Lu and Song, arxiv 2024. [paper][blog]
-
A Survey on Video Diffusion Models, Xing et al., arxiv 2023. [paper][code]
-
Diffusion Models: A Comprehensive Survey of Methods and Applications, Yang et al., arxiv 2023. [paper][code]
-
MAGVIT2: Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation, Yu et al., ICLR 2024. [paper][magvit2-pytorch][Open-MAGVIT2][LlamaGen]
-
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models, Avrahami et al., arxiv 2023. [paper][code]
-
U-ViT: All are Worth Words: A ViT Backbone for Diffusion Models, Bao et al., CVPR 2023. [paper][code]
-
UniDiffuser: One Transformer Fits All Distributions in Multi-Modal Diffusion, Bao et al., arxiv 2023. [paper][code]
-
Matryoshka Diffusion Models, Gu et al., arxiv 2023. [paper][code]
-
SEDD: Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution, Lou et al., ICML 2024 Best Paper. [paper][code]
-
l-DAE: Deconstructing Denoising Diffusion Models for Self-Supervised Learning, Chen et al., arxiv 2024. [paper]
-
DiT: Scalable Diffusion Models with Transformers, Peebles and Xie, ICCV 2023 Oral. [paper][code][OpenDiT][VideoSys][MDT][PipeFusion][fast-DiT][FastVideo][xDiT][rlt][U-DiT]
-
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers, Ma et al., arxiv 2024. [paper][code]
-
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis, Ren et al., NeurIPS 2024. [paper][model][AdaCache]
-
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer, Yang et al., arxiv 2024. [paper][code]
-
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion, Chen et al., arxiv 2024. [paper][code]
-
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget, Sehwag et al., arxiv 2024. [paper][code][tiny-stable-diffusion]
-
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model, Zhou et al., arxiv 2024. [paper][transfusion-pytorch][chameleon][MonoFormer]
-
REPA: Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think, Yu et al., arxiv 2024. [paper][code]
-
In-Context LoRA for Diffusion Transformers, Huang et al., arxiv 2024. [paper][code]
-
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models, Li et al., arxiv 2024. [paper][code]
-
Training-free Regional Prompting for Diffusion Transformers, Chen et al., arxiv 2024. [paper][code][Add-it][RAG-Diffusion]
-
Github Repositories
-
[stable-diffusion-webui][stable-diffusion-webui-colab][sd-webui-controlnet][stable-diffusion-webui-forge][automatic]
-
[ComfyUI][streamlit][gradio][ComfyUI-Workflows-ZHO][ComfyUI_Bxb]
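
As referenced in the DDPM entry above, a minimal sketch of the training objective: corrupt a clean image with the closed-form forward process and regress the injected noise. The linear schedule matches the paper; `model` stands in for any noise predictor, and shapes are illustrative:

```python
# Minimal DDPM training-loss sketch (Ho et al., 2020).
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule from the paper
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t = prod_s (1 - beta_s)

def ddpm_loss(model, x0):
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    eps = torch.randn_like(x0)
    a_bar = alphas_bar.to(x0.device)[t].view(b, 1, 1, 1)
    # x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps  (forward process in one step)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return F.mse_loss(model(x_t, t), eps)       # "simple" loss: predict the noise
```
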
-
LLaVA: Visual Instruction Tuning, Liu et al., NeurIPS 2023 Oral. [paper][code][ViP-LLaVA][LLaVA-pp][TinyLLaVA_Factory][LLaVA-RLHF][LLaVA-KD] (a connector sketch closes this subsection)
-
LLaVA-1.5: Improved Baselines with Visual Instruction Tuning, Liu et al., arxiv 2023. [paper][code][LLaVA-UHD][LLaVA-HR]
-
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models, Li et al., arxiv 2024. [paper][code][Open-LLaVA-NeXT][MG-LLaVA][LongVA][LongLLaVA]
-
LLaVA-OneVision: Easy Visual Task Transfer, Li et al., arxiv 2024. [paper][code]
-
LLaVA-Video: Video Instruction Tuning With Synthetic Data, Zhang et al., arxiv 2024. [paper][code][LLaVA-Critic][LLaVA-Video-178K]
-
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day, Li et al., arxiv 2023. [paper][code]
-
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection, Lin et al., EMNLP 2024. [paper][code][PLLaVA][ml-slowfast-llava]
-
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models, Lin et al., arxiv 2024. [paper][code]
-
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models, Zhu et al., arxiv 2023. [paper][code][MiniGPT-4-ZH]
-
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning, Chen et al., arxiv 2023. [paper][code]
-
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens, Ataallah et al., arxiv 2024. [paper][code]
-
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens, Zheng et al., arxiv 2023. [paper][code]
-
Flamingo: a Visual Language Model for Few-Shot Learning, Alayrac et al., NeurIPS 2022. [paper][open-flamingo][flamingo-pytorch]
-
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding, Zhang et al., EMNLP 2023. [paper][code][VideoLLaMA2][VideoLLM-online][LLaMA-VID]
-
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs, Zhao et al., arxiv 2023. [paper][code][AnyGPT]
-
Emu: Generative Pretraining in Multimodality, Sun et al., ICLR 2024. [paper][code]
-
Emu3: Next-Token Prediction is All You Need, Wang et al., arxiv 2024. [paper][code]
-
EVE: Unveiling Encoder-Free Vision-Language Models, Diao et al., NeurIPS 2024. [paper][code]
-
DreamLLM: Synergistic Multimodal Comprehension and Creation, Dong et al., ICLR 2024 Spotlight. [paper][code][dreambench_plus]
-
Meta-Transformer: A Unified Framework for Multimodal Learning, Zhang et al., arxiv 2023. [paper][code]
-
NExT-GPT: Any-to-Any Multimodal LLM, Wu et al., arxiv 2023. [paper][code]
-
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, Wu et al., arxiv 2023. [paper][code]
-
SoM: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V, Yang et al., arxiv 2023. [paper][code]
-
Ferret: Refer and Ground Anything Anywhere at Any Granularity, You et al., arxiv 2023. [paper][code][Ferret-UI][Ferret-UI 2]
-
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities, Bachmann et al., arxiv 2024. [paper][code][MM1.5]
-
CogVLM: Visual Expert for Pretrained Language Models, Wang et al., arxiv 2023. [paper][code][VisualGLM-6B][CogCoM]
-
CogVLM2: Visual Language Models for Image and Video Understanding, Hong et al., arxiv 2024. [paper][code][glm-4v-9b]
-
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond, Bai et al., arxiv 2023. [paper][code]
-
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution, Wang et al., arxiv 2024. [paper][code][modeling_qwen2_vl.py][finetune-Qwen2-VL][Oryx][Video-XL][Video-ChatGPT]
-
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition, Zhang et al., arxiv 2023. [paper][code][InternLM-XComposer2.5-OmniLive]
-
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks, Chen et al., CVPR 2024 Oral. [paper][code][InternVideo][InternVid][InternVL1.5 paper][Mono-InternVL][InternVL2.5 paper]
-
DeepSeek-VL: Towards Real-World Vision-Language Understanding, Lu et al., arxiv 2024. [paper][code][DeepSeek-VL2]
-
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation, Wu et al., arxiv 2024. [paper][code][JanusFlow]
-
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions, Chen et al., arxiv 2023. [paper][code]
-
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions, Chen et al., arxiv 2024. [paper][code]
-
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones, Yuan et al., arxiv 2023. [paper][code]
-
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models, Li et al., CVPR 2024. [paper][code]
-
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models, Wei et al., arxiv 2023. [paper][code]
-
Vary-toy: Small Language Model Meets with Reinforced Vision Vocabulary, Wei et al., arxiv 2024. [paper][code]
-
VILA: On Pre-training for Visual Language Models, Lin et al., CVPR 2024. [paper][code][LongVILA][Eagle][NVLM][NVILA]
-
POINTS1.5: Building a Vision-Language Model towards Real World Applications, Liu et al., arxiv 2024. [paper][code][POINTS][Multi-Modal Generative Embedding Model][Number it: Temporal Grounding Videos like Flipping Manga]
-
LWM: World Model on Million-Length Video And Language With RingAttention, Liu et al., arxiv 2024. [paper][code][Navigation World Models]
-
Chameleon: Mixed-Modal Early-Fusion Foundation Models, Chameleon Team, arxiv 2024. [paper][code][X-Prompt]
-
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts, Li et al., arxiv 2024. [paper][code]
-
RL4VLM: Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning, Zhai et al., arxiv 2024. [paper][code][RLHF-V][RLAIF-V]
-
OpenVLA: An Open-Source Vision-Language-Action Model, Kim et al., arxiv 2024. [paper][code][Emma-X]
-
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis, Fu et al., arxiv 2024. [paper][code][lmms-eval][VLMEvalKit][multimodal-needle-in-a-haystack][MM-NIAH][VideoNIAH][ChartMimic][WildVision][HourVideo]
-
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities, Yu et al., ICML 2024. [paper][code][UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling]
-
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs, Tong et al., arxiv 2024. [paper][code]
-
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models, Sun et al., ICML 2024. [paper][code]
-
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation, Chern et al., arxiv 2024. [paper][code]
-
PaliGemma: A versatile 3B VLM for transfer, Beyer et al., arxiv 2024. [paper][code][pytorch-paligemma][PaliGemma 2]
-
Pixtral 12B, Agrawal et al., arxiv 2024. [paper][webpage][Pixtral-12B-2409][Pixtral-Large-Instruct-2411]
-
MiniCPM-V: A GPT-4V Level MLLM on Your Phone, Yao et al., arxiv 2024. [paper][code][VisCPM][RLHF-V][RLAIF-V]
-
VITA: Towards Open-Source Interactive Omni Multimodal LLM, Fu et al., arxiv 2024. [paper][code][Freeze-Omni][Lyra]
-
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation, Xie et al., arxiv 2024. [paper][code][Transfusion][VILA-U][LWM]
-
MIO: A Foundation Model on Multimodal Tokens, Wang et al., arxiv 2024. [paper][Emu3]
-
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities, Xie and Wu, arxiv 2024. [paper][code]
-
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding, Shen et al., arxiv 2024. [paper][code][Video-XL][VisionZip][Apollo]
-
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models, Liang et al., arxiv 2024. [paper][Transfusion]
-
[MiniCPM-V][moondream][MobileVLM][OmniFusion][Bunny][MiCo][Vitron][mPLUG-Owl][mPLUG-DocOwl][Ovis][Aria][unicom][Infini-Megrez]
-
[datacomp][MMDU][MINT-1T][OpenVid-1M][SkyScript-100M][FineVideo]
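
As referenced in the LLaVA entry above, a minimal sketch of the connector pattern shared by many VLMs in this subsection: project frozen vision-encoder features into the LLM's embedding space and prepend them to the text tokens. All dimensions are illustrative:

```python
# Minimal LLaVA-style vision-to-LLM connector sketch.
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Two-layer MLP connector as in LLaVA-1.5 (LLaVA-1 used a single Linear)."""
    def __init__(self, d_vision=1024, d_model=4096):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_vision, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, patch_feats):      # [B, num_patches, d_vision]
        return self.mlp(patch_feats)     # [B, num_patches, d_model]

# Visual tokens are concatenated with embedded text tokens before the LLM:
proj = VisionProjector()
vis = proj(torch.randn(1, 576, 1024))   # e.g. CLIP ViT-L/14 @ 336px -> 24x24 patches
txt = torch.randn(1, 32, 4096)          # embedded prompt tokens (illustrative)
llm_input = torch.cat([vis, txt], dim=1)  # [1, 608, 4096]
```
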
-
DALL-E: Zero-Shot Text-to-Image Generation, Ramesh et al., arxiv 2021. [paper][code]
-
DALL-E3: Improving Image Generation with Better Captions, Betker et al., OpenAI 2023. [paper][code][blog][Glyph-ByT5]
-
ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models, Zhang et al., ICCV 2023 Marr Prize. [paper][code][ControlNet_Plus_Plus][ControlNeXt][ControlAR][OminiControl][ROICtrl] (a conditional-generation sketch closes this subsection)
-
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models, Mou et al., AAAI 2024. [paper][code]
-
AnyText: Multilingual Visual Text Generation And Editing, Tuo et al., arxiv 2023. [paper][code]
-
RPG: Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs, Yang et al., ICML 2024. [paper][code][IterComp]
-
LAION-5B: An open large-scale dataset for training next generation image-text models, Schuhmann et al., NeurIPS 2022. [paper][code][blog][laion-coco]
-
DeepFloyd IF: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., arxiv 2022. [paper][code]
-
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., NeurIPS 2022. [paper][unofficial code]
-
Instruct-Imagen: Image Generation with Multi-modal Instruction, Hu et al., arxiv 2024. [paper][Imagen 3]
-
CogView: Mastering Text-to-Image Generation via Transformers, Ding et al., NeurIPS 2021. [paper][code][ImageReward]
-
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers, Ding et al., arxiv 2022. [paper][code]
-
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion, Zheng et al., ECCV 2024. [paper][code]
-
TextDiffuser: Diffusion Models as Text Painters, Chen et al., arxiv 2023. [paper][code]
-
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering, Chen et al., arxiv 2023. [paper][code]
-
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, Chen et al., arxiv 2023. [paper][code]
-
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models, Chen et al., arxiv 2024. [paper][code]
-
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation, Chen et al., arxiv 2024. [paper][code]
-
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models, Ye et al., arxiv 2023. [paper][code][ID-Animator][InstantID]
-
Controllable Generation with Text-to-Image Diffusion Models: A Survey, Cao et al., arxiv 2024. [paper][code]
-
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation, Zhou et al., NeurIPS 2024. [paper][code][AutoStudio]
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding, Li et al., arxiv 2024. [paper][code][Hunyuan3D-1][xDiT]
-
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation, Li et al., CVPR 2024. [paper][t2v_metrics][VQAScore]
-
[Kolors][Kolors-Virtual-Try-On][EVLM: An Efficient Vision-Language Model for Visual Understanding]
-
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models, Zhao et al., NeurIPS 2024. [paper][code]
-
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens, Fan et al., arxiv 2024. [paper]
-
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis, Bai et al., arxiv 2024. [paper][code]
-
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers, Xie et al., ICLR 2025. [paper][code]
-
[flux][x-flux][x-flux-comfyui][FLUX.1-dev-LoRA][qwen2vl-flux]
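
As referenced in the ControlNet entry above, a minimal conditional-generation sketch using the `diffusers` ControlNet pipeline (assumed installed); the checkpoint ids, prompt, and precomputed edge map are illustrative:

```python
# Minimal ControlNet text-to-image sketch with Hugging Face diffusers.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny_edges = load_image("canny_edges.png")  # precomputed Canny map of the layout
image = pipe(
    "a futuristic city at sunset, highly detailed",
    image=canny_edges,                       # spatial condition injected by ControlNet
    num_inference_steps=30,
).images[0]
image.save("out.png")
```
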
-
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, Hu et al., arxiv 2023. [paper][code][Open-AnimateAnyone][Moore-AnimateAnyone][AnimateAnyone][UniAnimate][Animate-X]
-
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions, Tian et al., arxiv 2024. [paper][code][V-Express]
-
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation, Wei et al., arxiv 2024. [paper][code]
-
DreaMoving: A Human Video Generation Framework based on Diffusion Models, Feng et al., arxiv 2023. [paper][code]
-
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model, Xu et al., arxiv 2023. [paper][code][champ][MegActor]
-
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors, Xing et al., ECCV 2024. [paper][code]
-
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control, Guo et al., arxiv 2024. [paper][code][FasterLivePortrait][FollowYourEmoji]
-
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis, Liang et al., arxiv 2023. [paper][code]
-
Video Diffusion Models, Ho et al., arxiv 2022. [paper][video-diffusion-pytorch]
-
Make-A-Video: Text-to-Video Generation without Text-Video Data, Singer et al., arxiv 2022. [paper][make-a-video-pytorch]
-
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation, Wu et al., ICCV 2023. [paper][code]
-
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators, Khachatryan et al., ICCV 2023 Oral. [paper][code][StreamingT2V]
-
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers, Hong et al., ICLR 2023. [paper][code]
-
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer, Yang et al., arxiv 2024. [paper][code][cogvideox-factory]
-
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos, Ma et al., AAAI 2024. [paper][code][Follow-Your-Pose v2][Follow-Your-Emoji]
-
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts, Ma et al., arxiv 2024. [paper][code]
-
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning, Guo et al., arxiv 2023. [paper][code][AnimateDiff-Lightning]
-
StableVideo: Text-driven Consistency-aware Diffusion Video Editing, Chai et al., ICCV 2023. [paper][code]
-
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models, Zhang et al., arxiv 2023. [paper][code]
-
TF-T2V: A Recipe for Scaling up Text-to-Video Generation with Text-free Videos, Wang et al., arxiv 2023. [paper][code]
-
Lumiere: A Space-Time Diffusion Model for Video Generation, Bar-Tal et al., arxiv 2024. [paper][lumiere-pytorch]
-
Sora: Creating video from text, OpenAI, 2024. [blog][Generative Models for Image and Long Video Synthesis][Generative Models of Images and Neural Networks][Open-Sora][VideoSys][Open-Sora-Plan][minisora][SoraWebui][MuseV][PhysDreamer][easyanimate]
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models, Liu et al., arxiv 2024. [paper][code]
-
How Far is Video Generation from World Model: A Physical Law Perspective, Kang et al., arxiv 2024. [paper][code]
-
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework, Yuan et al., arxiv 2024. [paper][code]
-
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution, Dehghani et al., NeurIPS 2023. [paper][unofficial code]
-
VideoPoet: A Large Language Model for Zero-Shot Video Generation, Kondratyuk et al., ICML 2024 Best Paper. [paper]
-
Latte: Latent Diffusion Transformer for Video Generation, Ma et al., arxiv 2024. [paper][code][LaVIT][LaVie][VBench][Vchitect-2.0][LiteGen]
-
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis, Menapace et al., arxiv 2024. [paper][articulated-animation]
-
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance, Feng et al., arxiv 2024. [paper][code][Qihoo-T2X]
-
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos, Hu et al., arxiv 2024. [paper][code]
-
Loong: Generating Minute-level Long Videos with Autoregressive Language Models, Wang et al., arxiv 2024. [paper]
-
Movie Gen: A Cast of Media Foundation Models, The Movie Gen team @ Meta, 2024. [blog][paper][unofficial code]
-
Pyramidal Flow Matching for Efficient Video Generative Modeling, Jin et al., arxiv 2024. [paper][code][LaVIT]
-
Allegro: Open the Black Box of Commercial-Level Video Generation Model, Zhou et al., arxiv 2024. [paper][code]
-
Open-Sora Plan: Open-Source Large Video Generation Model, Lin et al., arxiv 2024. [paper][code][Open-Sora][ConsisID]
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models, Kong et al., arxiv 2024. [paper][code][FastVideo]
-
[MoneyPrinterTurbo][clapper][videos][manim][Mochi 1][genmoai-smol][LTX-Video][Kandinsky-4]
- A Survey on Multimodal Large Language Models, Yin et al., arxiv 2023. [paper][Awesome-Multimodal-Large-Language-Models][MME-Survey]
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants, Li et al., arxiv 2023. [paper][cvinw_readings]
- From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities, Lu et al., arxiv 2024. [paper][Leaderboards]
- Efficient Multimodal Large Language Models: A Survey, Jin et al., arxiv 2024. [paper][code]
- An Introduction to Vision-Language Modeling, Bordes et al., arxiv 2024. [paper]
- Building and better understanding vision-language models: insights and future directions, Laurençon et al., arxiv 2024. [paper]
- Video Understanding with Large Language Models: A Survey, Tang et al., arxiv 2023. [paper][code]
- Fuyu-8B: A Multimodal Architecture for AI Agents, Bavishi et al., Adept blog 2023. [blog][model]
- Otter: A Multi-Modal Model with In-Context Instruction Tuning, Li et al., arxiv 2023. [paper][code]
- OtterHD: A High-Resolution Multi-modality Model, Li et al., arxiv 2023. [paper][code][model]
- CM3leon: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning, Yu et al., arxiv 2023. [paper][Unofficial Implementation]
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer, Tian et al., arxiv 2024. [paper][code]
- CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations, Qi et al., arxiv 2024. [paper][code]
- SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models, Gao et al., arxiv 2024. [paper][code]
- Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers, Gao et al., arxiv 2024. [paper][code]
- Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining, Liu et al., arxiv 2024. [paper][code]
- LWM: World Model on Million-Length Video And Language With RingAttention, Liu et al., arxiv 2024. [paper][code]
- Chameleon: Mixed-Modal Early-Fusion Foundation Models, Chameleon Team, arxiv 2024. [paper][code][X-Prompt]
- SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation, Ge et al., arxiv 2024. [paper][code][SEED][SEED-Story]
-
Deep Reinforcement Learning: Pong from Pixels, Andrej Karpathy, 2016. [blog][reinforcement-learning-an-introduction][easy-rl][deep-rl-course][wangshusen/DRL]
-
DQN: Playing Atari with Deep Reinforcement Learning, Mnih et al., arxiv 2013. [paper][code]
-
DQN (Nature): Human-level control through deep reinforcement learning, Mnih et al., Nature 2015. [paper][DQN-tensorflow][DQN_pytorch]
-
DDQN: Deep Reinforcement Learning with Double Q-learning, Hasselt et al., AAAI 2016. [paper][RL-Adventure][deep-q-learning][Deep-RL-Keras]
-
Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al., AAAI 2018. [paper][Rainbow]
-
DDPG: Continuous control with deep reinforcement learning, Lillicrap et al., ICLR 2016. [paper][pytorch-ddpg]
-
PPO: Proximal Policy Optimization Algorithms, Schulman et al., arxiv 2017. [paper][code][trl ppo_trainer][PPO-PyTorch][implementation-matters][PPOxFamily][The 37 Implementation Details of PPO][ppo-implementation-details] (a clipped-loss sketch follows the repository list below)
-
Diffusion Models for Reinforcement Learning: A Survey, Zhu et al., arxiv 2023. [paper][code][diffusion_policy]
-
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations, Matthias Lehmann, arxiv 2024. [paper][code]
-
[tianshou][rlkit][pytorch-a2c-ppo-acktr-gail][Safe-Reinforcement-Learning-Baselines][CleanRL][openrl][ElegantRL]
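
As referenced in the PPO entry above, a minimal sketch of the clipped surrogate objective; input tensors are illustrative, and the value-function and entropy terms of the full loss are omitted for brevity:

```python
# Minimal PPO clipped-surrogate-loss sketch (Schulman et al., 2017).
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space for stability
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # pessimistic bound: take the smaller (worse) of the two surrogates
    return -torch.min(unclipped, clipped).mean()

loss = ppo_clip_loss(
    logp_new=torch.tensor([-1.0, -0.5]),
    logp_old=torch.tensor([-1.1, -0.7]),
    advantages=torch.tensor([0.8, -0.3]),
)
```

The clipping is what keeps a single update from moving the new policy too far from the one that collected the data, which is why so many of the implementation guides linked above focus on getting this term (and advantage normalization) right.
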
- Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al., NeurIPS 2021. [paper][code]
- Trajectory Transformer: Offline Reinforcement Learning as One Big Sequence Modeling Problem, Janner et al., NeurIPS 2021. [paper][code]
- Guiding Pretraining in Reinforcement Learning with Large Language Models, Du et al., ICML 2023. [paper][code]
- Introspective Tips: Large Language Model for In-Context Decision Making, Chen et al., arxiv 2023. [paper]
- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, Chebotar et al., CoRL 2023. [paper][Unofficial Implementation]
- Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods, Cao et al., arxiv 2024. [paper]
-
A Gentle Introduction to Graph Neural Networks, Sanchez-Lengeling et al., Distill 2021. [paper]
-
CS224W: Machine Learning with Graphs, Stanford. [link]
-
GCN: Semi-Supervised Classification with Graph Convolutional Networks, Kipf and Welling, ICLR 2017. [paper][code][pygcn] (a layer sketch closes this section)
-
GAE: Variational Graph Auto-Encoders, Kipf and Welling, arxiv 2016. [paper][code][gae-pytorch]
-
GAT: Graph Attention Networks, Veličković et al., ICLR 2018. [paper][code][pyGAT][pytorch-GAT]
-
GIN: How Powerful are Graph Neural Networks?, Xu et al., ICLR 2019. [paper][code]
-
Graphormer: Do Transformers Really Perform Bad for Graph Representation, Ying et al., NeurIPS 2021. [paper][code]
-
GraphGPT: Graph Instruction Tuning for Large Language Models, Tang et al., SIGIR 2024. [paper][code][Graph-Bert]
-
OpenGraph: Towards Open Graph Foundation Models, Xia et al., arxiv 2024. [paper][code][AnyGraph][openspg]
-
A Survey of Large Language Models for Graphs, Ren et al., KDD 2024. [paper][code]
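
As referenced in the GCN entry above, a minimal sketch of one graph-convolution layer; a dense adjacency matrix is used purely for clarity (real implementations use sparse ops):

```python
# Minimal GCN layer sketch (Kipf & Welling, 2017): symmetrically normalize the
# self-looped adjacency, then propagate and transform node features.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):                  # x: [N, in_dim], adj: [N, N]
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)      # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]   # D^-1/2 A D^-1/2
        return torch.relu(a_norm @ self.linear(x))                   # propagate + transform
```
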
- Attention Is All You Need, Vaswani et al., NIPS 2017. [paper][code][transformer-debugger][The Illustrated Transformer][The Random Transformer][The Annotated Transformer][Transformers-Tutorials][x-transformers][flash-linear-attention][matmulfreellm]
- RoPE: RoFormer: Enhanced Transformer with Rotary Position Embedding, Su et al., arxiv 2021. [paper][code][rotary-embedding-torch][rerope][blog][positional_embedding][longformer] (a rotation sketch closes this section)
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints, Ainslie et al., arxiv 2023. [paper][unofficial code][blog]
- RWKV: Reinventing RNNs for the Transformer Era, Peng et al., EMNLP 2023. [paper][code][ChatRWKV][rwkv.cpp]
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence, Peng et al., arxiv 2024. [paper][code][Awesome-RWKV-in-Vision]
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces, Gu and Dao, COLM 2024. [paper][code][Transformers are SSMs][mamba-minimal][Awesome-Mamba-Papers][Falcon Mamba]
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models, De et al., arxiv 2024. [paper][recurrentgemma]
- Jamba: A Hybrid Transformer-Mamba Language Model, Lieber et al., arxiv 2024. [paper][model][Samba]
- Neural Network Diffusion, Wang et al., arxiv 2024. [paper][code][GPD][tree-diffusion]
- KAN: Kolmogorov-Arnold Networks, Liu et al., arxiv 2024. [paper][code][KAN 2.0][efficient-kan][kan-gpt][Convolutional-KANs][kat][FAN]
- xLSTM: Extended Long Short-Term Memory, Beck et al., arxiv 2024. [paper][code][vision-lstm][PyxLSTM][xlstm-cuda][Attention as an RNN][ttt-lm-pytorch][Were RNNs All We Needed]
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters, Wang et al., arxiv 2024. [paper][code]
- Byte Latent Transformer: Patches Scale Better Than Tokens, Pagnoni et al., arxiv 2024. [paper][code][lingua]
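
As referenced in the RoPE entry above, a minimal sketch of the rotary embedding: each channel pair of the query/key vectors is rotated by a position-dependent angle, so attention scores depend only on relative position. Shapes and the frequency base are illustrative:

```python
# Minimal rotary position embedding sketch (Su et al., 2021).
import torch

def apply_rope(x, base=10000.0):           # x: [batch, seq_len, dim], dim even
    b, n, d = x.shape
    inv_freq = base ** (-torch.arange(0, d, 2, device=x.device).float() / d)
    angles = torch.arange(n, device=x.device).float()[:, None] * inv_freq  # [n, d/2]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]    # split channels into rotation pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin   # 2-D rotation applied per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = apply_rope(torch.randn(1, 16, 64))     # rotate queries (keys are rotated likewise)
```
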