Welcome to VLMs Zero to Hero! This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. Here are the papers we'll cover along the way:
- Word2Vec: Efficient Estimation of Word Representations in Vector Space (2013) and Distributed Representations of Words and Phrases and their Compositionality (2013)
- Seq2Seq: Sequence to Sequence Learning with Neural Networks (2014)
- Attention Is All You Need (2017)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
- GPT: Improving Language Understanding by Generative Pre-Training (2018)
- AlexNet: ImageNet Classification with Deep Convolutional Neural Networks (2012)
- VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)
- ResNet: Deep Residual Learning for Image Recognition (2015)
- Show and Tell: A Neural Image Caption Generator (2014) and Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015)
- ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020)
- CLIP: Learning Transferable Visual Models from Natural Language Supervision (2021)
- Scaling Laws for Neural Language Models (2020)
- LoRA: Low-Rank Adaptation of Large Language Models (2021)
- QLoRA: Efficient Finetuning of Quantized LLMs (2023)
- Flamingo: A Visual Language Model for Few-Shot Learning (2022)
- LLaVA: Visual Instruction Tuning (2023)
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (2023)
- PaliGemma: A versatile 3B VLM for transfer (2024)
Are there important papers, models, or techniques we missed? Do you have a favorite breakthrough in vision-language research that isn't listed here? We’d love to hear your suggestions!