Vision-Language Models (VLMs) have shown impressive performance in vision tasks, but adapting them to new domains often requires expensive fine-tuning.
Prompt tuning techniques, including textual, visual, and multimodal prompting, offer efficient alternatives by leveraging learnable prompts.
However, their application to Vision-Language Segmentation Models (VLSMs) and evaluation under significant domain shifts remain unexplored.
This work presents TuneVLSeg, an open-source benchmarking framework that integrates various unimodal and multimodal prompt tuning techniques into VLSMs, making prompt tuning usable for downstream segmentation datasets with any number of classes.
TuneVLSeg includes unimodal (textual and visual) and multimodal prompt tuning strategies that can be plugged into the supported VLSMs.
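For intuition, the sketch below shows the core idea behind learnable textual prompts in the context-optimization (CoOp-like) style: a small set of trainable context vectors is prepended to frozen class-token embeddings, and only those vectors are updated during tuning. This is an illustrative example, not the implementation used in this repository; all class names, shapes, and dimensions are placeholders.

```python
import torch
import torch.nn as nn


class LearnableTextPrompt(nn.Module):
    """Minimal CoOp-style textual prompt: learnable context vectors are
    prepended to frozen class-name token embeddings before the text encoder."""

    def __init__(self, n_ctx: int = 8, embed_dim: int = 512):
        super().__init__()
        # learnable context embeddings, shared across classes
        self.ctx = nn.Parameter(torch.empty(n_ctx, embed_dim))
        nn.init.normal_(self.ctx, std=0.02)

    def forward(self, class_token_embeds: torch.Tensor) -> torch.Tensor:
        # class_token_embeds: (num_classes, n_tokens, embed_dim), kept frozen
        n_cls = class_token_embeds.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        # prepend the context vectors; only self.ctx receives gradients
        return torch.cat([ctx, class_token_embeds], dim=1)


# toy usage: 3 classes, 5 class-name tokens each, 512-d embeddings
prompts = LearnableTextPrompt(n_ctx=8, embed_dim=512)
out = prompts(torch.randn(3, 5, 512))
print(out.shape)  # torch.Size([3, 13, 512])
```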
Install with pip:

```bash
# clone project
git clone https://github.com/YourGithubName/your-repo-name
cd your-repo-name

# [OPTIONAL] create conda environment
conda create -n myenv python=3.9
conda activate myenv

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt
```
Or install with conda:

```bash
# clone project
git clone https://github.com/YourGithubName/your-repo-name
cd your-repo-name

# create conda environment and install dependencies
conda env create -f environment.yaml -n myenv

# activate conda environment
conda activate myenv
```
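Whichever route you use, a quick check like the following (illustrative, not part of the repository) confirms that PyTorch is installed and whether a GPU is visible:

```python
# sanity check: PyTorch version and GPU visibility
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```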
Train model with default configuration:

```bash
# train on CPU
python src/train.py trainer=cpu

# train on GPU
python src/train.py trainer=gpu
```
Train model with chosen experiment configuration from configs/experiment/:

```bash
python src/train.py experiment=experiment_name.yaml
```
You can override any parameter from the command line like this:

```bash
python src/train.py trainer.max_epochs=20 data.batch_size=64
```
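The experiment=... and dotted key=value syntax suggests a Hydra-style configuration layout. As a rough illustration of how such overrides resolve, assuming a root config named train.yaml under configs/ (which may differ in this repository), Hydra's compose API can be used to inspect the merged configuration:

```python
# illustrative only: inspect how command-line style overrides resolve,
# assuming a Hydra layout with a root config at configs/train.yaml
from hydra import compose, initialize

with initialize(version_base=None, config_path="configs"):
    cfg = compose(
        config_name="train",
        overrides=["trainer.max_epochs=20", "data.batch_size=64"],
    )
    print(cfg.trainer.max_epochs)  # 20
    print(cfg.data.batch_size)     # 64
```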