Official PyTorch Implementation of ParGo: Bridging Vision-Language with Partial and Global Views. (AAAI 2025)

ParGo: Bridging Vision-Language with Partial and Global Views

Official PyTorch Implementation of ParGo: Bridging Vision-Language with Partial and Global Views. (AAAI 2025)

Paper, Model ParGo

Setup

cd ParGo
conda create -n ParGo_env python=3.10 -y
conda activate ParGo_env
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r ./requirements.txt
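
To confirm that the environment matches the pins above, a check along the following lines may help. It is a minimal sketch and not part of the repository: `EXPECTED` and `check_versions` are hypothetical names, and the comparison assumes the installed version string may carry a local suffix such as `+cu118`, which is ignored.

```python
from importlib import metadata

# Pins copied from the pip command above (local "+cu118" suffix dropped).
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_versions(expected, get_version=metadata.version):
    """Return {package: problem} for every package that is missing or mismatched."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Compare only the public version; ignore a local suffix such as "+cu118".
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_versions(EXPECTED).items():
        print(f"{name}: {problem}")
```

An empty result means all three packages are installed at the pinned versions.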

Download Models

Download the LLM (internlm2-7b) and the vision encoder (eva-clip-l-14-336) before running the evaluation.

Evaluation

MME Benchmark

Data

Place the benchmark data in the benchmarks directory, using the following structure:

benchmarks
└── MMEBenmark
    ├── images
    └── Data_json
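
Before running the evaluation, the layout above can be checked with a few lines of Python. This is a sketch under the assumption that the paths are relative to the repository root; `REQUIRED_DIRS` and `missing_dirs` are hypothetical names, not part of the ParGo code.

```python
from pathlib import Path

# Sub-directories taken from the tree above, relative to the repository root.
REQUIRED_DIRS = (
    "benchmarks/MMEBenmark/images",
    "benchmarks/MMEBenmark/Data_json",
)


def missing_dirs(root="."):
    """Return the required directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in REQUIRED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    for d in missing_dirs():
        print(f"missing: {d}")
```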

Each JSON file in Data_json contains the image_name, question, and answer, e.g.,

10002.jpg	Does this artwork exist in the form of painting? 	Yes
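
The record above could be read as follows. This is a sketch that assumes the three fields are separated by tabs, as the example line suggests; `MMERecord` and `parse_record` are hypothetical names and do not come from the ParGo code.

```python
from typing import NamedTuple


class MMERecord(NamedTuple):
    image_name: str
    question: str
    answer: str


def parse_record(line: str) -> MMERecord:
    """Split one tab-separated line into its three fields, trimming stray spaces."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
    return MMERecord(*fields)


record = parse_record(
    "10002.jpg\tDoes this artwork exist in the form of painting? \tYes"
)
print(record.answer)  # Yes
```

Stripping each field removes the trailing space after the question mark in the example, so the question compares cleanly against other sources.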

Evaluation

Step 1: Generate the response:

python3 eval/eval_mme_finetuning.py --config ./configs/MMEBench_interLM2-7B.json

Step 2: Calculate the score:

python3 eval/calculation_mme.py --results_dir ./output/internlm2-MME
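
For orientation, the score in Step 2 can be illustrated with the standard MME metric, where a subtask score is accuracy (per question) plus accuracy+ (per image, counting an image only if all of its questions are answered correctly), both in percent. The sketch below assumes that metric and a simple case-insensitive match; `mme_subtask_score` is a hypothetical helper, and eval/calculation_mme.py may use different matching rules and input formats.

```python
from collections import defaultdict


def mme_subtask_score(results):
    """Score one subtask from (image_name, predicted, expected) triples."""
    per_image = defaultdict(list)
    for image_name, predicted, expected in results:
        # Assumption: answers match when equal after trimming and lowercasing.
        correct = predicted.strip().lower() == expected.strip().lower()
        per_image[image_name].append(correct)

    answers = [c for flags in per_image.values() for c in flags]
    if not answers:
        return 0.0

    # accuracy: share of questions answered correctly.
    acc = 100.0 * sum(answers) / len(answers)
    # accuracy+: share of images whose questions are all answered correctly.
    acc_plus = 100.0 * sum(all(flags) for flags in per_image.values()) / len(per_image)
    return acc + acc_plus


example = [
    ("a.jpg", "Yes", "Yes"),
    ("a.jpg", "No", "No"),
    ("b.jpg", "Yes", "Yes"),
    ("b.jpg", "Yes", "No"),
]
print(mme_subtask_score(example))  # 125.0 (accuracy 75 + accuracy+ 50)
```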

For other benchmarks, please follow their official instructions to construct the files; the overall pipeline is the same as for the MME benchmark.

Acknowledgement

This project is built on MiniGPT and BLIP2. Many thanks to the contributors of these excellent codebases.

If you find our code helpful to your research, please consider citing us with this BibTeX:

@misc{wang2024pargobridgingvisionlanguagepartial,
      title={ParGo: Bridging Vision-Language with Partial and Global Views}, 
      author={An-Lan Wang and Bin Shan and Wei Shi and Kun-Yu Lin and Xiang Fei and Guozhi Tang and Lei Liao and Jingqun Tang and Can Huang and Wei-Shi Zheng},
      year={2024},
      eprint={2408.12928},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.12928}, 
}

License

The source code and pretrained weights are licensed under the BSD-3-Clause license.
