Official PyTorch implementation of ParGo: Bridging Vision-Language with Partial and Global Views (AAAI 2025).
cd ParGo
conda create -n ParGo_env python=3.10 -y
conda activate ParGo_env
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r ./requirements.txt
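After installation, a quick sanity check (illustrative, not part of the repo) confirms that the CUDA 11.8 build of PyTorch is active:

```python
# Environment sanity check (illustrative; not part of the ParGo codebase).
import torch
import torchvision

print(torch.__version__)          # expect 2.0.1+cu118
print(torchvision.__version__)    # expect 0.15.2+cu118
print(torch.cuda.is_available())  # expect True on a CUDA 11.8 machine
```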
The LLM (internlm2-7b) and the vision encoder (eva-clip-l-14-336) need to be downloaded in advance.
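One way to fetch them is `snapshot_download` from `huggingface_hub`. This is a minimal sketch; the repo IDs and local paths below are assumptions, so substitute whichever hosted copies of the two checkpoints you actually use:

```python
# Hedged download sketch (not from the ParGo repo). The repo IDs and local
# paths are assumptions; replace them with the checkpoints you actually use.
from huggingface_hub import snapshot_download

llm_path = snapshot_download(
    repo_id="internlm/internlm2-7b",         # assumed Hugging Face repo ID for the LLM
    local_dir="./checkpoints/internlm2-7b",  # hypothetical local path
)
vit_path = snapshot_download(
    repo_id="QuanSun/EVA-CLIP",              # assumed repo hosting EVA-CLIP weights;
    local_dir="./checkpoints/eva-clip",      # pick the file matching eva-clip-l-14-336
)
print(llm_path, vit_path)
```

Wherever the weights land, the model paths referenced by the eval configs need to match.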
You can place the benchmark data in the benchmarks directory. Data structure:
├── benchmarks
│   ├── MMEBenmark
│   │   ├── images
│   │   └── Data_json
Each JSON file in Data_json contains the image name, question, and answer, e.g.:
10002.jpg Does this artwork exist in the form of painting? Yes
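For illustration, such a record could be written out as below; the JSON field names and the artwork.json filename are hypothetical, so match them to the keys that eval/eval_mme_finetuning.py actually expects:

```python
# Hypothetical Data_json record builder. Field names and the output filename
# are illustrative assumptions, not the repo's actual schema.
import json
import os

record = {
    "image_name": "10002.jpg",
    "question": "Does this artwork exist in the form of painting?",
    "answer": "Yes",
}
os.makedirs("benchmarks/MMEBenmark/Data_json", exist_ok=True)
with open("benchmarks/MMEBenmark/Data_json/artwork.json", "w") as f:
    json.dump([record], f, indent=2)
```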
Step 1: Generate the responses:
python3 eval/eval_mme_finetuning.py --config ./configs/MMEBench_interLM2-7B.json
Step 2: Calculate the score:
python3 eval/calculation_mme.py --results_dir ./output/internlm2-MME
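For orientation: MME's published convention scores each subtask as acc + acc+ on a 0-200 scale, where acc is question-level accuracy and acc+ counts an image as correct only if both of its two questions are answered correctly. The sketch below paraphrases that metric from its definition; it is not the repo's eval/calculation_mme.py:

```python
# Minimal sketch of the MME scoring convention (acc + acc+), paraphrased from
# the metric's definition; not the repo's eval/calculation_mme.py.
from collections import defaultdict

def mme_score(results):
    """results: list of (image_name, prediction, answer); two questions per image."""
    per_image = defaultdict(list)
    for image_name, pred, ans in results:
        # Exact match for simplicity; real scripts may normalize answers further.
        per_image[image_name].append(pred.strip().lower() == ans.strip().lower())

    flags = [ok for oks in per_image.values() for ok in oks]
    acc = sum(flags) / len(flags)                # question-level accuracy
    acc_plus = sum(all(oks) for oks in per_image.values()) / len(per_image)
    return 100 * (acc + acc_plus)                # subtask score on a 0-200 scale

print(mme_score([("10002.jpg", "Yes", "Yes"),
                 ("10002.jpg", "No", "Yes")]))   # 50.0
```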
For other benchmarks, please follow their official instructions to construct the files; the overall pipeline is the same as for the MME benchmark.
This project is built on MiniGPT and BLIP2. Sincere thanks to the contributors of these excellent codebases.
If you find our code helpful for your research, please consider citing us with the following BibTeX entry:
@misc{wang2024pargobridgingvisionlanguagepartial,
  title={ParGo: Bridging Vision-Language with Partial and Global Views},
  author={An-Lan Wang and Bin Shan and Wei Shi and Kun-Yu Lin and Xiang Fei and Guozhi Tang and Lei Liao and Jingqun Tang and Can Huang and Wei-Shi Zheng},
  year={2024},
  eprint={2408.12928},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.12928},
}
The source code and pretrained weights are licensed under the BSD-3-Clause license.