🔥 [2024-12-10]: Data released!
We generate Chains-of-Thought-and-Action (CoTA) data automatically with two approaches as shown below: model-based generation (top) and programmatic generation (bottom).
Figure 1. CoTA data generation method
In model-based generation, we take existing image and QA pairs as inputs and prompt a large language model (i.e. GPT-4o) to generate either a chain-of-thought-and-action (CoTA) or chain-of-thought (CoT) without actions to answer the questions. Then, we verify that the chains lead to correct final answers and parse successfully; if not, we convert them into the direct answer (Direct) format with groundtruth answers. In programmatic generation, we first annotate images with human labelers and models, and then use the dense annotations to fill in manually written templates and generate QA and the corresponding CoTA with Python programs.
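The verify-or-fallback step in model-based generation can be sketched roughly as follows. This is an illustrative sketch, not the repository's actual code: the helper names, the chain schema, and the `Terminate(...)` final-answer convention are all assumptions.

```python
import re

def finalize_example(question, chain, groundtruth):
    """Keep a generated chain only if it parses and ends in the correct
    answer; otherwise fall back to the direct-answer (Direct) format.
    All names and formats here are illustrative assumptions."""
    # Assume a chain is a list of {"thought": ..., "action": ...} steps
    # whose last action carries the final answer as Terminate(<answer>).
    def parse_final_answer(chain):
        if not isinstance(chain, list) or not chain:
            return None
        last = chain[-1]
        action = last.get("action", "") if isinstance(last, dict) else ""
        match = re.search(r"Terminate\((.*)\)", action)
        return match.group(1).strip() if match else None

    answer = parse_final_answer(chain)
    if answer is not None and answer.lower() == str(groundtruth).lower():
        # Chain parses and is correct: keep it as a CoTA example.
        return {"question": question, "format": "CoTA", "chain": chain}
    # Verification failed: emit the groundtruth in Direct format instead.
    return {"question": question, "format": "Direct", "answer": groundtruth}
```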
You can easily download the repo and set up the environment via:

```bash
git clone https://github.com/airesearch-emu/cota.git
cd cota
pip install -r requirements.txt
```
- Step 1: Modify the environment and code paths in the script `scripts/generate_mm_trajs.sh`.
- Step 2: Run the script with `generate_mm_trajs.sh $subset`, where `$subset` is a string representing a subset of the Cauldron dataset, or `mantis-$subset` for Mantis-Instruct. For example, for Cauldron: `generate_mm_trajs.sh ai2d`; for Mantis: `generate_mm_trajs.sh mantis-contrastive_caption`.
- Generate CoTA for single-image examples: `python cota/gen_tool_single.py`
- Generate CoTA for multi-image examples: `python cota/gen_tool_multi.py`
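The template-filling idea behind programmatic generation can be illustrated with a minimal sketch. The template text, annotation schema, and action names below are assumptions for illustration, not the repository's actual templates.

```python
# Minimal sketch of programmatic CoTA generation: dense image
# annotations fill a manually written question template, and a
# matching chain-of-thought-and-action is emitted alongside the QA.
# (Hypothetical template and schema; not the repository's actual code.)

QUESTION_TEMPLATE = "How many {label}s are in the image?"

def generate_counting_examples(annotations):
    """annotations: list of {"label": str, "bbox": [x1, y1, x2, y2]}."""
    counts = {}
    for ann in annotations:
        counts[ann["label"]] = counts.get(ann["label"], 0) + 1
    examples = []
    for label, count in sorted(counts.items()):
        question = QUESTION_TEMPLATE.format(label=label)
        # The paired CoTA walks through the detections before answering.
        chain = [
            {"thought": f"I should locate every {label} in the image.",
             "action": f"LocalizeObjects({label})"},
            {"thought": f"I found {count} {label}(s), so the answer is {count}.",
             "action": f"Terminate({count})"},
        ]
        examples.append({"question": question, "answer": count, "cota": chain})
    return examples
```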
The CoTA datasets are licensed under the noncommercial license CC-BY-NC 4.0. Users need to make their own assessment regarding any obligations or responsibilities under the corresponding licenses or terms and conditions pertaining to the original datasets and data. This release is for research purposes only, in support of an academic paper.
Please cite us if you find our repository helpful. Thank you!
```bibtex
@misc{ma2024tacolearningmultimodalaction,
      title={TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action},
      author={Zixian Ma and Jianguo Zhang and Zhiwei Liu and Jieyu Zhang and Juntao Tan and Manli Shu and Juan Carlos Niebles and Shelby Heinecke and Huan Wang and Caiming Xiong and Ranjay Krishna and Silvio Savarese},
      year={2024},
      eprint={2412.05479},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05479},
}
```