This repo contains the official implementation of our paper:
Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning (NeurIPS 2023)
Xiaoqian Wu, Yong-Lu Li*, Jianhua Sun, Cewu Lu*
[project page] [paper] [arxiv]
Given an activity, the proposed symbolic system prompts an LLM to generate broad-coverage symbols and rational rules. It is implemented in generate_rule.py.
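As a rough illustration of the idea (not the code in generate_rule.py), the sketch below builds a prompt for one activity, sends it to an LLM, and parses the reply into a rule of the form "premise symbols imply activity". The function names, the prompt wording, and the `fake_llm` stub are all hypothetical; in practice `llm` would wrap a real API call.

```python
from typing import Callable, List, Sequence, Tuple


def build_rule_prompt(activity: str, known_symbols: Sequence[str]) -> str:
    """Compose a prompt asking for symbols that imply `activity` (wording is an assumption)."""
    known = ", ".join(known_symbols) if known_symbols else "none"
    return (
        f"Activity: {activity}\n"
        f"Already known symbols: {known}\n"
        "List further visual facts (symbols) that, taken together, imply the activity. "
        "Write one symbol per line."
    )


def parse_symbols(response: str) -> List[str]:
    """Split an LLM reply into symbol strings, dropping list markers and blank lines."""
    symbols = []
    for line in response.splitlines():
        cleaned = line.strip().lstrip("-*0123456789. ").strip()
        if cleaned:
            symbols.append(cleaned)
    return symbols


def generate_rule(
    activity: str,
    llm: Callable[[str], str],
    seed_symbols: Sequence[str] = (),
) -> Tuple[List[str], str]:
    """Return one rule as (premise symbols, concluded activity)."""
    prompt = build_rule_prompt(activity, seed_symbols)
    premises = list(seed_symbols) + parse_symbols(llm(prompt))
    return premises, activity


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, so the sketch runs offline."""
    return "- human sits on the bicycle\n- human holds the handlebar\n"


rule = generate_rule("human ride bicycle", fake_llm)
print(rule)
```

Passing the LLM as a plain callable keeps the prompt construction and parsing testable without network access.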
With the generated symbols and rules, we can infer the activities in images. We detail the experiments on HICO, with zero-shot CLIP as the baseline.
First, download the DATA folder from this link; it contains the generated rules and symbol predictions. Then run hico_clip+reason.ipynb to get the results.
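To make the reasoning step concrete, here is a minimal sketch of how rule-based scores could be fused with zero-shot CLIP scores. It assumes each rule is a conjunction of symbols scored by the product of symbol probabilities, that an activity takes the score of its best rule, and that the final score is a weighted average with CLIP. These choices, along with all names and the toy numbers, are illustrative assumptions and may differ from what hico_clip+reason.ipynb does.

```python
from typing import Dict, List

Rule = List[str]  # a rule is a list of premise symbols that jointly imply an activity


def rule_score(rule: Rule, symbol_probs: Dict[str, float]) -> float:
    """Score a conjunctive rule as the product of its symbol probabilities."""
    score = 1.0
    for symbol in rule:
        score *= symbol_probs.get(symbol, 0.0)  # unknown symbols count as absent
    return score


def reasoning_score(rules: List[Rule], symbol_probs: Dict[str, float]) -> float:
    """An activity is supported if any of its rules fires, so keep the best rule."""
    return max((rule_score(rule, symbol_probs) for rule in rules), default=0.0)


def combine(
    clip_scores: Dict[str, float],
    rules_per_activity: Dict[str, List[Rule]],
    symbol_probs: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted average of the CLIP score and the rule-based score per activity."""
    combined = {}
    for activity, clip_score in clip_scores.items():
        reason = reasoning_score(rules_per_activity.get(activity, []), symbol_probs)
        combined[activity] = (1.0 - weight) * clip_score + weight * reason
    return combined


# Toy inputs (made up for illustration, not real model outputs).
clip_scores = {"ride bicycle": 0.4, "hold bicycle": 0.5}
symbol_probs = {"sits on bicycle": 0.9, "feet on pedals": 0.8, "hand on bicycle": 0.6}
rules_per_activity = {
    "ride bicycle": [["sits on bicycle", "feet on pedals"]],
    "hold bicycle": [["hand on bicycle"]],
}

scores = combine(clip_scores, rules_per_activity, symbol_probs)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy case the rule evidence lifts "ride bicycle" above "hold bicycle" even though CLIP alone ranks them the other way.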
Alternatively, you can generate the rules and predict the symbols yourself :)
- To generate rules, run generate_rule.py. Note that the rules may differ because of the evolution of the GPT API and sampling uncertainty.
- To predict symbols, please refer to hico_predict_symbols.py, where BLIP2 is used.
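One plausible way to frame symbol prediction is as yes/no visual question answering: each symbol becomes a question, and the model's answer is mapped to a score. The sketch below shows only that wrapper logic. The `vqa` callable stands in for a BLIP2 call, and the question template, the answer-to-score mapping, and all function names are assumptions rather than the contents of hico_predict_symbols.py.

```python
from typing import Any, Callable, Dict, List


def symbol_to_question(symbol: str) -> str:
    """Turn a symbol into a yes/no question (the template is an assumption)."""
    return f"Question: Is it true that {symbol}? Answer:"


def answer_to_prob(answer: str) -> float:
    """Map a free-text answer to a score: yes -> 1.0, no -> 0.0, anything else -> 0.5."""
    words = answer.strip().lower().replace(",", " ").replace(".", " ").split()
    first = words[0] if words else ""
    if first == "yes":
        return 1.0
    if first == "no":
        return 0.0
    return 0.5


def predict_symbols(
    image: Any,
    symbols: List[str],
    vqa: Callable[[Any, str], str],
) -> Dict[str, float]:
    """Query the VQA model once per symbol and collect the scores."""
    return {s: answer_to_prob(vqa(image, symbol_to_question(s))) for s in symbols}


def fake_vqa(image: Any, question: str) -> str:
    """Hypothetical stand-in for a BLIP2 call, so the sketch runs without a model."""
    return "Yes." if "bicycle" in question else "no"


preds = predict_symbols(None, ["human sits on the bicycle", "human holds a cup"], fake_vqa)
print(preds)
```

A hard 0/1 mapping is the simplest choice; a softer score could be derived from the model's token probabilities for "yes" and "no" if those are available.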
If you find this work useful, please cite via:
@inproceedings{wu2023symbol,
title={Symbol-LLM: Leverage Language Models for Symbolic System in Visual
Human Activity Reasoning},
author={Wu, Xiaoqian and Li, Yong-Lu and Sun, Jianhua and Lu, Cewu},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023}
}