I'm Chunhui Zhang, a Ph.D. student in Computer Science at Dartmouth 🌲, working with 🌟 Professor Soroush Vosoughi. I also hold a research-based MSCS degree from Brandeis University, where I was honored with the GSAS Fellowship, and a Bachelor's degree in CS from Northeastern University, where I received the Outstanding Honor Thesis Award.
My research focuses on advancing the intrinsic properties of deep learning across diverse modalities, with an emphasis on trustworthiness, scalability, and applicability to real-world challenges. Highlights of my work include:
- Scaling Multimodal Theory-of-Mind with Weak-to-Strong Bayesian Reasoning
  Preprint | Code
  Authors: Chunhui Zhang, Sean Dae Houlihan, Kwonjoon Lee, Nakul Agarwal, Zhongyu Ouyang, Soroush Vosoughi, Shao-Yuan Lo
- Pretrained Image-Text Models are Secretly Video Captioners
  Preprint | Code
  Authors: Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, Soroush Vosoughi
- Working Memory Refines Essential Temporal Multimodal Sequences for Audio-Video-Language Modelling
  Preprint | Code
  Authors: Chunhui Zhang*, Xingjian Diao*, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui
- Working Memory Identifies Reasoning Limits in Language Models
  Conference: EMNLP 2024
  Authors: Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, Soroush Vosoughi
- Learning Musical Representations for Music Performance Question Answering
  Conference: Findings of EMNLP 2024
  Authors: Xingjian Diao, Chunhui Zhang, Tingxuan Wu, Ming Cheng, Zhongyu Ouyang, Weiyi Wu, Soroush Vosoughi, Jiang Gui
Research Intern (Jun. 2024 – Sept. 2024)
- Project: Multimodal LLM Post-Training
- Developed an LLM-powered reasoner that understands human behaviors in multimodal environments, achieving a 4.6% improvement over state-of-the-art solutions.
- The paper is under review; the code has been released.
- Host: Dr. Shao-Yuan Lo
I am currently exploring multimodal LLMs (language-vision-audio), memory mechanisms, and reinforcement learning to push the boundaries of AGI. My recent work includes training recipes for large-scale models, which ranked Top-2 on the Papers with Code Video Captioning Leaderboard, showcasing optimal strategies for resource allocation in post-training.
- Email: [email protected]
- LinkedIn: Chunhui Zhang
- GitHub: chunhuizng
- Google Scholar: My Publications
Feel free to reach out if you're interested in collaboration, career advice, or just a friendly chat about research and life!