I'm Chunhui Zhang, a Ph.D. student in Computer Science at Dartmouth 🌲, working with 🌟 Professor Soroush Vosoughi. I also hold a research-based MSCS degree from Brandeis University, where I was honored with the GSAS Fellowship, and a Bachelor's degree in CS from Northeastern University, where I received the Outstanding Honor Thesis Award.
My research focuses on advancing the intrinsic properties of deep learning across diverse modalities, with an emphasis on trustworthiness, scalability, and applicability to real-world challenges. Highlights of my work include:
- Scaling Multimodal Theory-of-Mind with Weak-to-Strong Bayesian Reasoning
  Preprint | Code
  Authors: Chunhui Zhang, Sean Dae Houlihan, Kwonjoon Lee, Nakul Agarwal, Zhongyu Ouyang, Soroush Vosoughi, Shao-Yuan Lo
- Pretrained Image-Text Models are Secretly Video Captioners
  Preprint | Code
  Authors: Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, Soroush Vosoughi
- Working Memory Refines Essential Temporal Multimodal Sequences for Audio-Video-Language Modelling
  Preprint | Code
  Authors: Chunhui Zhang*, Xingjian Diao*, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui
- Working Memory Identifies Reasoning Limits in Language Models
  Conference: EMNLP 2024
  Authors: Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, Soroush Vosoughi
- Learning Musical Representations for Music Performance Question Answering
  Conference: Findings of EMNLP 2024
  Authors: Xingjian Diao, Chunhui Zhang, Tingxuan Wu, Ming Cheng, Zhongyu Ouyang, Weiyi Wu, Soroush Vosoughi, Jiang Gui
Research Intern (Jun. 2024 – Sept. 2024)
- Project: Multimodal LLM Post-Training
- Developed an LLM-powered reasoner that understands human behaviors in multimodal environments, achieving a 4.6% improvement over state-of-the-art solutions.
- The paper is under review; the code has been released.
- Host: Dr. Shao-Yuan Lo
I am currently exploring multimodal LLMs (language-vision-audio), memory mechanisms, and reinforcement learning to push the boundaries of AGI. My recent work includes training recipes for large-scale models, which ranked Top-2 on the Papers with Code Video Captioning Leaderboard, showcasing optimal strategies for resource allocation in post-training.
- Email: [email protected]
- LinkedIn: Chunhui Zhang
- GitHub: chunhuizng
- Google Scholar: My Publications
Feel free to reach out if you're interested in collaboration, career advice, or just a friendly chat about research and life!