Xihuai Wang's Page

Shanghai, China

I received my Ph.D. from the APEX Lab at Shanghai Jiao Tong University in 2026, where I was supervised by Prof. Weinan Zhang and Prof. Ying Wen and was affiliated with both the SJTU-Apex Group and the SJTU-MARL Group. In 2020, I was honored to be selected for the Wen-Tsun Wu AI Honorary Doctoral Program. Before that, I received my B.Eng. in Computer Science and Technology from the School of Computer Science and Engineering at Sun Yat-sen University, also in 2020.

My research lies at the intersection of Reinforcement Learning and Multi-Agent Learning. My current research focuses on:

Large Language Model Reasoning and Agency
- Reinforcement learning methodologies for enhancing reasoning and agentic capabilities of LLMs
- Human-AI collaborative decision-making with LLM-based agents
Multi-Agent Reinforcement Learning
- Sample-efficient algorithms for cooperative MARL
- Zero-shot coordination and generalization in multi-agent systems

News

Jan 13, 2026	I have successfully defended my Ph.D. thesis and graduated from Shanghai Jiao Tong University! 🎉
Dec 2, 2025	I published a blog post sharing my perspective on KL estimators in reinforcement learning. English Version | Chinese Version | Zhihu | Qingke AI (青稞 AI) WeChat account
Nov 23, 2025	I published a blog post sharing my perspective on training–inference mismatch in reinforcement learning for large language models. English Version | Chinese Version | Zhihu | Qingke AI (青稞 AI) WeChat account
May 16, 2025	Our paper Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration has been accepted to ACL 2025! Coverage: Jiqizhixin (机器之心) WeChat account
Sep 26, 2024	Our work on zero-shot coordination evaluation, ZSC-Eval, has been accepted to the NeurIPS 2024 Datasets and Benchmarks Track!

Selected Papers

A full list of publications is available on Google Scholar or the Publications page (* denotes equal contribution).
Papers are organized by topic:

Large Language Model Reasoning and Agency
Multi-Agent Reinforcement Learning

Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration

Shao Zhang*, Xihuai Wang*, Wenhao Zhang, Chaoran Li, Junru Song, Tingyu Li, Lin Qiu, Xuezhi Cao, Xunliang Cai, Wen Yao, Weinan Zhang, Xinbing Wang, and Ying Wen

63rd ACL, 2025

LLM Reasoning and Agency

@inproceedings{zhang2025dpt,
  author = {Zhang*, Shao and Wang*, Xihuai and Zhang, Wenhao and Li, Chaoran and Song, Junru and Li, Tingyu and Qiu, Lin and Cao, Xuezhi and Cai, Xunliang and Yao, Wen and Zhang, Weinan and Wang, Xinbing and Wen, Ying},
  title = {Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration},
  booktitle = {63rd ACL},
  year = {2025},
}

ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination

Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, and Weinan Zhang

38th NeurIPS, 2024

MARL Generalization

@inproceedings{wang2023zsceval,
  author = {Wang, Xihuai and Zhang, Shao and Zhang, Wenhao and Dong, Wentao and Chen, Jingxiao and Wen, Ying and Zhang, Weinan},
  title = {ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination},
  booktitle = {38th NeurIPS},
  year = {2024},
}

Order Matters: Agent-by-agent Policy Optimization

Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, Jun Wang, and Weinan Zhang

11th ICLR, 2023

MARL Efficiency

@inproceedings{wang2023order,
  author = {Wang, Xihuai and Tian, Zheng and Wan, Ziyu and Wen, Ying and Wang, Jun and Zhang, Weinan},
  title = {Order Matters: Agent-by-agent Policy Optimization},
  booktitle = {11th ICLR},
  year = {2023},
  url = {https://openreview.net/forum?id=Q-neeWNVv1},
}

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Weinan Zhang, Xihuai Wang, Jian Shen, and Ming Zhou

30th IJCAI, 2021

MARL Efficiency

@inproceedings{zhang2021modelbased,
  author = {Zhang, Weinan and Wang, Xihuai and Shen, Jian and Zhou, Ming},
  title = {Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts},
  booktitle = {30th IJCAI},
  year = {2021},
}