Xihuai Wang's Page

Shanghai, China

I am a researcher on the WeLM Team at WeChat AI, focusing on post-training of large language models. I received my Ph.D. in 2026 from the APEX Lab at Shanghai Jiao Tong University, where I was supervised by Prof. Weinan Zhang and Prof. Ying Wen and affiliated with both the SJTU-Apex Group and the SJTU-MARL Group. I was honored to be selected for the Wen-Tsun Wu AI Honorary Doctoral Program in 2020. Before that, I received my B.Eng. in Computer Science and Technology from the School of Computer Science and Engineering at Sun Yat-sen University in 2020.

My research lies at the intersection of Reinforcement Learning and Multi-Agent Learning. My current research focuses on:

Large Language Model Reasoning and Agency
- Reinforcement learning methodologies for enhancing reasoning and agentic capabilities of LLMs
- Human-AI collaborative decision-making with LLM-based agents

Multi-Agent Reinforcement Learning
- Sample-efficient algorithms for cooperative MARL
- Zero-shot coordination and generalization in multi-agent systems

News

all news →

Jan 13, 2026	Milestone I have successfully defended my Ph.D. thesis and graduated from Shanghai Jiao Tong University! 🎉
Dec 2, 2025	Writing A blog post sharing my perspective on KL estimators in reinforcement learning. English Version | Chinese Version | Zhihu | Qingke AI (青稞 AI) WeChat official account
Nov 23, 2025	Writing A blog post sharing my perspective on training–inference mismatch in reinforcement learning for large language models. English Version | Chinese Version | Zhihu | Qingke AI (青稞 AI) WeChat official account
May 16, 2025	Paper Our paper Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration has been accepted to ACL 2025! Jiqizhixin (机器之心) WeChat official account
Sep 26, 2024	Paper Our work on zero-shot coordination evaluation, ZSC-Eval, has been accepted to the NeurIPS 2024 Datasets and Benchmarks Track!
Aug 8, 2023	Talk Gave a talk on cooperative multi-agent reinforcement learning (Coordinate Agents via Policy Optimization) at RLChina. Bilibili video
Mar 25, 2023	Paper Our work on policy optimization in cooperative multi-agent scenarios, Order Matters: Agent-by-agent Policy Optimization, has been accepted to ICLR 2023!

Selected Papers

all publications →

A full list of publications is available on Google Scholar or the Publications page (* denotes equal contribution). Works are organized by topic: LLM Reasoning and Agency, Multi-agent Reinforcement Learning.

LLM Reasoning & Agency | 63rd ACL, 2025
Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration

Shao Zhang*, Xihuai Wang*, Wenhao Zhang, Chaoran Li, Junru Song, Tingyu Li, Lin Qiu, Xuezhi Cao, Xunliang Cai, Wen Yao, Weinan Zhang, Xinbing Wang, and Ying Wen

arXiv PDF Code
MARL Generalization | 38th NeurIPS, 2024
ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination

Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, and Weinan Zhang

arXiv PDF Code
MARL Efficiency | 11th ICLR, 2023
Order Matters: Agent-by-agent Policy Optimization

Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, Jun Wang, and Weinan Zhang

OpenReview PDF Code
MARL Efficiency | 30th IJCAI, 2021
Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Weinan Zhang, Xihuai Wang, Jian Shen, and Ming Zhou

PDF Code
