Teng Xiao

I am a postdoctoral researcher (and Young Investigator) at the Allen Institute for AI (AI2) and the University of Washington, advised by Prof. Noah A. Smith and Prof. Hanna Hajishirzi. I completed my PhD at Pennsylvania State University, where I was advised by Prof. Vasant Honavar. I am interested in machine learning, reinforcement learning, and language models. I am currently working on (i) AI alignment with human feedback and (ii) reasoning and planning for autonomous decision making.

Email / Scholar / Twitter / GitHub

Selected Publications

Full list on Google Scholar. * indicates co-first authors.

Olmo3
OLMo Team, Allyson Ettinger*, Amanda Bertsch*, Bailey Kuehl*, David Graham*, David Heineman*, Dirk Groeneveld*, Faeze Brahman*, Finbarr Timbers*, Hamish Ivison*, Jacob Morrison*, Jake Poznanski*, Kyle Lo*, Luca Soldaini*, Matt Jordan*, Mayee Chen*, Michael Noukhovitch*, Nathan Lambert*, Pete Walsh*, Pradeep Dasigi*, Robert Berry*, Saumya Malik*, Saurabh Shah*, Scott Geng*, Shane Arora*, Shashank Gupta*, Taira Anderson*, Teng Xiao*, Tyler Murray*, Tyler Romero*, Victoria Graf*, Akari Asai, Akshita Bhagia, Alex Wettig, Alisa Liu, Aman Rangapur, Chloe Anastasiades, Costa Huang, Dustin Schwenk, Harsh Trivedi, Ian Magnusson, Jaron Lochner, Jiacheng Liu, Lj Miranda, Maarten Sap, Malia Morgan, Michael Schmitz, Michal Guerquin, Michael Wilson, Regan Huff, Ronan Le Bras, Rui Xin, Rulin Shao, Sam Skjonsberg, Shannon Zejiang Shen, Shuyue Stella Li, Tucker Wilde, Valentina Pyatkin, Will Merrill, Yapei Chang, Yuling Gu, Zhiyuan Zeng, Ashish Sabharwal, Luke Zettlemoyer, Pang Wei Koh, Ali Farhadi, Noah A. Smith*, Hannaneh Hajishirzi*
Preprint 2025. [Code]

Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?
Zhengyu Chen, Jinluan Yang, Teng Xiao, Ruochen Zhou, Luan Zhang, Xiangyu Xi, Xiaowei Shi, Wei Wang, Jinggang Wang
Preprint 2025. [Code]

Internalizing World Models via Self-Play Finetuning for Agentic RL
Shiqi Chen*, Tongyao Zhu*, Zian Wang*, Jinghan Zhang*, Kangrui Wang, Siyang Gao, Teng Xiao*, Yee Whye Teh, Junxian He, Manling Li
Preprint 2025. [Code]

Inference-time Alignment in Continuous Space
Yige Yuan, Teng Xiao*, Li Yunfan, Bingbing Xu, Shuchang Tao, Yunqi Qiu, Huawei Shen, Xueqi Cheng
NeurIPS, 2025. [Code]

Simple Distillation for One-Step Diffusion Models
Huaisheng Zhu, Teng Xiao, Shijie Zhou, Zhimeng Guo, Hangfan Zhang, Siyuan Xu, Vasant G Honavar
NeurIPS, 2025. [Code]

On a Connection Between Imitation Learning and RLHF
Teng Xiao, Yige Yuan, Mingxiao Li, Zhengyu Chen, Vasant G Honavar
ICLR, 2025. [Code]

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan*, Zhengyu Chen, Mingxiao Li, Shangsong Liang, Zhaochun Ren, Vasant G Honavar
ICLR, 2025. [Code]

DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Huaisheng Zhu, Teng Xiao, Vasant G Honavar
ICLR, 2025. [Code]

InfoPO: On Mutual Information Maximization for Large Language Model Alignment
Teng Xiao, Zhen Ge, Sujay Sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, Qingjun Cui, Trishul Chilimbi
NAACL, 2025. [Code]

How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective
Teng Xiao, Mingxiao Li, Yige Yuan, Huaisheng Zhu, Chao Cui, Vasant G Honavar
EMNLP, 2024. [Code]

Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment
Teng Xiao, Yige Yuan, Huaisheng Zhu, Mingxiao Li, Vasant G Honavar
NeurIPS, 2024. [Code]

Efficient Contrastive Learning for Fast and Accurate Inference on Graphs
Teng Xiao, Huaisheng Zhu, Zhiwei Zhang, Zhimeng Guo, Charu C. Aggarwal, Suhang Wang, Vasant G Honavar
ICML, 2024. [Code]

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He
ICML, 2024. [Code]

Simple and Asymmetric Graph Contrastive Learning without Augmentations
Teng Xiao*, Huaisheng Zhu*, Zhengyu Chen, Suhang Wang
NeurIPS, 2023. [Code]

Certifiably Robust Graph Contrastive Learning
Minhua Lin, Teng Xiao, Enyan Dai, Suhang Wang
NeurIPS, 2023. [Code]

Decoupled Self-supervised Learning for Graphs
Teng Xiao, Zhengyu Chen, Zhimeng Guo, Zeyang Zhuang, Suhang Wang
NeurIPS, 2022. [Code]

Academic Services


Program Committee Member & Reviewer: NeurIPS (2022, 2023, 2024), ICML (2023, 2024, 2025), ICLR (2022, 2024, 2025), AAAI (2022, 2023), WSDM (2023, 2024, 2025), ACL ARR (2024), SIGIR (2021, 2022, 2023), RecSys (2023), CIKM (2023), TheWebConf (2022, 2023), LoG (2024), COLM (2024)

Journal Reviewer: ACM Transactions on Intelligent Systems and Technology, ACM Transactions on Information Systems


This page uses a template by Jon Barron.