Kaiyue Wen
Ph.D. Student at Stanford University
Stanford University
Stanford, CA
Hello! I am Kaiyue Wen. I am a second-year Ph.D. student at Stanford University, where I am grateful to be advised by Tengyu Ma and Percy Liang. I graduated from Tsinghua University, where I was a member of Yao’s pilot class. During my undergraduate studies, I was fortunate to be advised by Tengyu Ma, Zhiyuan Liu, Andrej Risteski, Jingzhao Zhang, Yuhao Wang, and Zhiyuan Li.
My research interests span deep learning broadly. My long-term goal is to understand the physics behind deep learning, and I believe a combination of theoretical analysis and empirical study is essential to this goal.
Recently, I’ve become fascinated by two fundamental axes of scaling in deep learning.
- Demystifying pretraining: Pretraining has been the driving force behind the evolution of large language models, yet many foundational algorithmic choices remain poorly understood. Key aspects such as optimizers, architectures, and hyperparameter scaling strategies still lack consensus. My goal is to clarify these choices through rigorous benchmarking (e.g., benchmarking modern optimizers) and theoretical analysis (e.g., exploring the representational limitations of RNNs, architectures beyond $\mathrm{TC}^0$, and the river-valley loss landscape). Most of my research in this direction is carried out in the open-source project Marin.
- New algorithmic paradigms in reasoning: With recent progress in reasoning reinforcement learning (RL), particularly innovations like long-chain-of-thought RL, there is growing potential to push the limits of model reasoning. While I am new to this field, my aim is to design end-to-end trainable multi-agent RL systems that build upon and extend the capabilities of current long-CoT RL paradigms.
News
| Date | News |
|---|---|
| Sep 01, 2025 | New preprint (Fantastic Pretraining Optimizers and Where to Find Them) on arXiv! |
| May 01, 2025 | WSD-S is used in training the best open-source 8B model, Marin 8B. |
| Jan 20, 2025 | 3 papers (River Valley Landscape, RNNs are not Transformers (Yet), Optimization Analysis on Chain-of-Thought) accepted at ICLR 2025! |
| Jan 20, 2025 | New preprint (Global Load Balancing Helps Expert Specialization) on arXiv! |
| Dec 01, 2024 | Residual Permutation Test is accepted at the Annals of Statistics (AoS)! |
Selected Publications
- [ICLR] Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective (2024)
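
For readers unfamiliar with the warmup-stable-decay (WSD) schedule mentioned in the news and in the paper above, below is a minimal Python sketch of the schedule's overall shape: a linear warmup to the peak learning rate, a long constant "stable" phase, and a final decay. The function name, phase fractions, and the choice of linear decay are illustrative assumptions, not the exact settings studied in the paper (which also considers the variant WSD-S).

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_frac: float = 0.05, decay_frac: float = 0.1) -> float:
    """Illustrative warmup-stable-decay (WSD) learning rate at `step`:
    linear warmup to peak_lr, a constant stable phase, then a linear
    decay to zero. Fractions and decay shape are assumptions."""
    warmup_steps = max(int(warmup_frac * total_steps), 1)
    decay_start = int((1.0 - decay_frac) * total_steps)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps   # linear warmup
    if step < decay_start:
        return peak_lr                         # stable phase
    # final decay: anneal linearly to zero over the last decay_frac of steps
    return peak_lr * (total_steps - step) / max(total_steps - decay_start, 1)
```

A practical appeal of this shape is that the stable phase can be extended indefinitely, so a decayed checkpoint can be produced at any point without committing to a total step budget in advance, unlike cosine schedules.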