News

Sep 01, 2025 New preprint (Fantastic Pretraining Optimizers and Where to Find Them) on arXiv!
May 01, 2025 WSD-S is used in training Marin 8B, the best open-source 8B model.
Jan 20, 2025 3 papers (River Valley Landscape, RNNs are not Transformers (Yet), Optimization Analysis on Chain-of-Thought) accepted at ICLR 2025!
Jan 20, 2025 New preprint (Global Load Balancing Helps Expert Specialization) on arXiv!
Dec 01, 2024 Residual Permutation Test accepted at AoS!
Oct 01, 2024 New preprints River Valley Landscape and Optimization Analysis on Chain-of-Thought on arXiv!
Sep 01, 2024 Started my Ph.D. at Stanford University! I am currently rotating with Percy Liang.
Jul 01, 2024 Graduated from Tsinghua University with a Bachelor’s degree in Computer Science.
May 01, 2024 Received and accepted the offer from Stanford University! I am honored to receive the Stanford Graduate Fellowship.
Feb 01, 2024 New preprint RNNs are not Transformers (Yet) on arXiv!
Oct 01, 2023 Awarded the National Scholarship (top 0.2%)!
Sep 20, 2023 2 papers ([Sharpness&Generalization](https://arxiv.org/abs/2307.11007), (Un)interpretability of Transformers) accepted at NeurIPS 2023! Sharpness&Generalization was accepted as an oral.
Sep 01, 2023 Received the silver medal of the Yao Award (top 4 in Yao’s pilot class)!
Aug 01, 2023 Returned to China for my senior year at Tsinghua.
Jul 01, 2023 Visited Hawaii for ICML 2023! Always great to see old friends.
Jun 20, 2023 Residual Permutation Test received a Major Revision from AoS.
Jun 01, 2023 Visiting Tengyu Ma at Stanford!
May 01, 2023 Visited Rwanda for ICLR 2023!
Mar 20, 2023 Reviewing for ICML for the first time!
Mar 01, 2023 New preprint Solving LPN with Neural Networks on arXiv!
Feb 01, 2023 Visiting Andrej Risteski at CMU!
Jan 20, 2023 2 papers (Understanding SAM, Not Benign Overfitting) accepted at ICLR 2023!
Dec 20, 2022 New preprint Residual Permutation Test on arXiv!
Dec 01, 2022 New preprint Understanding SAM on arXiv!
Oct 01, 2022 1 paper (Skill Neurons) accepted at EMNLP 2022.
Jun 01, 2022 New preprint Not Benign Overfitting on arXiv!