Publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2025

  1. AoS
    Residual permutation test for regression coefficient testing
    Kaiyue Wen, Tengyao Wang, and Yuhao Wang
    2025
  2. arXiv
    Fantastic Pretraining Optimizers and Where to Find Them
    Kaiyue Wen, David Hall, Tengyu Ma, and Percy Liang
    2025
  3. ACL
    Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
    Zihan Qiu, Zeyu Huang, Bo Zheng, Kaiyue Wen, Zekun Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, and Junyang Lin
    2025
  4. ICLR
    From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
    Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, and Jingzhao Zhang
    2025
  5. NeurIPS
    Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
    Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, and Junyang Lin
    2025
  6. COLM
    Weight Ensembling Improves Reasoning in Language Models
    Xingyu Dang, Christina Baek, Kaiyue Wen, Zico Kolter, and Aditi Raghunathan
    2025
  7. NeurIPS
    PaTH Attention: Position Encoding via Accumulating Householder Transformations
    Songlin Yang, Yikang Shen, Kaiyue Wen, Shawn Tan, Mayank Mishra, Liliang Ren, Rameswar Panda, and Yoon Kim
    2025
  8. ICML
    Task Generalization With AutoRegressive Compositional Structure: Can Learning From D Tasks Generalize to D^T Tasks?
    Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen, Hongzhou Lin, Jingzhao Zhang, and Mikhail Belkin
    2025

2024

  1. ICLR
    Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective
    Kaiyue Wen, Zhiyuan Li, Jason Wang, David Hall, Percy Liang, and Tengyu Ma
    2024
  2. ICLR
    RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
    Kaiyue Wen, Xingyu Dang, and Kaifeng Lyu
    2024

2023

  1. ICLR
    Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models
    Kaiyue Wen, Jiaye Teng, and Jingzhao Zhang
    2023
  2. ICLR
    How Does Sharpness-Aware Minimization Minimize Sharpness?
    Kaiyue Wen, Tengyu Ma, and Zhiyuan Li
    2023
  3. arXiv
    Practically Solving LPN in High Noise Regimes Faster Using Neural Networks
    Haozhe Jiang, Kaiyue Wen, and Yilei Chen
    2023
  4. NeurIPS
    Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
    Kaiyue Wen, Zhiyuan Li, and Tengyu Ma
    2023
  5. NeurIPS
    (Un)interpretability of Transformers: a case study with Dyck grammars
    Kaiyue Wen, Yuchen Li, Bingbin Liu, and Andrej Risteski
    2023

2022

  1. NAACL
    On Transferability of Prompt Tuning for Natural Language Processing
    Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, Zhiyuan Liu, Peng Li, Juanzi Li, Lei Hou, Maosong Sun, and Jie Zhou
    In NAACL, 2022
  2. EMNLP
    Finding Skill Neurons in Pre-trained Transformers via Prompt Tuning
    Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, and Juanzi Li
    In EMNLP, 2022