About me

Welcome to my homepage! I am an associate professor at the Chinese University of Hong Kong, Shenzhen. My major interests are optimization and deep learning, and most recently, large foundation models.

Previously, I was a tenure-track assistant professor at UIUC and a full-time visiting scientist at FAIR (Facebook AI Research, now Meta AI Fundamental AI Research). I was a postdoc at Stanford, and I obtained my PhD from the University of Minnesota and my BS in mathematics from Peking University.

My recent interests include the theory and algorithms of large foundation models, generative AI, learning-assisted optimization, neural-net compression, the landscape of neural nets, and Adam. I’m especially interested in nonconvex optimization: (a) I have written a survey, “optimization for deep learning: an overview”; (b) I provided one of the first geometrical analyses of non-convex matrix completion. Besides, I have been working on (1) large-scale optimization algorithms, especially Adam, ADMM and coordinate descent, and (2) communication networks.

【Recruiting】 We (as part of CUHK-SZ, SRIBD, SICIAM) have multiple positions for research scientists, research engineers, postdocs, PhD students, visiting scholars, visiting students, and undergraduate interns. If you are interested in LLMs, AI, deep learning, optimization, etc., please feel free to contact me at sunruoyu@cuhk.edu.cn. If possible, attach your CV.


Associate Professor (tenured), School of Data Science, Chinese University of Hong Kong, Shenzhen.
Senior Research Scientist, Shenzhen Institute of Big Data (SRIBD).
Vice Chair, Shenzhen International Center for Industrial and Applied Mathematics (SICIAM).
Adjunct Associate Professor, UIUC.

Past Professional Experience

Assistant Professor, 2017-2022.
Department of Industrial and Enterprise Systems Engineering
Coordinated Science Lab (affiliated)
Department of Electrical and Computer Engineering (affiliated)
University of Illinois at Urbana-Champaign

Visiting Researcher (full-time),
Facebook Artificial Intelligence Research, 2016.06-2016.12.

Post-doctoral Scholar, Dept. of Management Science and Engineering, Stanford University (host: Yinyu Ye), 2015-2016.


Ph.D. in Electrical Engineering, University of Minnesota, 2009-2015.
B.Sc. in Mathematics, Peking University, Beijing, China, 2005-2009.

Research Interests

  • Large language models: theory, fine-tuning, compression, optimization algorithm, domain-specific application
  • Optimization for deep learning: landscape analysis of neural-nets, GANs, Adam, adversarial robustness, etc.
  • Non-convex optimization for machine learning: neural networks, matrix factorization, etc.
  • Large-scale optimization: ADMM, coordinate descent, adaptive gradient methods, etc.
  • Other research interests: Information theory and wireless communications, such as interference alignment and base station association.

Selected works


  • May 2023: Our papers on learning to optimize and foresight pruning have appeared in ICLR 2023.

  • Oct 2022: Our papers on Adam, GAN, implicit bias and adversarial generalization have been accepted to NeurIPS 2022. Three of them were selected as spotlight papers (~5% of >10,000 submissions). Congratulations to all!

    In particular, in the paper on Adam, named “Adam Can Converge Without Any Modification on Update Rules”, we proved that the original Adam with proper hyperparameters can converge! (Yes, this was unknown before, except for a tiny special case of Adam close to RMSProp, which was proved in our previous paper on RMSProp.)

  • Feb 2022: Our paper “Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning”, with Haoxiang Wang, Yite Wang, and Bo Li, is accepted to CVPR 2022.
  • Jan 2022: Our paper On the Benefit of Width for Neural Networks: Disappearance of Basins (with Dawei, Tian) has been accepted to SIAM Journal on Optimization.
  • Nov 2021: Our paper Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity (with Peijun, Zhisheng) has been accepted to SIAM Journal on Optimization.
  • Oct 2021: Our paper Spurious Local Minima Exist for Almost All Over-parameterized Neural Networks (with Tian, Dawei) has been accepted to Mathematics of Operations Research.
  • Mar 2021: Our paper Revisiting Landscape Analysis for Neural-networks: Eliminating Decreasing Paths to Infinity (with Shiyu, Srikant) has been accepted to SIAM Journal on Optimization.
  • Feb 2021: Our paper RMSprop can converge with proper hyper-parameter (Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun) has been accepted to ICLR 2021 as Spotlight.
  • Sep 2020: Our paper Towards a better global loss landscape of GANs (joint with Tiantian Fang, Alex Schwing) is accepted to NeurIPS 2020 as an oral paper (1.1% of 9454 submissions).
  • Sep 2020: Our paper https://arxiv.org/abs/2010.15768 (joint with Jiawei Zhang, Peijun Xiao, Zhi-Quan Luo) is accepted to NeurIPS 2020.
  • Jun 2020: Our survey “On the global landscape of neural networks: an overview” (joint with Srikant, Shiyu Liang, Dawei Li, Tian Ding) has appeared in IEEE Signal Processing Magazine (SPM).
  • Dec 2019: my survey “optimization for deep learning: theory and algorithms” is available on arXiv: https://arxiv.org/abs/1912.08957. Comments are welcome.
  • Oct 2019: our paper “spurious local minima exist for almost all over-parameterized neural networks” is available at Optimization Online.
  • Oct 2019: our paper “understanding two symmetrized orders by worst-case complexity” is available on arXiv.
  • Sep 2019: our paper “Worst-case complexity of cyclic coordinate descent: O(n^2) gap with randomized versions” is accepted by MP (Mathematical Programming).
  • Aug 2019: I organized a session on “Non-convex optimization for neural networks” at ICCOPT 2019, the triennial conference on continuous optimization.
  • Mar 2019: our paper Max-Sliced Wasserstein Distance and its use for GANs is accepted by CVPR 2019 as Oral. 
  • Jan 2019: our paper on Adam-type methods (joint with Xiangyi Chen, Sijia Liu and Mingyi Hong) is accepted by ICLR 2019. 
  • Dec 2018: our paper “On the efficiency of random permutation for ADMM and coordinate descent” is accepted by MOR (Mathematics of Operations Research).
  • Nov 2018: I gave a talk at Stanford ISL Colloquium (see slides here) on Nov 8, and Google Brain on Nov 9. 
  • Nov 2018: I organized a session on non-convex optimization for machine learning at the INFORMS annual meeting.
  • Oct 2018: our paper showing that adding a neuron can eliminate all bad local minima will appear at NIPS 2018.

Professional Services

Area chair for ICLR, NeurIPS, AISTATS, ICML.

Reviewer for

  • Machine learning and computer science: ICLR, NeurIPS, ICML, COLT, FOCS, AISTATS, JMLR, Neural Computation
  • Optimization area: Mathematical Programming, SIAM Journal on Optimization, SIAM Journal on Computing, Pacific Journal of Optimization.
  • Signal processing and information theory: IEEE Transactions on Information Theory, IEEE Transactions on Signal Processing, SPAWC, ICASSP