Papers
Recent works on large language models:
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models. Li, Z., Xu, T., Zhang, Y., Yu, Y., Sun, R., & Luo, Z. Q. Accepted to ICML 2024.
AceGPT, Localizing Large Language Models in Arabic. Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu. NAACL 2024.
Selected Works
Adam Can Converge Without Any Modification on Update Rules, Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo, NeurIPS 2022 spotlight.
RMSprop converges with proper hyper-parameter, Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun. ICLR 2021 (spotlight).
Towards a better global loss landscape of GANs. [arxiv] [link] [slides] Ruoyu Sun, Tiantian Fang, Alex Schwing. NeurIPS 2020 (oral).
Survey: The Global landscape of neural networks: An overview. Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R Srikant. IEEE Signal Processing Magazine, 2020. Compared to another survey which covers all aspects of optimization, this paper focuses on landscape and presents formal results.
Survey: Optimization for deep learning: theory and algorithms. Ruoyu Sun. JORSC (Journal of Operations Research Society of China) 2020.
Revisiting Landscape Analysis for Neural-networks: Eliminating Decreasing Paths to Infinity, Shiyu Liang, Ruoyu Sun, Srikant. SIAM Journal on Optimization 2021.
Worst-case Complexity of Cyclic Coordinate Descent: O(n^2) Gap with Randomized Version, [arxiv], Ruoyu Sun, Yinyu Ye. Mathematical Programming (Series A), 2019.
Guaranteed Matrix Completion via Nonconvex Factorization, [arxiv], [slides] [short summary] Ruoyu Sun, Zhi-Quan Luo.
IEEE Transactions on Information Theory 2016; a shorter version has appeared at FOCS 2015. Honorable mention, 2015 INFORMS Optimization Society student paper prize. (prize page)
TALK SLIDES
Towards Better Global landscape of GANs: How 2 Lines of Code Change Makes Difference [slides]
Over-parameterized networks have no bad basins [slides]
When Do Neural Networks Have No Bad Local Minima? (for ICML and NeurIPS’19 papers) [slides]
PREPRINTS
Some earlier works:
- DEED: A General Quantization Scheme for Communication Efficiency in Bits, Tian Ye, Peijun Xiao, Ruoyu Sun.
- Achieving Small Test Error in Mildly Overparameterized Neural Networks, S Liang, R Sun, R Srikant.
PUBLICATIONS (by Time)
$^+$: co-first author; $^{*}$: corresponding author
Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization, Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo. Accepted to COLT 2024.
PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming. Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun$^{*}$. Accepted to ICML 2024.
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models. Li, Z., Xu, T., Zhang, Y., Yu, Y., Sun, R., & Luo, Z. Q. Accepted to ICML 2024.
How Graph Neural Networks Learn: Lessons from Training Dynamics, Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan, Accepted to ICML 2024.
AceGPT, Localizing Large Language Models in Arabic. Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu. NAACL 2024.
LEMON: Lossless model expansion, Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang. ICLR 2024.
Balanced Training for Sparse GANs, Yite Wang, Jing Wu, Naira Hovakimyan, Ruoyu Sun. NeurIPS 2023.
PAC-Bayesian Spectrally-Normalized Bounds for Adversarially Robust Generalization, Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo. NeurIPS 2023.
A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming [arxiv] Qingyu Han $^+$, Linxin Yang $^+$, Qian Chen, Xiang Zhou, Dong Zhang, Akang Wang $^{*}$, Ruoyu Sun $^{*}$, Xiaodong Luo. ICLR 2023.
NTK-SAP: Improving neural network pruning by aligning training dynamics, Yite Wang, Dawei Li, Ruoyu Sun. ICLR 2023.
Adam Can Converge Without Any Modification on Update Rules, Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun $^*$, Zhi-Quan Luo, NeurIPS 2022 spotlight (~5% of 10k submissions).
Stability Analysis and Generalization Bounds of Adversarial Training [arxiv] Jiancong Xiao, Yanbo Fan, Ruoyu Sun $^{*}$, Jue Wang $^{*}$, Zhi-Quan Luo, NeurIPS 2022 spotlight (~5% of 10k submissions).
DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data, [arxiv] Tiantian Fang, Ruoyu Sun, Alex Schwing, NeurIPS 2022.
Does Momentum Change the Implicit Regularization on Separable Data? Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu, NeurIPS 2022 spotlight (~5% of 10k submissions).
Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning, Haoxiang Wang $^+$, Yite Wang $^+$, Ruoyu Sun, Bo Li. CVPR 2022.
On the Benefit of Width for Neural Networks: Disappearance of Basins [arxiv], Dawei Li, Tian Ding, Ruoyu Sun. SIAM Journal on Optimization 2022. (previous title: [Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations])
Revisiting Landscape Analysis for Neural-networks: Eliminating Decreasing Paths to Infinity, Shiyu Liang, Ruoyu Sun, Srikant. SIAM Journal on Optimization 2021.
Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data, Dachao Lin, Ruoyu Sun, Zhihua Zhang. NeurIPS 2021.
When Expressivity Meets Trainability: Fewer than Neurons Can Work, Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo. NeurIPS 2021.
Spurious Local Minima Exist for Almost All Over-parameterized Neural Networks, Tian Ding, Dawei Li, Ruoyu Sun. Accepted to MOR (Mathematics of Operations Research) 2021.
Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity, Peijun Xiao, Zhisheng Xiao, Ruoyu Sun. Accepted to SIAM Journal on Optimization, 2021.
PenDer: Incorporating Shape Constraints via Penalized Derivatives, G. Akhil, M. Lavanya, R. Sun, N. Shukla, A. Kolbeinsson, AAAI 2021.
Separation of Metabolites and Macromolecules for Short-TE 1H-MRSI Using Learned Component-Specific Representations. Y. Li, Z. Wang, R. Sun and F. Lam. IEEE Transactions on Medical Imaging, 2021.
RMSprop converges with proper hyper-parameter, Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun. ICLR 2021 (spotlight, 3.3% of 3000+ submissions).
Towards a better global loss landscape of GANs. [arxiv] [link] [slides] Ruoyu Sun, Tiantian Fang, Alex Schwing. NeurIPS 2020 (Oral, 1.1% of 9500+ submissions).
A smoothed GDA algorithm for the nonconvex-concave min-max problem with an $\mathcal{O}(1/\epsilon^2)$ iteration complexity. [arxiv] Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhi-Quan Luo. NeurIPS 2020.
The Global landscape of neural networks: An overview. (survey paper) Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R Srikant. IEEE Signal Processing Magazine (Special topics in “Non-convex optimization for signal processing and machine learning”), 2020. Compared to another survey which covers all aspects of optimization, this paper focuses on landscape and presents formal results.
Optimization for deep learning: theory and algorithms. Ruoyu Sun. (Survey paper). Journal of Operations Research Society of China (JORSC), 2021. My recent courses IE598 “optimization theory for deep learning” and “mathematics of deep learning” (at PKU applied math summer school) are partially based on this article.
Max-sliced Wasserstein distance for fast GAN training, Deshpande, I., Hu, Y.T., Sun, R., Pyrros, A., Siddiqui, N., Koyejo, S., Zhao, Z., Forsyth, D. and Schwing, A.G., 2019. Max-Sliced Wasserstein Distance and Its Use for GANs. CVPR 2019, oral (5.58%).
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization, Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong. ICLR 2019.
Worst-case Complexity of Cyclic Coordinate Descent: O(n^2) Gap with Randomized Version, [arxiv], Ruoyu Sun, Yinyu Ye. Mathematical Programming (Series A), 2019.
Globally Optimal Uplink Joint Base Station Association and Beamforming, [arxiv], Wei Liu, Ruoyu Sun (corresponding author), Zhi-Quan Luo. IEEE Transactions on Communications 2019. Part of the paper has appeared at ICASSP 2014.
On the Efficiency of Random Permutation for ADMM and Coordinate Descent, [arxiv], Ruoyu Sun, Zhi-Quan Luo, Yinyu Ye. Mathematics of Operations Research, 2019. [video]
Previous version: On the Expected Convergence of Randomly Permuted ADMM
2nd Place, 2015 INFORMS George Nicholson student paper competition.
Oral talk, NIPS 2015 workshop on optimization for machine learning (workshop link)
Adding One Neuron Can Eliminate All Bad Local Minima [slides], Shiyu Liang, Ruoyu Sun, Jason Lee, R. Srikant. NeurIPS 2018.
Understanding the Loss Surface of Neural Networks for Binary Classification [slides], Shiyu Liang, Ruoyu Sun, Yixuan Li, R. Srikant. ICML 2018.
Guaranteed Matrix Completion via Nonconvex Factorization, [arxiv], [slides] [short summary] Ruoyu Sun, Zhi-Quan Luo. IEEE Transactions on Information Theory 2016; a shorter version has appeared at FOCS 2015. Honorable mention, 2015 INFORMS Optimization Society student paper prize. (prize page)
Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems, Ruoyu Sun, Mingyi Hong (equal contribution). NIPS 2015.
Interference alignment via Feasible Point Pursuit, Aritra Konar, Ruoyu Sun, Nikos Sidiropoulos, Zhi-Quan Luo. Proc. IEEE SPAWC 2015.
Joint Downlink Base Station Association and Power Control for Max-Min Fairness: Computation and Complexity, Ruoyu Sun, Mingyi Hong, Zhi-Quan Luo. IEEE Journal of Selected Areas in Communications (JSAC), vol. 33, no. 6, pp. 1040-1054, June 2015. [link] [arxiv]
Interference Alignment using Finite and Dependent Channel Extensions: the Single Beam Case, Ruoyu Sun, Zhi-Quan Luo. IEEE Trans. on Information Theory (TIT), vol. 61, no. 1, pp. 239-255, Jan. 2015. [link] [arxiv] [slides]
Cross-Layer Provision of Future Cellular Networks: A WMMSE-based approach, (alphabetical order) Hadi Baligh, Mingyi Hong, Wei-Cheng Liao, Zhi-Quan Luo, Meisam Razaviyayn, Maziar Sanjabi, Ruoyu Sun. IEEE Signal Processing Magazine, vol. 31, no. 6, pp. 56-68, Nov. 2014.
Long-term Transmit Point Association for Coordinated Multipoint Transmission by Stochastic Optimization, Ruoyu Sun, Hadi Baligh, Zhi-Quan Luo. Proc. IEEE SPAWC 2013.
Joint Base Station Clustering and Beamformer Design for Partial Coordinated Transmission in Heterogeneous Networks, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo. IEEE Journal on Selected Areas in Communications (JSAC), special issues on Large-Scale multiple antenna systems, vol. 31, no. 2, pp. 226-240, Feb. 2013. [link] [arxiv]
Optimal Joint Base Station Assignment and Power Allocation in a Multiuser SISO Network, Ruoyu Sun, Mingyi Hong, Zhi-Quan Luo. Proc. IEEE SPAWC 2012.
Joint Transceiver Design and Base Station Clustering for Heterogeneous Networks, Mingyi Hong, Meisam Razaviyayn, Ruoyu Sun and Zhi-Quan Luo. Proc. Asilomar Conference on Signals, Systems and Computers, 2012.
Robust SINR-Constrained MISO Downlink Beamforming: When is Semidefinite Programming Relaxation Tight?, Enbin Song, Qingjiang Shi, Maziar Sanjabi, Ruoyu Sun, Zhi-Quan Luo. EURASIP Journal on Wireless Communications and Networking, 2012. [link] Conference version: IEEE ICASSP, 2011.
PATENTS
- System and Method for Transmission Point (TP) Association and Beamforming Assignment in Heterogeneous Networks, Ruoyu Sun, Mingyi Hong, Hadi Baligh, Zhi-Quan Luo, and Meisam Razaviyayn. U.S. Patent App. 13/757,303, filed Feb. 2013.
Ph.D. Dissertation
- Matrix Completion via Nonconvex Factorization: Algorithms and Theory,
Ruoyu Sun, University of Minnesota, May 2015.
Advised master projects (selected)
- Myung-Hwan Song, Trainability and Generalization of Small-Scale Neural Networks, 2019
- Deborshi Goswami, Application of capsule networks for image classification on complex datasets, 2019
- Ziyu Zhou, Multi-Domain Image-to-Image Translation using StarGAN with Max Sliced Wasserstein Distance, 2019