Title
An Adaptive Method Stabilizing Activations for Enhanced Generalization Hyunseok Seung Jaewoo Lee Hyunsuk Ko ODL 37 0 0 10 Jun 2025
Investigating Mask-aware Prototype Learning for Tabular Anomaly Detection Ruiying Lu Jinhan Liu Chuan Du D. Guo OOD AAML 68 0 0 03 Jun 2025
Taming LLMs by Scaling Learning Rates with Gradient Grouping Siyuan Li Juanxi Tian Zedong Wang Xin Jin Zicheng Liu Wentao Zhang Dan Xu 52 0 0 01 Jun 2025
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training Yehonathan Refael Guy Smorodinsky Tom Tirer Ofir Lindenbaum 48 0 0 30 May 2025
On the Convergence Analysis of Muon Wei Shen Ruichuan Huang Minhui Huang Cong Shen Jiawei Zhang 64 0 0 29 May 2025
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models Alex Iacob Lorenzo Sani M. Safaryan Paris Giampouras Samuel Horváth ... Meghdad Kurmanji Preslav Aleksandrov William F. Shen Xinchi Qiu Nicholas D. Lane OffRL 112 0 0 28 May 2025
Deep Learning-Based Forecasting of Boarding Patient Counts to Address ED Overcrowding Orhun Vural Bunyamin Ozaydin Khalid Y. Aram James Booth Brittany F. Lindsey 58 0 0 20 May 2025
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training Shane Bergsma Nolan Dey Gurpreet Gosal Gavia Gray Daria Soboleva Joel Hestness 80 2 0 19 May 2025
A Physics-Inspired Optimizer: Velocity Regularized Adam Pranav Vaidhyanathan Lucas Schorling Natalia Ares Michael A. Osborne ODL 81 0 0 19 May 2025
On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm Huan Li Yiming Dong Zhouchen Lin 79 0 0 17 May 2025
Pretraining Large Brain Language Model for Active BCI: Silent Speech Jinzhao Zhou Zehong Cao Yiqun Duan Connor Barkley Daniel Leong ... Ziyi Zhao T. Do Yu-Cheng Chang Sheng-Fu Liang Chin-Teng Lin 112 1 0 29 Apr 2025
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching Junn Yong Loo Michelle Adeline Julia Kaiwen Lau Fang Yu Leong Hwa Hui Tew Arghya Pal Vishnu Monn Baskaran Chee-Ming Ting Raphaël C.-W. Phan BDL 109 0 0 22 Apr 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer Soham Sane ODL 151 0 0 22 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network Daniel Bolya Po-Yao (Bernie) Huang Peize Sun Jang Hyun Cho Andrea Madotto ... Shiyu Dong Nikhila Ravi Daniel Li Piotr Dollár Christoph Feichtenhofer ObjD VOS 331 9 0 17 Apr 2025
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training Mingyu Liang Hiwot Tadese Kassa Wenyin Fu Brian Coutinho Louis Feng Christina Delimitrou 40 0 0 12 Apr 2025
Low-Bit Integerization of Vision Transformers using Operand Reordering for Efficient Hardware Ching-Yi Lin Sahil Shah MQ 140 0 0 11 Apr 2025
Neural Encoding and Decoding at Scale Yizi Zhang Yanchen Wang Mehdi Azabou Alexandre Andre Zixuan Wang Hanrui Lyu International Brain Laboratory Eva L. Dyer Liam Paninski Cole Hurwitz AI4CE 170 1 0 11 Apr 2025
The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound Blake Vanberlo Alexander Wong Jesse Hoey R. Arntfield 90 0 0 10 Apr 2025
Spectral-Adaptive Modulation Networks for Visual Perception Guhnoo Yun J. Yoo Kijung Kim Jeongho Lee Paul Hongsuck Seo Dong Hwan Kim 129 0 0 31 Mar 2025
ASGO: Adaptive Structured Gradient Optimization Kang An Yuxing Liu Boyao Wang Shiqian Ma Tong Zhang ODL 157 5 0 26 Mar 2025
Show and Segment: Universal Medical Image Segmentation via In-Context Learning Yunhe Gao Di Liu Zhuowei Li You Li DongDong Chen Mu Zhou Dimitris N. Metaxas VLM 88 0 0 25 Mar 2025
Structured Preconditioners in Adaptive Optimization: A Unified Analysis Shuo Xie Tianhao Wang Sashank J. Reddi Sanjiv Kumar Zhiyuan Li 87 4 0 13 Mar 2025
ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation Tobias Christian Nauen Brian B. Moser Federico Raue Stanislav Frolov Andreas Dengel ViT 187 0 0 12 Mar 2025
Variational Bayesian Pseudo-Coreset Hyungi Lee Seanie Lee Juho Lee BDL 73 0 0 28 Feb 2025
Self-Adjust Softmax Chuanyang Zheng Yihang Gao Guoxuan Chen Han Shi Jing Xiong Xiaozhe Ren Chao Huang Xin Jiang Zhiyu Li Yu Li 90 1 0 25 Feb 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs Shane Bergsma Nolan Dey Gurpreet Gosal Gavia Gray Daria Soboleva Joel Hestness 109 8 0 21 Feb 2025
Preconditioned Inexact Stochastic ADMM for Deep Model Shenglong Zhou Ouya Wang Ziyan Luo Yongxu Zhu Geoffrey Ye Li 97 0 0 15 Feb 2025
Gradient Multi-Normalization for Stateless and Scalable LLM Training M. Scetbon Chao Ma Wenbo Gong Edward Meeds 196 1 0 10 Feb 2025
Model Diffusion for Certifiable Few-shot Transfer Learning Fady Rezk Royson Lee Henry Gouk Timothy M. Hospedales Minyoung Kim 150 0 0 10 Feb 2025
Importance Sampling via Score-based Generative Models Heasung Kim Taekyun Lee Hyeji Kim Gustavo de Veciana MedIm DiffM 210 0 0 07 Feb 2025
Celo: Training Versatile Learned Optimizers on a Compute Diet A. Moudgil Boris Knyazev Guillaume Lajoie Eugene Belilovsky 454 0 0 22 Jan 2025
A Hessian-informed hyperparameter optimization for differential learning rate Shiyun Xu Zhiqi Bu Yiliang Zhang Ian Barnett 131 1 0 12 Jan 2025
AdaPRL: Adaptive Pairwise Regression Learning with Uncertainty Estimation for Universal Regression Tasks Fuhang Liang Rucong Xu Deng Lin OOD 95 0 0 10 Jan 2025
Mapping the Edge of Chaos: Fractal-Like Boundaries in The Trainability of Decoder-Only Transformer Models Bahman Torkamandi AI4CE 101 0 0 08 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism Tim Tsz-Kit Lau Weijian Li Chenwei Xu Han Liu Mladen Kolar 473 0 0 30 Dec 2024
Self-supervised Spatial-Temporal Learner for Precipitation Nowcasting Haotian Li A. Siebes S. Mehrkanoon SSL 92 0 0 20 Dec 2024
A stochastic first-order method with multi-extrapolated momentum for highly smooth unconstrained optimization Chuan He 143 0 0 19 Dec 2024
Mojito: Motion Trajectory and Intensity Control for Video Generation Xuehai He Shuohang Wang Jianwei Yang Xiaoxia Wu Yansen Wang Kuan-Chieh Wang Z. Zhan Olatunji Ruwase Yelong Shen Xinze Wang VGen 242 2 0 12 Dec 2024
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation Guanxing Lu Tengbo Yu Haoyuan Deng Season Si Chen Yansong Tang Ziwei Wang 171 3 0 09 Dec 2024
Convolutional Neural Networks Do Work with Pre-Defined Filters C. Linse Erhardt Barth T. Martinetz 149 5 0 27 Nov 2024
Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles Shuman Peng Arash Khoeini Sharan Vaswani Martin Ester 159 0 0 20 Nov 2024
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training Yoni Choukroun Shlomi Azoulay P. Kisilev 93 0 0 06 Nov 2024
LASER: Attention with Exponential Transformation Sai Surya Duvvuri Inderjit Dhillon 55 1 0 05 Nov 2024
Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models Junjiao Tian Chengyue Huang Z. Kira 65 2 0 03 Nov 2024
Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers Kai Yan Alex Schwing Yu-Xiong Wang OffRL OnRL 83 0 0 31 Oct 2024
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training Atli Kosson Bettina Messmer Martin Jaggi AI4CE 73 5 0 31 Oct 2024
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization Jui-Nan Yen Si Si Zhao Meng Felix X. Yu Sai Surya Duvvuri Inderjit Dhillon Cho-Jui Hsieh Sanjiv Kumar 90 5 0 27 Oct 2024
Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading Avinash Maurya Jie Ye M. Rafique Franck Cappello Bogdan Nicolae 78 1 0 26 Oct 2024
Leaky ReLUs That Differ in Forward and Backward Pass Facilitate Activation Maximization in Deep Neural Networks C. Linse Erhardt Barth Thomas Martinetz 70 1 0 22 Oct 2024
Rethinking generalization of classifiers in separable classes scenarios and over-parameterized regimes Julius Martinetz C. Linse Thomas Martinetz 91 0 0 22 Oct 2024