arXiv:2410.09375
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
21 February 2025
Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
Papers citing "Looped ReLU MLPs May Be All You Need as Practical Programmable Computers" (13 papers shown)
Theoretical Guarantees for High Order Trajectory Refinement in Generative Flows
Chengyue Gong, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yu Tian
12 Mar 2025
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen, Xuyang Guo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
03 Mar 2025
On Computational Limits of FlowAR Models: Expressivity and Efficiency
Chengyue Gong, Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
23 Feb 2025
DPBloomfilter: Securing Bloom Filters with Differential Privacy
Yekun Ke, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
02 Feb 2025
Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao, Zhao Song, Chiwun Yang
01 Feb 2025
Circuit Complexity Bounds for Visual Autoregressive Model
Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
08 Jan 2025
Fast Gradient Computation for RoPE Attention in Almost Linear Time
Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
03 Jan 2025
Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Mingda Wan
23 Dec 2024
Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study
Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
15 Oct 2024
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
15 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
14 Oct 2024
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
10 Oct 2024
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Shuai Li, Zhao Song, Yu Xia, Tong Yu, Dinesh Manocha
26 Apr 2023