Scaling MLPs: A Tale of Inductive Bias

23 June 2023

Papers citing "Scaling MLPs: A Tale of Inductive Bias"


FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute. Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, A. Sanakoyeu, Yuming Du, Albert Pumarola, Ali K. Thabet, Edgar Schönfeld. 27 Feb 2025.
Exploring Kolmogorov-Arnold Networks for Interpretable Time Series Classification. Irina Barašin, Blaž Bertalanič, M. Mohorčič, Carolina Fortuna. 22 Nov 2024.
Resolving Discrepancies in Compute-Optimal Scaling of Language Models. Tomer Porian, Mitchell Wortsman, J. Jitsev, Ludwig Schmidt, Y. Carmon. 27 Jun 2024.
Kolmogorov-Arnold Networks (KANs) for Time Series Analysis. Cristian J. Vaca-Rubio, Luis Blanco, Roberto Pereira, Marius Caus. 14 May 2024.
Neural Redshift: Random Networks are not Random Functions. Damien Teney, A. Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad. 04 Mar 2024.
GLIMPSE: Generalized Local Imaging with MLPs. AmirEhsan Khorashadizadeh, Valentin Debarnot, Tianlin Liu, Ivan Dokmanić. 01 Jan 2024.
Transformer Fusion with Optimal Transport. Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh. 09 Oct 2023.
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck. Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang. 07 Sep 2023.
The Curious Case of Benign Memorization. Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann. 25 Oct 2022.
Patches Are All You Need? Asher Trockman, J. Zico Kolter. 24 Jan 2022.
MLP-Mixer: An all-MLP Architecture for Vision. Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. 04 May 2021.
Emerging Properties in Self-Supervised Vision Transformers. Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin. 29 Apr 2021.
ImageNet-21K Pretraining for the Masses. T. Ridnik, Emanuel Ben-Baruch, Asaf Noy, Lihi Zelnik-Manor. 22 Apr 2021.
Towards Learning Convolutions from Scratch. Behnam Neyshabur. 27 Jul 2020.
Scaling Laws for Neural Language Models. Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016.
Convolution by Evolution: Differentiable Pattern Producing Networks. Chrisantha Fernando, Dylan Banarse, Malcolm Reynolds, F. Besse, David Pfau, Max Jaderberg, Marc Lanctot, Daan Wierstra. 08 Jun 2016.