The Unreasonable Ineffectiveness of the Deeper Layers

26 March 2024 · arXiv:2403.17887
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

Papers citing "The Unreasonable Ineffectiveness of the Deeper Layers"

17 / 67 papers shown

AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer
Yitao Xu, Tong Zhang, Sabine Süsstrunk
ViT · 47 · 0 · 0 · 12 Jun 2024

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Y. Lin
KELM · 56 · 7 · 0 · 10 Jun 2024

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective
Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu
47 · 2 · 0 · 06 Jun 2024

Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
Mustafa Shukor, Matthieu Cord
68 · 5 · 0 · 26 May 2024

Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, A. Laio, Marco Baroni
112 · 11 · 0 · 24 May 2024

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang
MQ · 35 · 30 · 0 · 23 May 2024

Towards smaller, faster decoder-only transformers: Architectural variants and their implications
Sathya Krishnan Suresh, P. Shunmugapriya
24 · 0 · 0 · 22 Apr 2024

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, ..., Yanda Meng, Kaize Ding, Fan Yang, Jundong Li, Yongfeng Zhang
52 · 0 · 0 · 10 Apr 2024

CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers
Longwei Zou, Qingyang Wang, Han Zhao, Jiangang Kong, Yi Yang, Yangdong Deng
42 · 0 · 0 · 10 Apr 2024

SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman
VLM · 132 · 145 · 0 · 26 Jan 2024

CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks
Andrei Tomut, S. Jahromi, Abhijoy Sarkar, Uygar Kurt, Sukhbinder Singh, ..., Muhammad Ibrahim, Oussama Tahiri-Alaoui, John Malcolm, Samuel Mugel, Roman Orus
MQ · 47 · 13 · 0 · 25 Jan 2024

Fast and Optimal Weight Update for Pruned Large Language Models
Vladimír Boza
27 · 6 · 0 · 01 Jan 2024

KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis
Youngwan Lee, Kwanyong Park, Yoorhim Cho, Yong-Ju Lee, Sung Ju Hwang
VLM · 27 · 3 · 0 · 07 Dec 2023

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Lokesh Nagalapatti, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
ALM · 220 · 502 · 0 · 03 May 2023

Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson
KELM · 191 · 261 · 0 · 28 Apr 2023

Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
28 · 14 · 0 · 01 Feb 2023

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
255 · 4,489 · 0 · 23 Jan 2020