The Unreasonable Ineffectiveness of the Deeper Layers

26 March 2024
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts

Papers citing "The Unreasonable Ineffectiveness of the Deeper Layers"

Showing 50 of 67 citing papers
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
63
0
0
06 May 2025
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
Dmitriy Shopkhoev
Ammar Ali
Magauiya Zhussip
Valentin Malykh
Stamatios Lefkimmiatis
N. Komodakis
Sergey Zagoruyko
VLM
140
0
0
05 May 2025
Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models
Chuan Sun
Han Yu
Lizhen Cui
Xiaoxiao Li
96
0
0
03 May 2025
Efficient LLMs with AMP: Attention Heads and MLP Pruning
Leandro Giusti Mugnaini
Bruno Yamamoto
Lucas Lauton de Alcantara
Victor Zacarias
Edson Bollis
Lucas Pellicer
A. H. R. Costa
Artur Jordao
47
0
0
29 Apr 2025
DYNAMAX: Dynamic computing for Transformers and Mamba based architectures
Miguel Nogales
Matteo Gambella
Manuel Roveri
56
0
0
29 Apr 2025
Kuwain 1.5B: An Arabic SLM via Language Injection
Khalil Hennara
Sara Chrouf
Mohamed Motaism Hamed
Zeina Aldallal
Omar Hadid
Safwan AlModhayan
29
1
0
21 Apr 2025
SD$^2$: Self-Distilled Sparse Drafters
Mike Lasby
Nish Sinnadurai
Valavan Manohararajah
Sean Lie
Vithursan Thangarasa
143
1
0
10 Apr 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Zehan Li
L. Zhang
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
59
0
0
31 Mar 2025
Approximating Latent Manifolds in Neural Networks via Vanishing Ideals
Nico Pelleriti
Max Zimmer
Elias Wirth
Sebastian Pokutta
39
0
0
24 Feb 2025
Maybe I Should Not Answer That, but... Do LLMs Understand The Safety of Their Inputs?
Maciej Chrabąszcz
Filip Szatkowski
Bartosz Wójcik
Jan Dubiński
Tomasz Trzciński
52
0
0
22 Feb 2025
Pruning as a Defense: Reducing Memorization in Large Language Models
Mansi Gupta
Nikhar Waghela
Sarthak Gupta
Shourya Goel
Sanjif Shanmugavelu
AAML
47
0
0
18 Feb 2025
Hyperspherical Energy Transformer with Recurrent Depth
Yunzhe Hu
Difan Zou
Dong Xu
46
0
0
17 Feb 2025
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
MoE
AI4CE
66
1
0
13 Feb 2025
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
H. Seo
Wongi Jeong
Jae-sun Seo
Se Young Chun
60
0
0
12 Feb 2025
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
67
4
0
09 Feb 2025
How Redundant Is the Transformer Stack in Speech Representation Models?
Teresa Dorszewski
Albert Kjøller Jacobsen
Lenka Tětková
Lars Kai Hansen
107
0
0
20 Jan 2025
CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park
Soo-Mook Moon
41
0
0
08 Jan 2025
An Analysis Framework for Understanding Deep Neural Networks Based on Network Dynamics
Yuchen Lin
Yong Zhang
Sihan Feng
Hong Zhao
36
0
0
05 Jan 2025
Optimizing Small Language Models for In-Vehicle Function-Calling
Yahya Sowti Khiabani
Farris Atif
Chieh Hsu
Sven Stahlmann
Tobias Michels
Sebastian Kramer
Benedikt Heidrich
M. Saquib Sarfraz
Julian Merten
Faezeh Tafazzoli
28
1
0
04 Jan 2025
AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
Yehonathan Refael
Jonathan Svirsky
Boris Shustin
Wasim Huleihel
Ofir Lindenbaum
41
3
0
31 Dec 2024
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li
Lu Yin
Shiwei Liu
70
4
0
18 Dec 2024
Federated Source-free Domain Adaptation for Classification: Weighted Cluster Aggregation for Unlabeled Data
Junki Mori
Kosuke Kihara
Taiki Miyagawa
Akinori F. Ebihara
Isamu Teranishi
Hisashi Kashima
76
1
0
18 Dec 2024
Lightweight Safety Classification Using Pruned Language Models
Mason Sawtell
Tula Masterman
Sandi Besen
Jim Brown
88
2
0
18 Dec 2024
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Akhiad Bercovich
Tomer Ronen
Talor Abramovich
Nir Ailon
Nave Assaf
...
Ido Shahaf
Oren Tropp
Omer Ullman Argov
Ran Zilberstein
Ran El-Yaniv
77
1
0
28 Nov 2024
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
Y. Fu
Zhongzhi Yu
Junwei Li
Jiayi Qian
Yongan Zhang
Xiangchi Yuan
Dachuan Shi
Roman Yakunin
Y. Lin
29
2
0
15 Nov 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
47
0
0
11 Nov 2024
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu
Xinyan Velocity Yu
Dani Yogatama
Jiasen Lu
Yoon Kim
AIFin
54
10
0
07 Nov 2024
Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy
Razvan-Gabriel Dumitru
Paul-Ioan Clotan
Vikas Yadav
Darius Peteleaza
Mihai Surdeanu
36
4
0
05 Nov 2024
Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups
Davide Ghilardi
Federico Belotti
Marco Molinari
32
2
0
28 Oct 2024
Understanding Layer Significance in LLM Alignment
Guangyuan Shi
Zexin Lu
Xiaoyu Dong
Wenlong Zhang
Xuanyu Zhang
Yujie Feng
Xiao-Ming Wu
55
2
0
23 Oct 2024
Large Language Models Are Overparameterized Text Encoders
Thennal D K
Tim Fischer
Chris Biemann
38
2
0
18 Oct 2024
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
Shwai He
Tao Ge
Guoheng Sun
Bowei Tian
Xiaoyang Wang
Ang Li
MoE
54
1
0
17 Oct 2024
Persistent Topological Features in Large Language Models
Yuri Gardinazzi
Giada Panerai
Karthik Viswanathan
A. Ansuini
Alberto Cazzaniga
Matteo Biagetti
45
2
0
14 Oct 2024
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Haiquan Lu
Yefan Zhou
Shiwei Liu
Zhangyang Wang
Michael W. Mahoney
Yaoqing Yang
29
0
0
14 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
38
1
0
13 Oct 2024
Skipping Computations in Multimodal LLMs
Mustafa Shukor
Matthieu Cord
26
2
0
12 Oct 2024
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
Aurick Qiao
Z. Yao
Samyam Rajbhandari
Yuxiong He
VLM
32
0
0
04 Oct 2024
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity
Michael R. Metel
Peng Lu
Boxing Chen
Mehdi Rezagholizadeh
I. Kobyzev
27
3
0
01 Oct 2024
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Yuxin Wang
Minghua Ma
Zekun Wang
Jingchang Chen
Huiming Fan
Liping Shan
Qing Yang
Dongliang Xu
Ming Liu
Bing Qin
38
3
0
20 Sep 2024
Recall: Empowering Multimodal Embedding for Edge Devices
Dongqi Cai
Shangguang Wang
Chen Peng
Zeling Zhang
Mengwei Xu
27
3
0
09 Sep 2024
Application Specific Compression of Deep Learning Models
Rohit Raj Rai
Angana Borah
Amit Awekar
24
0
0
09 Sep 2024
LLM Pruning and Distillation in Practice: The Minitron Approach
Sharath Turuvekere Sreenivas
Saurav Muralidharan
Raviraj Joshi
Marcin Chochowski
M. Patwary
M. Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
28
25
0
21 Aug 2024
Research on Personalized Compression Algorithm for Pre-trained Models Based on Homomorphic Entropy Increase
Yicong Li
Xing Guo
Haohua Du
35
0
0
16 Aug 2024
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen
Lei Zhao
Difan Zou
43
6
0
08 Aug 2024
Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan
Sharath Turuvekere Sreenivas
Raviraj Joshi
Marcin Chochowski
M. Patwary
M. Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
SyDa
MQ
39
37
0
19 Jul 2024
Accuracy is Not All You Need
Abhinav Dutta
Sanjeev Krishnan
Nipun Kwatra
Ramachandran Ramjee
41
3
0
12 Jul 2024
The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad
Wes Gurnee
Max Tegmark
38
33
0
27 Jun 2024
Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels
Razvan-Gabriel Dumitru
Vikas Yadav
Rishabh Maheshwary
Paul-Ioan Clotan
Sathwik Tejaswi Madhusudhan
Mihai Surdeanu
MQ
38
2
0
25 Jun 2024
xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics
Daniil Larionov
Mikhail Seleznyov
Vasiliy Viskov
Alexander Panchenko
Steffen Eger
37
3
0
20 Jun 2024
LaCoOT: Layer Collapse through Optimal Transport
Victor Quétu
Nour Hezbri
Enzo Tartaglione
31
0
0
13 Jun 2024