Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective
arXiv: 2406.03768 · 6 June 2024
Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu
Papers citing "Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective" (19 papers):
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
26 Mar 2024

Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
Zihan Qiu, Zeyu Huang, Youcheng Huang, Jie Fu
19 Feb 2024 · KELM

In-Context Learning with Many Demonstration Examples
Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jinchao Zhang, Zhiyong Wu, Lingpeng Kong
09 Feb 2023

Transformers learn in-context by gradient descent
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov
15 Dec 2022 · MLT

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
01 Aug 2022

GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, ..., Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
14 Apr 2022

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
11 Feb 2022

An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma
03 Nov 2021 · ReLM, BDL, VPVLM, LRM

GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang
18 Mar 2021 · BDL, AI4CE

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora
24 Feb 2021

Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy
29 Dec 2020 · KELM

Rethinking Attention with Performers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
30 Sep 2020

Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms
Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy, Gintare Karolina Dziugaite
27 Apr 2020

Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy
06 Nov 2019 · FedML

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov
23 May 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
02 May 2019 · ELM

Information-theoretic analysis of generalization capability of learning algorithms
Aolin Xu, Maxim Raginsky
22 May 2017

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
10 Nov 2016 · HAI

Learning Structured Sparsity in Deep Neural Networks
Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
12 Aug 2016