Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective
arXiv: 2406.03768 · 6 June 2024
Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu
Papers citing "Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective" (19 papers):
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
26 Mar 2024

Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
Zihan Qiu, Zeyu Huang, Youcheng Huang, Jie Fu
19 Feb 2024 · KELM

In-Context Learning with Many Demonstration Examples
Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jinchao Zhang, Zhiyong Wu, Lingpeng Kong
09 Feb 2023

Transformers learn in-context by gradient descent
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov
15 Dec 2022 · MLT

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
01 Aug 2022

GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, ..., Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
14 Apr 2022

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
11 Feb 2022

An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma
03 Nov 2021 · ReLM, BDL, VPVLM, LRM

GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang
18 Mar 2021 · BDL, AI4CE

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora
24 Feb 2021

Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy
29 Dec 2020 · KELM

Rethinking Attention with Performers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
30 Sep 2020

Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms
Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy, Gintare Karolina Dziugaite
27 Apr 2020

Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy
06 Nov 2019 · FedML

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov
23 May 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
02 May 2019 · ELM

Information-theoretic analysis of generalization capability of learning algorithms
Aolin Xu, Maxim Raginsky
22 May 2017

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
10 Nov 2016 · HAI

Learning Structured Sparsity in Deep Neural Networks
Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
12 Aug 2016