Efficient Methods for Natural Language Processing: A Survey
Versions: v1, v2 (latest)

31 August 2022
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
Manuel R. Ciosici
Michael Hassid
Kenneth Heafield
Sara Hooker
Colin Raffel
Pedro Henrique Martins
André F. T. Martins
Jessica Zosa Forde
Peter Milder
Edwin Simpson
Noam Slonim
Jesse Dodge
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
arXiv: 2209.00099 (abs / PDF / HTML)

Papers citing "Efficient Methods for Natural Language Processing: A Survey"

Showing 50 of 244 citing papers:
  • DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He. 14 Jan 2022.
  • Faster Nearest Neighbor Machine Translation. Shuhe Wang, Jiwei Li, Yuxian Meng, Rongbin Ouyang, Guoyin Wang, Xiaoya Li, Tianwei Zhang, Shi Zong. 15 Dec 2021.
  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, ..., Kun Zhang, Quoc V. Le, Yonghui Wu, Zhiwen Chen, Claire Cui. 13 Dec 2021.
  • Scaling Language Models: Methods, Analysis & Insights from Training Gopher. Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, ..., Jeff Stanway, L. Bennett, Demis Hassabis, Koray Kavukcuoglu, G. Irving. 08 Dec 2021.
  • Improving language models by retrieving from trillions of tokens. Sebastian Borgeaud, A. Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, ..., Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre. 08 Dec 2021.
  • Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models. Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré. 30 Nov 2021.
  • How Well Do Sparse Imagenet Models Transfer? Eugenia Iofinova, Alexandra Peste, Mark Kurtz, Dan Alistarh. 26 Nov 2021.
  • ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning. V. Aribandi, Yi Tay, Tal Schuster, J. Rao, H. Zheng, ..., Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler. 22 Nov 2021.
  • Training Neural Networks with Fixed Sparse Masks. Yi-Lin Sung, Varun Nair, Colin Raffel. 18 Nov 2021.
  • DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing. Pengcheng He, Jianfeng Gao, Weizhu Chen. 18 Nov 2021.
  • Prune Once for All: Sparse Pre-Trained Language Models. Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat. 10 Nov 2021.
  • Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, Christopher Ré. 31 Oct 2021.
  • Sustainable AI: Environmental Implications, Challenges and Opportunities. Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, ..., Maximilian Balandat, Joe Spisak, R. Jain, Michael G. Rabbat, K. Hazelwood. 30 Oct 2021.
  • The Efficiency Misnomer. Daoyuan Chen, Liuyi Yao, Dawei Gao, Ashish Vaswani, Yaliang Li. 25 Oct 2021.
  • Multitask Prompted Training Enables Zero-Shot Task Generalization. Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush. 15 Oct 2021.
  • Towards Efficient NLP: A Standard Evaluation and A Strong Baseline. Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu. 13 Oct 2021.
  • Towards a Unified View of Parameter-Efficient Transfer Learning. Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig. 08 Oct 2021.
  • The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation. Orevaoghene Ahia, Julia Kreutzer, Sara Hooker. 06 Oct 2021.
  • 8-bit Optimizers via Block-wise Quantization. Tim Dettmers, M. Lewis, Sam Shleifer, Luke Zettlemoyer. 06 Oct 2021.
  • Predicting Attention Sparsity in Transformers. Marcos Vinícius Treviso, António Góis, Patrick Fernandes, E. Fonseca, André F. T. Martins. 24 Sep 2021.
  • SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization. Marius Lindauer, Katharina Eggensperger, Matthias Feurer, André Biedenkapp, Difan Deng, C. Benjamins, Tim Ruhopf, René Sass, Frank Hutter. 20 Sep 2021.
  • Efficient Nearest Neighbor Language Models. Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick. 09 Sep 2021.
  • Active Learning by Acquiring Contrastive Examples. Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras. 08 Sep 2021.
  • Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression. Canwen Xu, Wangchunshu Zhou, Tao Ge, Kelvin J. Xu, Julian McAuley, Furu Wei. 07 Sep 2021.
  • Data Efficient Masked Language Modeling for Vision and Language. Yonatan Bitton, Gabriel Stanovsky, Michael Elhadad, Roy Schwartz. 05 Sep 2021.
  • Finetuned Language Models Are Zero-Shot Learners. Jason W. Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le. 03 Sep 2021.
  • Deep Reinforcement Learning at the Edge of the Statistical Precipice. Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare. 30 Aug 2021.
  • Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Ofir Press, Noah A. Smith, M. Lewis. 27 Aug 2021.
  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, Graham Neubig. 28 Jul 2021.
  • A Tale Of Two Long Tails. Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker. 27 Jul 2021.
  • Deduplicating Training Data Makes Language Models Better. Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini. 14 Jul 2021.
  • Evaluating Large Language Models Trained on Code. Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé, ..., Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba. 07 Jul 2021.
  • Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering. Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning. 06 Jul 2021.
  • BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. Elad Ben-Zaken, Shauli Ravfogel, Yoav Goldberg. 18 Jun 2021.
  • LoRA: Low-Rank Adaptation of Large Language Models. J. E. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen. 17 Jun 2021.
  • Deep Learning Through the Lens of Example Difficulty. R. Baldock, Hartmut Maennel, Behnam Neyshabur. 17 Jun 2021.
  • An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models. Xueqing Liu, Chi Wang. 17 Jun 2021.
  • Does Knowledge Distillation Really Work? Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, A. Wilson. 10 Jun 2021.
  • Compacter: Efficient Low-Rank Hypercomplex Adapter Layers. Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder. 08 Jun 2021.
  • Annotation Curricula to Implicitly Train Non-Expert Annotators. Ji-Ung Lee, Jan-Christoph Klie, Iryna Gurevych. 04 Jun 2021.
  • On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers. Tianchu Ji, Shraddhan Jain, M. Ferdman, Peter Milder, H. Andrew Schwartz, Niranjan Balasubramanian. 02 Jun 2021.
  • IrEne: Interpretable Energy Prediction for Transformers. Qingqing Cao, Yash Kumar Lal, H. Trivedi, A. Balasubramanian, Niranjan Balasubramanian. 02 Jun 2021.
  • Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation. Adithya Renduchintala, Denise Díaz, Kenneth Heafield, Xian Li, Mona T. Diab. 01 Jun 2021.
  • Fast Nearest Neighbor Machine Translation. Yuxian Meng, Xiaoya Li, Xiayu Zheng, Leilei Gan, Xiaofei Sun, Tianwei Zhang, Jiwei Li. 30 May 2021.
  • An Attention Free Transformer. Shuangfei Zhai, Walter A. Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, J. Susskind. 28 May 2021.
  • FNet: Mixing Tokens with Fourier Transforms. James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. 09 May 2021.
  • Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics. Greg Yang, Etai Littwin. 08 May 2021.
  • Carbon Emissions and Large Neural Network Training. David A. Patterson, Joseph E. Gonzalez, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, D. Rothchild, David R. So, Maud Texier, J. Dean. 21 Apr 2021.
  • The Power of Scale for Parameter-Efficient Prompt Tuning. Brian Lester, Rami Al-Rfou, Noah Constant. 18 Apr 2021.
  • Adapting Coreference Resolution Models through Active Learning. Michelle Yuan, Patrick Xia, Chandler May, Benjamin Van Durme, Jordan L. Boyd-Graber. 15 Apr 2021.