ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.02038
  4. Cited By
Understanding LLMs: A Comprehensive Overview from Training to Inference
v1v2 (latest)

Understanding LLMs: A Comprehensive Overview from Training to Inference

4 January 2024
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
Jiaming Tian
Yutong Zhang
Jiaqi Wang
Xiaohui Gao
Tianyang Zhong
Yi Pan
Shaochen Xu
Zihao Wu
Zheng Liu
Xin Zhang
Shu Zhang
Xintao Hu
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
    SyDa
ArXiv (abs)PDFHTML

Papers citing "Understanding LLMs: A Comprehensive Overview from Training to Inference"

50 / 157 papers shown
Title
Alignment of Language Agents
Alignment of Language Agents
Zachary Kenton
Tom Everitt
Laura Weidinger
Iason Gabriel
Vladimir Mikulik
G. Irving
80
166
0
26 Mar 2021
FastMoE: A Fast Mixture-of-Expert Training System
FastMoE: A Fast Mixture-of-Expert Training System
Jiaao He
J. Qiu
Aohan Zeng
Zhilin Yang
Jidong Zhai
Jie Tang
ALMMoE
99
102
0
24 Mar 2021
GPT Understands, Too
GPT Understands, Too
Xiao Liu
Yanan Zheng
Zhengxiao Du
Ming Ding
Yujie Qian
Zhilin Yang
Jie Tang
VLM
168
1,182
0
18 Mar 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLMFaML
209
2,407
0
05 Mar 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
274
434
0
18 Jan 2021
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li
Percy Liang
252
4,305
0
01 Jan 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
476
2,123
0
31 Dec 2020
BinaryBERT: Pushing the Limit of BERT Quantization
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
219
227
0
31 Dec 2020
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
Basel Alomair
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAUSILM
517
1,956
0
14 Dec 2020
How Can We Know When Language Models Know? On the Calibration of
  Language Models for Question Answering
How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering
Zhengbao Jiang
Jun Araki
Haibo Ding
Graham Neubig
UQCV
60
437
0
02 Dec 2020
On the Sub-Layer Functionalities of Transformer Decoder
On the Sub-Layer Functionalities of Transformer Decoder
Yilin Yang
Longyue Wang
Shuming Shi
Prasad Tadepalli
Stefan Lee
Zhaopeng Tu
76
27
0
06 Oct 2020
What Disease does this Patient Have? A Large-scale Open Domain Question
  Answering Dataset from Medical Exams
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Di Jin
Eileen Pan
Nassim Oufattole
W. Weng
Hanyi Fang
Peter Szolovits
FaMLELMLM&MA
126
811
0
28 Sep 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language
  Models
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
170
1,221
0
24 Sep 2020
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot
  Learners
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick
Hinrich Schütze
132
976
0
15 Sep 2020
Measuring Massive Multitask Language Understanding
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELMRALM
187
4,577
0
07 Sep 2020
Knowledge Distillation: A Survey
Knowledge Distillation: A Survey
Jianping Gou
B. Yu
Stephen J. Maybank
Dacheng Tao
VLM
191
2,993
0
09 Jun 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
904
42,463
0
28 May 2020
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating
  Cross-lingual Generalization
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Junjie Hu
Sebastian Ruder
Aditya Siddhant
Graham Neubig
Orhan Firat
Melvin Johnson
ELM
208
977
0
24 Mar 2020
Compressing BERT: Studying the Effects of Weight Pruning on Transfer
  Learning
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Mitchell A. Gordon
Kevin Duh
Nicholas Andrews
VLM
67
343
0
19 Feb 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural
  Language Inference
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
354
1,622
0
21 Jan 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
568
42,677
0
03 Dec 2019
How Can We Know What Language Models Know?
How Can We Know What Language Models Know?
Zhengbao Jiang
Frank F. Xu
Jun Araki
Graham Neubig
KELM
149
1,412
0
28 Nov 2019
PIQA: Reasoning about Physical Commonsense in Natural Language
PIQA: Reasoning about Physical Commonsense in Natural Language
Yonatan Bisk
Rowan Zellers
Ronan Le Bras
Jianfeng Gao
Yejin Choi
OODLRM
192
1,847
0
26 Nov 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language
  Generation, Translation, and Comprehension
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMatVLM
266
10,880
0
29 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
506
20,376
0
23 Oct 2019
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Samyam Rajbhandari
Jeff Rasley
Olatunji Ruwase
Yuxiong He
ALMAI4CE
86
921
0
04 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language
  Representations
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSLAIMat
373
6,472
0
26 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
113
1,872
0
23 Sep 2019
Language Models as Knowledge Bases?
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELMAI4MH
587
2,680
0
03 Sep 2019
Patient Knowledge Distillation for BERT Model Compression
Patient Knowledge Distillation for BERT Model Compression
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu
144
843
0
25 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
700
24,572
0
26 Jul 2019
Energy and Policy Considerations for Deep Learning in NLP
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell
Ananya Ganesh
Andrew McCallum
83
2,659
0
05 Jun 2019
Defending Against Neural Fake News
Defending Against Neural Fake News
Rowan Zellers
Ari Holtzman
Hannah Rashkin
Yonatan Bisk
Ali Farhadi
Franziska Roesner
Yejin Choi
AAML
137
1,032
0
29 May 2019
Are Sixteen Heads Really Better than One?
Are Sixteen Heads Really Better than One?
Paul Michel
Omer Levy
Graham Neubig
MoE
112
1,069
0
25 May 2019
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova
253
1,562
0
24 May 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
188
2,532
0
19 May 2019
Unified Language Model Pre-training for Natural Language Understanding
  and Generation
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu Wang
Jianfeng Gao
M. Zhou
H. Hon
ELMAI4CE
233
1,560
0
08 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language
  Understanding Systems
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Jinpeng Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
279
2,326
0
02 May 2019
BERTScore: Evaluating Text Generation with BERT
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
365
5,872
0
21 Apr 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
260
3,747
0
09 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLMSSLSSeg
1.8K
95,324
0
11 Oct 2018
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book
  Question Answering
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov
Peter Clark
Tushar Khot
Ashish Sabharwal
122
1,571
0
08 Sep 2018
A Simple Method for Commonsense Reasoning
A Simple Method for Commonsense Reasoning
Trieu H. Trinh
Quoc V. Le
LRMReLM
102
434
0
07 Jun 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning
  Challenge
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELMRALMLRM
216
2,672
0
14 Mar 2018
Deep contextualized word representations
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
235
11,569
0
15 Feb 2018
Mixed Precision Training
Mixed Precision Training
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
...
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
178
1,806
0
10 Oct 2017
Why We Need New Evaluation Metrics for NLG
Why We Need New Evaluation Metrics for NLG
Jekaterina Novikova
Ondrej Dusek
Amanda Cercas Curry
Verena Rieser
114
462
0
21 Jul 2017
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
580
19,315
0
20 Jul 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
808
132,725
0
12 Jun 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
316
8,177
0
16 Jun 2016
Previous
1234
Next