Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.02414
Cited By
GLM-130B: An Open Bilingual Pre-trained Model
5 October 2022
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
Ming Ding
Zhuoyi Yang
Yifan Xu
Wendi Zheng
Xiao Xia
Weng Lam Tam
Zixuan Ma
Yufei Xue
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"GLM-130B: An Open Bilingual Pre-trained Model"
49 / 149 papers shown
Title
RealFormer: Transformer Likes Residual Attention
Ruining He
Anirudh Ravula
Bhargav Kanagal
Joshua Ainslie
33
108
0
21 Dec 2020
Modifying Memories in Transformer Models
Chen Zhu
A. S. Rawat
Manzil Zaheer
Srinadh Bhojanapalli
Daliang Li
Felix X. Yu
Sanjiv Kumar
KELM
55
197
0
01 Dec 2020
Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
Oshin Agarwal
Heming Ge
Siamak Shakeri
Rami Al-Rfou
28
38
0
23 Oct 2020
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Nikita Nangia
Clara Vania
Rasika Bhalerao
Samuel R. Bowman
81
660
0
30 Sep 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
104
1,168
0
24 Sep 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
125
4,222
0
07 Sep 2020
Memory-Efficient Pipeline-Parallel DNN Training
Deepak Narayanan
Amar Phanishayee
Kaiyu Shi
Xie Chen
Matei A. Zaharia
MoE
62
213
0
16 Jun 2020
Self-supervised Learning: Generative or Contrastive
Xiao Liu
Fanjin Zhang
Zhenyu Hou
Zhaoyu Wang
Li Mian
Jing Zhang
Jie Tang
SSL
86
1,604
0
15 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
398
41,106
0
28 May 2020
MLSUM: The Multilingual Summarization Corpus
Thomas Scialom
Paul-Alexis Dray
Sylvain Lamprier
Benjamin Piwowarski
Jacopo Staiano
42
174
0
30 Apr 2020
StereoSet: Measuring stereotypical bias in pretrained language models
Moin Nadeem
Anna Bethke
Siva Reddy
67
979
0
20 Apr 2020
CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu
Hai Hu
Xuanwei Zhang
Lu Li
Chenjie Cao
...
Cong Yue
Xinrui Zhang
Zhen-Yi Yang
Kyle Richardson
Zhenzhong Lan
ELM
64
381
0
13 Apr 2020
Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MA
VLM
316
1,471
0
18 Mar 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
68
1,230
0
25 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
78
973
0
12 Feb 2020
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Adam Roberts
Colin Raffel
Noam M. Shazeer
KELM
53
877
0
10 Feb 2020
Measurement and Fairness
Abigail Z. Jacobs
Hanna M. Wallach
50
383
0
11 Dec 2019
PIQA: Reasoning about Physical Commonsense in Natural Language
Yonatan Bisk
Rowan Zellers
Ronan Le Bras
Jianfeng Gao
Yejin Choi
OOD
LRM
62
1,724
0
26 Nov 2019
Semantic Noise Matters for Neural Natural Language Generation
Ondrej Dusek
David M. Howcroft
Verena Rieser
60
118
0
10 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
225
19,824
0
23 Oct 2019
Quantifying the Carbon Emissions of Machine Learning
Alexandre Lacoste
A. Luccioni
Victor Schmidt
Thomas Dandres
64
688
0
21 Oct 2019
Q8BERT: Quantized 8Bit BERT
Ofir Zafrir
Guy Boudoukh
Peter Izsak
Moshe Wasserblat
MQ
49
502
0
14 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
94
7,386
0
02 Oct 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
83
586
0
25 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
40
1,838
0
23 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
281
1,861
0
17 Sep 2019
Entity, Relation, and Event Extraction with Contextualized Span Representations
David Wadden
Ulme Wennberg
Yi Luan
Hannaneh Hajishirzi
49
580
0
08 Sep 2019
"Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding
Ben Zhou
Daniel Khashabi
Qiang Ning
Dan Roth
AIMat
48
192
0
06 Sep 2019
WinoGrande: An Adversarial Winograd Schema Challenge at Scale
Keisuke Sakaguchi
Ronan Le Bras
Chandra Bhagavatula
Yejin Choi
45
211
0
24 Jul 2019
GLTR: Statistical Detection and Visualization of Generated Text
Sebastian Gehrmann
Hendrik Strobelt
Alexander M. Rush
DeLMO
90
529
0
10 Jun 2019
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell
Ananya Ganesh
Andrew McCallum
35
2,633
0
05 Jun 2019
Are Sixteen Heads Really Better than One?
Paul Michel
Omer Levy
Graham Neubig
MoE
45
1,049
0
25 May 2019
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu Wang
Jianfeng Gao
M. Zhou
H. Hon
ELM
AI4CE
132
1,553
0
08 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Jinpeng Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
155
2,287
0
02 May 2019
Parameter-Efficient Transfer Learning for NLP
N. Houlsby
A. Giurgiu
Stanislaw Jastrzebski
Bruna Morrone
Quentin de Laroussilhe
Andrea Gesmundo
Mona Attariyan
Sylvain Gelly
156
4,368
0
02 Feb 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
114
3,707
0
09 Jan 2019
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Alon Talmor
Jonathan Herzig
Nicholas Lourie
Jonathan Berant
RALM
112
1,677
0
02 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
819
93,936
0
11 Oct 2018
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov
Peter Clark
Tushar Khot
Ashish Sabharwal
64
1,475
0
08 Sep 2018
Gender Bias in Coreference Resolution
Rachel Rudinger
Jason Naradowsky
Brian Leonard
Benjamin Van Durme
32
636
0
25 Apr 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
59
2,474
0
14 Mar 2018
Generating Wikipedia by Summarizing Long Sequences
Peter J. Liu
Mohammad Saleh
Etienne Pot
Ben Goodrich
Ryan Sepassi
Lukasz Kaiser
Noam M. Shazeer
CVBM
112
786
0
30 Jan 2018
Mixed Precision Training
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
...
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
116
1,779
0
10 Oct 2017
Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly
Yongqin Xian
Christoph H. Lampert
Bernt Schiele
Zeynep Akata
VLM
116
1,554
0
03 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
331
129,831
0
12 Jun 2017
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
148
2,576
0
09 May 2017
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
209
10,412
0
21 Jul 2016
Gaussian Error Linear Units (GELUs)
Dan Hendrycks
Kevin Gimpel
144
4,934
0
27 Jun 2016
The LAMBADA dataset: Word prediction requiring a broad discourse context
Denis Paperno
Germán Kruszewski
Angeliki Lazaridou
Q. N. Pham
Raffaella Bernardi
Sandro Pezzelle
Marco Baroni
Gemma Boleda
Raquel Fernández
62
687
0
20 Jun 2016
Previous
1
2
3