Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2101.00027
Cited By
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
31 December 2020
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
Charles Foster
Jason Phang
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Pile: An 800GB Dataset of Diverse Text for Language Modeling"
50 / 402 papers shown
Title
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Ziwei He
Meng-Da Yang
Minwei Feng
Jingcheng Yin
Xinbing Wang
Jingwen Leng
Zhouhan Lin
ViT
35
11
0
24 May 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
55
130
0
23 May 2023
DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text
Jinyan Su
Terry Yue Zhuo
Di Wang
Preslav Nakov
DeLMO
47
121
0
23 May 2023
CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models
Aitor Ormazabal
Mikel Artetxe
Eneko Agirre
33
19
0
23 May 2023
InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT
Yichong Xu
Ruochen Xu
Dan Iter
Yang Liu
Shuohang Wang
Chenguang Zhu
Michael Zeng
19
10
0
22 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
90
557
0
22 May 2023
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ariel Ekgren
Amaru Cuba Gyllensten
Felix Stollenwerk
Joey Öhman
T. Isbister
Evangelia Gogoulou
F. Carlsson
Alice Heiman
Judit Casademont
Magnus Sahlgren
27
13
0
22 May 2023
Prompting with Pseudo-Code Instructions
Mayank Mishra
Prince Kumar
Riyaz Ahmad Bhat
V. Rudramurthy
Danish Contractor
Srikanth G. Tamilselvam
45
13
0
19 May 2023
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs
IokTong Lei
Zhidong Deng
ReLM
RALM
LRM
27
4
0
19 May 2023
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts
Qiuhui Chen
Xinyue Hu
Zirui Wang
Yi Hong
LM&MA
MedIm
19
34
0
18 May 2023
Language Models Meet World Models: Embodied Experiences Enhance Language Models
Jiannan Xiang
Tianhua Tao
Yi Gu
Tianmin Shu
Zirui Wang
Zichao Yang
Zhiting Hu
ALM
LLMAG
LM&Ro
CLL
36
94
0
18 May 2023
Statistical Knowledge Assessment for Large Language Models
Qingxiu Dong
Jingjing Xu
Lingpeng Kong
Zhifang Sui
Lei Li
HILM
41
6
0
17 May 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMe
MoE
53
177
0
17 May 2023
"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation
Anaelia Ovalle
Palash Goyal
Jwala Dhamala
Zachary Jaggers
Kai-Wei Chang
Aram Galstyan
R. Zemel
Rahul Gupta
25
61
0
17 May 2023
Smaller Language Models are Better Black-box Machine-Generated Text Detectors
Niloofar Mireshghallah
Justus Mattern
Sicun Gao
Reza Shokri
Taylor Berg-Kirkpatrick
DeLMO
24
47
0
17 May 2023
How Good are Commercial Large Language Models on African Languages?
Jessica Ojo
Kelechi Ogueji
26
5
0
11 May 2023
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
Dũng Nguyễn Mạnh
Nam Le Hai
An Dau
A. Nguyen
Khanh N. Nghiem
Jingnan Guo
Nghi D. Q. Bui
34
15
0
09 May 2023
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Zhanpeng Zeng
Cole Hawkins
Min-Fong Hong
Aston Zhang
Nikolaos Pappas
Vikas Singh
Shuai Zheng
21
6
0
07 May 2023
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
Erik Nijkamp
A. Ghobadzadeh
Caiming Xiong
Silvio Savarese
Yingbo Zhou
152
164
0
03 May 2023
Multimodal Grounding for Embodied AI via Augmented Reality Headsets for Natural Language Driven Task Planning
Selma Wanna
Fabian Parra
R. Valner
Karl Kruusamäe
Mitch Pryor
LM&Ro
26
2
0
26 Apr 2023
N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models
Alex Foote
Neel Nanda
Esben Kran
Ionnis Konstas
Fazl Barez
MILM
28
2
0
22 Apr 2023
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raf
29
116
0
21 Apr 2023
Scaling Transformer to 1M tokens and beyond with RMT
Aydar Bulatov
Yuri Kuratov
Yermek Kapushev
Andrey Kravchenko
LRM
25
87
0
19 Apr 2023
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes
Simran Arora
Brandon Yang
Sabri Eyuboglu
A. Narayan
Andrew Hojel
Immanuel Trummer
Christopher Ré
SyDa
47
69
0
19 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
24
40
0
17 Apr 2023
From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning
Qian Liu
Fan Zhou
Zhengbao Jiang
Longxu Dou
Min-Bin Lin
18
17
0
17 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
18
404
0
13 Apr 2023
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
Emilio Ferrara
SILM
33
247
0
07 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
30
41
0
07 Apr 2023
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X
Qinkai Zheng
Xiao Xia
Xu Zou
Yuxiao Dong
Shanshan Wang
...
Andi Wang
Yang Li
Teng Su
Zhilin Yang
Jie Tang
ELM
ALM
SyDa
57
317
0
30 Mar 2023
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
76
786
0
30 Mar 2023
Koala: An Index for Quantifying Overlaps with Pre-training Corpora
Thuy-Trang Vu
Xuanli He
Gholamreza Haffari
Ehsan Shareghi
CLL
24
13
0
26 Mar 2023
Scaling Expert Language Models with Unsupervised Domain Discovery
Suchin Gururangan
Margaret Li
M. Lewis
Weijia Shi
Tim Althoff
Noah A. Smith
Luke Zettlemoyer
MoE
25
46
0
24 Mar 2023
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa
Shreyas Saxena
Abhay Gupta
Sean Lie
31
3
0
21 Mar 2023
Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose
Zach Furman
Logan Smith
Danny Halawi
Igor V. Ostrovsky
Lev McKinney
Stella Biderman
Jacob Steinhardt
22
193
0
14 Mar 2023
Stylometric Detection of AI-Generated Text in Twitter Timelines
Tharindu Kumarage
Joshua Garland
Amrita Bhattacharjee
K. Trapeznikov
Scott W. Ruston
Huan Liu
DeLMO
25
55
0
07 Mar 2023
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face
Christopher Akiki
Odunayo Ogundepo
Aleksandra Piktus
Xinyu Crystina Zhang
Akintunde Oladipo
Jimmy J. Lin
Martin Potthast
25
5
0
28 Feb 2023
On the Generalization Ability of Retrieval-Enhanced Transformers
Tobias Norlund
Ehsan Doostmohammadi
Richard Johansson
Marco Kuhlmann
RALM
27
6
0
23 Feb 2023
Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini
Matthew Jagielski
Christopher A. Choquette-Choo
Daniel Paleka
Will Pearce
Hyrum S. Anderson
Andreas Terzis
Kurt Thomas
Florian Tramèr
SILM
31
182
0
20 Feb 2023
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu
Elliot L. Epstein
Eric N. D. Nguyen
A. Thomas
Michael Zhang
Tri Dao
Atri Rudra
Christopher Ré
16
52
0
13 Feb 2023
Combat AI With AI: Counteract Machine-Generated Fake Restaurant Reviews on Social Media
Alessandro Gambetti
Qiwei Han
DeLMO
21
9
0
10 Feb 2023
Efficient Attention via Control Variates
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
34
18
0
09 Feb 2023
The Gradient of Generative AI Release: Methods and Considerations
Irene Solaiman
33
98
0
05 Feb 2023
GLADIS: A General and Large Acronym Disambiguation Benchmark
Lihu Chen
Gaël Varoquaux
Fabian M. Suchanek
ELM
29
3
0
03 Feb 2023
WebUI: A Dataset for Enhancing Visual UI Understanding with Web Semantics
Jason Wu
Siyan Wang
Siman Shen
Yi-Hao Peng
Jeffrey Nichols
Jeffrey P. Bigham
21
68
0
30 Jan 2023
REPLUG: Retrieval-Augmented Black-Box Language Models
Weijia Shi
Sewon Min
Michihiro Yasunaga
Minjoon Seo
Rich James
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
VLM
KELM
62
580
0
30 Jan 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
TrojanPuzzle: Covertly Poisoning Code-Suggestion Models
H. Aghakhani
Wei Dai
Andre Manoel
Xavier Fernandes
Anant Kharkar
Christopher Kruegel
Giovanni Vigna
David E. Evans
B. Zorn
Robert Sim
SILM
23
33
0
06 Jan 2023
Corrupted by Algorithms? How AI-generated and Human-written Advice Shape (Dis)honesty
Margarita Leib
N. Köbis
Rainer Michael Rilke
Marloes H. J. Hagens
Bernd Irlenbusch
13
25
0
05 Jan 2023
Error syntax aware augmentation of feedback comment generation dataset
N. Babakov
M. Lysyuk
Alexander Shvets
Lilya Kazakova
Alexander Panchenko
19
2
0
29 Dec 2022
Previous
1
2
3
4
5
6
7
8
9
Next