arXiv:2308.12219
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
23 August 2023
Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu
DiffM
Papers cited by "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"
17 / 117 papers shown
GLU Variants Improve Transformer
Noam M. Shazeer
126
996
0
12 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
602
4,801
0
23 Jan 2020
Understanding Knowledge Distillation in Non-autoregressive Machine Translation
Chunting Zhou, Graham Neubig, Jiatao Gu
54
221
0
07 Nov 2019
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
212
6,555
0
05 Nov 2019
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave
81
654
0
01 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
AIMat
419
20,127
0
23 Oct 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
AIMat
633
24,431
0
26 Jul 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM, SSL, SSeg
1.7K
94,770
0
11 Oct 2018
PipeDream: Fast and Efficient Pipeline Parallel DNN Training
A. Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, G. Ganger, Phillip B. Gibbons
AI4CE
54
254
0
08 Jun 2018
A Call for Clarity in Reporting BLEU Scores
Matt Post
150
2,985
0
23 Apr 2018
Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
NAI
206
11,549
0
15 Feb 2018
Non-Autoregressive Neural Machine Translation
Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, R. Socher
97
795
0
07 Nov 2017
Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV
692
131,526
0
12 Jun 2017
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Jascha Narain Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
SyDa, DiffM
298
6,931
0
12 Mar 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
ODL
1.8K
150,039
0
22 Dec 2014
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le
AIMat
434
20,553
0
10 Sep 2014
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, G. Corrado, J. Dean
3DV
671
31,489
0
16 Jan 2013