ResearchTrend.AI
RoBERTa: A Robustly Optimized BERT Pretraining Approach


26 July 2019
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
    AIMat
arXiv: 1907.11692

Papers citing "RoBERTa: A Robustly Optimized BERT Pretraining Approach"

50 / 4,659 papers shown
Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks
Pavan Kapanipathi
Veronika Thost
S. Patel
Spencer Whitehead
Ibrahim Abdelaziz
...
R. Chulaka Gunasekara
B. Makni
Nicholas Mattei
Kartik Talamadupula
Achille Fokoue
42
45
0
05 Nov 2019
When Choosing Plausible Alternatives, Clever Hans can be Clever
Pride Kavumba
Naoya Inoue
Benjamin Heinzerling
Keshav Singh
Paul Reisert
Kentaro Inui
21
51
0
01 Nov 2019
Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal
Omer Levy
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
71
817
0
01 Nov 2019
Adversarial NLI: A New Benchmark for Natural Language Understanding
Yixin Nie
Adina Williams
Emily Dinan
Joey Tianyi Zhou
Jason Weston
Douwe Kiela
51
980
0
31 Oct 2019
Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task
Valeriya Slovikovskaya
24
41
0
31 Oct 2019
A neural document language modeling framework for spoken document retrieval
Li-Phen Yen
Zheng-Yu Wu
Kuan-Yu Chen
3DGS
22
0
0
31 Oct 2019
Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering
Kaixin Ma
Jonathan M Francis
Quanyang Lu
Eric Nyberg
A. Oltramari
NAI
21
89
0
30 Oct 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
41
10,620
0
29 Oct 2019
SpeechBERT: An Audio-and-text Jointly Learned Language Model for End-to-end Spoken Question Answering
Yung-Sung Chuang
Chi-Liang Liu
Hung-yi Lee
Lin-shan Lee
AuLLM
30
39
0
25 Oct 2019
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
M. Moradshahi
Hamid Palangi
M. Lam
P. Smolensky
Jianfeng Gao
29
16
0
25 Oct 2019
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
Andy T. Liu
Shu-Wen Yang
Po-Han Chi
Po-Chun Hsu
Hung-yi Lee
SSL
45
372
0
25 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
129
19,529
0
23 Oct 2019
Generative Pre-Training for Speech with Autoregressive Predictive Coding
Yu-An Chung
James R. Glass
SSL
29
173
0
23 Oct 2019
Improving Transformer-based Speech Recognition Using Unsupervised Pre-training
Dongwei Jiang
Xiaoning Lei
Wubo Li
Ne Luo
Yuxuan Hu
Wei Zou
Xiangang Li
24
99
0
22 Oct 2019
Findings of the NLP4IF-2019 Shared Task on Fine-Grained Propaganda Detection
Giovanni Da San Martino
Alberto Barrón-Cedeño
Preslav Nakov
22
80
0
20 Oct 2019
Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings
Dhruva Sahrawat
Debanjan Mahata
Mayank Kulkarni
Haimin Zhang
Rakesh Gosangi
Amanda Stent
Agniv Sharma
Yaman Kumar Singla
R. Shah
Roger Zimmermann
14
30
0
19 Oct 2019
A Mutual Information Maximization Perspective of Language Representation Learning
Lingpeng Kong
Cyprien de Masson d'Autume
Wang Ling
Lei Yu
Zihang Dai
Dani Yogatama
SSL
226
166
0
18 Oct 2019
Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models
Tianxing He
Jun Liu
Kyunghyun Cho
Myle Ott
Bing-Quan Liu
James R. Glass
Fuchun Peng
CLL
35
9
0
16 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski
Steffen Schneider
Michael Auli
SSL
22
660
0
12 Oct 2019
On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
18
256
0
11 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
Jerry Ma
Denis Yarats
59
70
0
09 Oct 2019
PipeMare: Asynchronous Pipeline Parallel DNN Training
Bowen Yang
Jian Zhang
Jonathan Li
Christopher Ré
Christopher R. Aberger
Christopher De Sa
11
110
0
09 Oct 2019
Knowledge Distillation from Internal Representations
Gustavo Aguilar
Yuan Ling
Yu Zhang
Benjamin Yao
Xing Fan
Edward Guo
33
178
0
08 Oct 2019
BERT for Evidence Retrieval and Claim Verification
Shrishti Saha Shetu
Christof Monz
E. Mabande
RALM
23
120
0
07 Oct 2019
Multi-hop Question Answering via Reasoning Chains
Jifan Chen
Shih-Ting Lin
Greg Durrett
ReLM
LRM
19
74
0
07 Oct 2019
SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Jianyu Wang
Vinayak Tantia
Nicolas Ballas
Michael G. Rabbat
12
200
0
01 Oct 2019
MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
Di Jin
Shuyang Gao
Jiun-Yu Kao
Tagyoung Chung
Dilek Z. Hakkani-Tür
29
69
0
01 Oct 2019
A Simple and Effective Model for Answering Multi-span Questions
Elad Segal
Avia Efrat
Mor Shoham
Amir Globerson
Jonathan Berant
KELM
25
30
0
29 Sep 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
112
6,380
0
26 Sep 2019
Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems
Antonio A. Ginart
Maxim Naumov
Dheevatsa Mudigere
Jiyan Yang
James Zou
22
99
0
25 Sep 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
43
584
0
25 Sep 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee
Kyunghyun Cho
Wanmo Kang
MoE
249
208
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Portuguese Named Entity Recognition using BERT-CRF
Fábio Souza
Rodrigo Nogueira
R. Lotufo
22
251
0
23 Sep 2019
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Gregor Wiedemann
Steffen Remus
Avi Chawla
Chris Biemann
27
174
0
23 Sep 2019
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
Eric Wallace
Jens Tuyls
Junlin Wang
Sanjay Subramanian
Matt Gardner
Sameer Singh
MILM
28
137
0
19 Sep 2019
How Additional Knowledge can Improve Natural Language Commonsense Question Answering?
Arindam Mitra
Pratyay Banerjee
Kuntal Kumar Pal
Swaroop Mishra
Chitta Baral
KELM
24
31
0
19 Sep 2019
Language models and Automated Essay Scoring
Pedro Uría Rodríguez
Amir Jafari
C. Ormerod
30
82
0
18 Sep 2019
Span-based Joint Entity and Relation Extraction with Transformer Pre-training
Markus Eberts
A. Ulges
LRM
ViT
164
381
0
17 Sep 2019
K-BERT: Enabling Language Representation with Knowledge Graph
Weijie Liu
Peng Zhou
Zhe Zhao
Zhiruo Wang
Qi Ju
Haotang Deng
Ping Wang
231
777
0
17 Sep 2019
Frustratingly Easy Natural Question Answering
Lin Pan
Rishav Chakravarti
Anthony Ferritto
Michael R. Glass
A. Gliozzo
Salim Roukos
Radu Florian
Avirup Sil
24
14
0
11 Sep 2019
Span Selection Pre-training for Question Answering
Michael R. Glass
A. Gliozzo
Rishav Chakravarti
Anthony Ferritto
Lin Pan
G P Shrivatsa Bhargav
Dinesh Garg
Avirup Sil
RALM
38
70
0
09 Sep 2019
Pretrained Language Models for Sequential Sentence Classification
Arman Cohan
Iz Beltagy
Daniel King
Bhavana Dalvi
Daniel S. Weld
29
128
0
09 Sep 2019
Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering
Shangwen Lv
Daya Guo
Jingjing Xu
Duyu Tang
Nan Duan
Ming Gong
Linjun Shou
Daxin Jiang
Guihong Cao
Songlin Hu
RALM
15
202
0
09 Sep 2019
Reasoning Over Semantic-Level Graph for Fact Checking
Wanjun Zhong
Jingjing Xu
Duyu Tang
Zenan Xu
Nan Duan
M. Zhou
Jiahai Wang
Jian Yin
HILM
GNN
185
166
0
09 Sep 2019
Semantics-aware BERT for Language Understanding
ZhuoSheng Zhang
Yuwei Wu
Zhao Hai
Z. Li
Shuailiang Zhang
Xi Zhou
Xiang Zhou
21
365
0
05 Sep 2019
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
Bill Yuchen Lin
Xinyue Chen
Jamin Chen
Xiang Ren
24
460
0
04 Sep 2019
From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
Peter Clark
Oren Etzioni
Daniel Khashabi
Tushar Khot
Bhavana Dalvi
...
Niket Tandon
Sumithra Bhakthavatsalam
Dirk Groeneveld
Michal Guerquin
Michael Schmitz
ELM
23
99
0
04 Sep 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
87
11,768
0
27 Aug 2019
Patient Knowledge Distillation for BERT Model Compression
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu
78
831
0
25 Aug 2019