ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1907.11692
  4. Cited By
RoBERTa: A Robustly Optimized BERT Pretraining Approach

RoBERTa: A Robustly Optimized BERT Pretraining Approach

26 July 2019
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
    AIMat
ArXivPDFHTML

Papers citing "RoBERTa: A Robustly Optimized BERT Pretraining Approach"

50 / 9,183 papers shown
Title
Improving BERT Fine-tuning with Embedding Normalization
Wenxuan Zhou
Junyi Du
Xiang Ren
21
6
0
10 Nov 2019
Effectiveness of self-supervised pre-training for speech recognition
Effectiveness of self-supervised pre-training for speech recognition
Alexei Baevski
Michael Auli
Abdel-rahman Mohamed
SSL
32
147
0
10 Nov 2019
CamemBERT: a Tasty French Language Model
CamemBERT: a Tasty French Language Model
Louis Martin
Benjamin Muller
Pedro Ortiz Suarez
Yoann Dupont
Laurent Romary
Eric Villemonte de la Clergerie
Djamé Seddah
Benoît Sagot
56
960
0
10 Nov 2019
ConveRT: Efficient and Accurate Conversational Representations from
  Transformers
ConveRT: Efficient and Accurate Conversational Representations from Transformers
Matthew Henderson
I. Casanueva
Nikola Mrkvsić
Pei-hao Su
Tsung-Hsien
Ivan Vulić
47
196
0
09 Nov 2019
E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT
E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT
Nina Poerner
Ulli Waltinger
Hinrich Schütze
47
157
0
09 Nov 2019
Improving Machine Reading Comprehension via Adversarial Training
Improving Machine Reading Comprehension via Adversarial Training
Ziqing Yang
Yiming Cui
Wanxiang Che
Ting Liu
Shijin Wang
Guoping Hu
32
17
0
09 Nov 2019
How Decoding Strategies Affect the Verifiability of Generated Text
How Decoding Strategies Affect the Verifiability of Generated Text
Luca Massarelli
Fabio Petroni
Aleksandra Piktus
Myle Ott
Tim Rocktaschel
Vassilis Plachouras
Fabrizio Silvestri
Sebastian Riedel
33
50
0
09 Nov 2019
Negated and Misprimed Probes for Pretrained Language Models: Birds Can
  Talk, But Cannot Fly
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly
Nora Kassner
Hinrich Schütze
28
316
0
08 Nov 2019
What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
Jaejun Lee
Raphael Tang
Jimmy J. Lin
34
121
0
08 Nov 2019
Certified Data Removal from Machine Learning Models
Certified Data Removal from Machine Learning Models
Chuan Guo
Tom Goldstein
Awni Y. Hannun
Laurens van der Maaten
MU
57
423
0
08 Nov 2019
The TechQA Dataset
The TechQA Dataset
Vittorio Castelli
Rishav Chakravarti
Saswati Dana
Anthony Ferritto
Radu Florian
...
Andrzej Sakrajda
Avirup Sil
Rosario A. Uceda-Sosa
T. Ward
Rong Zhang
31
45
0
08 Nov 2019
Blockwise Self-Attention for Long Document Understanding
Blockwise Self-Attention for Long Document Understanding
J. Qiu
Hao Ma
Omer Levy
Scott Yih
Sinong Wang
Jie Tang
21
252
0
07 Nov 2019
S2ORC: The Semantic Scholar Open Research Corpus
S2ORC: The Semantic Scholar Open Research Corpus
Kyle Lo
Lucy Lu Wang
Mark Neumann
Rodney Michael Kinney
Daniel S. Weld
OffRL
AI4CE
51
10
0
07 Nov 2019
Infusing Knowledge into the Textual Entailment Task Using Graph
  Convolutional Networks
Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks
Pavan Kapanipathi
Veronika Thost
S. Patel
Spencer Whitehead
Ibrahim Abdelaziz
...
R. Chulaka Gunasekara
B. Makni
Nicholas Mattei
Kartik Talamadupula
Achille Fokoue
47
45
0
05 Nov 2019
When Choosing Plausible Alternatives, Clever Hans can be Clever
When Choosing Plausible Alternatives, Clever Hans can be Clever
Pride Kavumba
Naoya Inoue
Benjamin Heinzerling
Keshav Singh
Paul Reisert
Kentaro Inui
24
51
0
01 Nov 2019
Generalization through Memorization: Nearest Neighbor Language Models
Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal
Omer Levy
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
91
820
0
01 Nov 2019
Adversarial NLI: A New Benchmark for Natural Language Understanding
Adversarial NLI: A New Benchmark for Natural Language Understanding
Yixin Nie
Adina Williams
Emily Dinan
Joey Tianyi Zhou
Jason Weston
Douwe Kiela
65
982
0
31 Oct 2019
Transfer Learning from Transformers to Fake News Challenge Stance
  Detection (FNC-1) Task
Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task
Valeriya Slovikovskaya
24
41
0
31 Oct 2019
A neural document language modeling framework for spoken document
  retrieval
A neural document language modeling framework for spoken document retrieval
Li-Phen Yen
Zheng-Yu Wu
Kuan-Yu Chen
3DGS
27
0
0
31 Oct 2019
Towards Generalizable Neuro-Symbolic Systems for Commonsense Question
  Answering
Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering
Kaixin Ma
Jonathan M Francis
Quanyang Lu
Eric Nyberg
A. Oltramari
NAI
26
89
0
30 Oct 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language
  Generation, Translation, and Comprehension
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
41
10,649
0
29 Oct 2019
SpeechBERT: An Audio-and-text Jointly Learned Language Model for
  End-to-end Spoken Question Answering
SpeechBERT: An Audio-and-text Jointly Learned Language Model for End-to-end Spoken Question Answering
Yung-Sung Chuang
Chi-Liang Liu
Hung-yi Lee
Lin-shan Lee
AuLLM
35
39
0
25 Oct 2019
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
M. Moradshahi
Hamid Palangi
M. Lam
P. Smolensky
Jianfeng Gao
37
16
0
25 Oct 2019
Mockingjay: Unsupervised Speech Representation Learning with Deep
  Bidirectional Transformer Encoders
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
Andy T. Liu
Shu-Wen Yang
Po-Han Chi
Po-Chun Hsu
Hung-yi Lee
SSL
65
372
0
25 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
144
19,628
0
23 Oct 2019
Generative Pre-Training for Speech with Autoregressive Predictive Coding
Generative Pre-Training for Speech with Autoregressive Predictive Coding
Yu-An Chung
James R. Glass
SSL
38
173
0
23 Oct 2019
Improving Transformer-based Speech Recognition Using Unsupervised
  Pre-training
Improving Transformer-based Speech Recognition Using Unsupervised Pre-training
Dongwei Jiang
Xiaoning Lei
Wubo Li
Ne Luo
Yuxuan Hu
Wei Zou
Xiangang Li
29
99
0
22 Oct 2019
Findings of the NLP4IF-2019 Shared Task on Fine-Grained Propaganda
  Detection
Findings of the NLP4IF-2019 Shared Task on Fine-Grained Propaganda Detection
Giovanni Da San Martino
Alberto Barrón-Cedeño
Preslav Nakov
27
80
0
20 Oct 2019
Keyphrase Extraction from Scholarly Articles as Sequence Labeling using
  Contextualized Embeddings
Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings
Dhruva Sahrawat
Debanjan Mahata
Mayank Kulkarni
Haimin Zhang
Rakesh Gosangi
Amanda Stent
Agniv Sharma
Yaman Kumar Singla
R. Shah
Roger Zimmermann
17
30
0
19 Oct 2019
A Mutual Information Maximization Perspective of Language Representation
  Learning
A Mutual Information Maximization Perspective of Language Representation Learning
Lingpeng Kong
Cyprien de Masson dÁutume
Wang Ling
Lei Yu
Zihang Dai
Dani Yogatama
SSL
226
166
0
18 Oct 2019
Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue
  Response Models
Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models
Tianxing He
Jun Liu
Kyunghyun Cho
Myle Ott
Bing-Quan Liu
James R. Glass
Fuchun Peng
CLL
45
9
0
16 Oct 2019
Facebook AI's WAT19 Myanmar-English Translation Task Submission
Facebook AI's WAT19 Myanmar-English Translation Task Submission
Peng-Jen Chen
Jiajun Shen
Matt Le
Vishrav Chaudhary
Ahmed El-Kishky
Guillaume Wenzek
Myle Ott
MarcÁurelio Ranzato
25
29
0
15 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski
Steffen Schneider
Michael Auli
SSL
30
661
0
12 Oct 2019
On Empirical Comparisons of Optimizers for Deep Learning
On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
46
256
0
11 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
On the adequacy of untuned warmup for adaptive optimization
Jerry Ma
Denis Yarats
59
70
0
09 Oct 2019
PipeMare: Asynchronous Pipeline Parallel DNN Training
PipeMare: Asynchronous Pipeline Parallel DNN Training
Bowen Yang
Jian Zhang
Jonathan Li
Christopher Ré
Christopher R. Aberger
Christopher De Sa
33
110
0
09 Oct 2019
Knowledge Distillation from Internal Representations
Knowledge Distillation from Internal Representations
Gustavo Aguilar
Yuan Ling
Yu Zhang
Benjamin Yao
Xing Fan
Edward Guo
38
179
0
08 Oct 2019
BERT for Evidence Retrieval and Claim Verification
BERT for Evidence Retrieval and Claim Verification
Shrishti Saha Shetu
Christof Monz
E. Mabande
RALM
23
120
0
07 Oct 2019
Multi-hop Question Answering via Reasoning Chains
Multi-hop Question Answering via Reasoning Chains
Jifan Chen
Shih-Ting Lin
Greg Durrett
ReLM
LRM
24
74
0
07 Oct 2019
SlowMo: Improving Communication-Efficient Distributed SGD with Slow
  Momentum
SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Jianyu Wang
Vinayak Tantia
Nicolas Ballas
Michael G. Rabbat
30
200
0
01 Oct 2019
MMM: Multi-stage Multi-task Learning for Multi-choice Reading
  Comprehension
MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
Di Jin
Shuyang Gao
Jiun-Yu Kao
Tagyoung Chung
Dilek Z. Hakkani-Tür
37
69
0
01 Oct 2019
A Simple and Effective Model for Answering Multi-span Questions
A Simple and Effective Model for Answering Multi-span Questions
Elad Segal
Avia Efrat
Mor Shoham
Amir Globerson
Jonathan Berant
KELM
33
30
0
29 Sep 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language
  Representations
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
162
6,397
0
26 Sep 2019
Mixed Dimension Embeddings with Application to Memory-Efficient
  Recommendation Systems
Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems
Antonio A. Ginart
Maxim Naumov
Dheevatsa Mudigere
Jiyan Yang
James Zou
48
100
0
25 Sep 2019
Reducing Transformer Depth on Demand with Structured Dropout
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
66
585
0
25 Sep 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained
  Language Models
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee
Kyunghyun Cho
Wanmo Kang
MoE
249
208
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
270
929
0
24 Sep 2019
Portuguese Named Entity Recognition using BERT-CRF
Portuguese Named Entity Recognition using BERT-CRF
Fábio Souza
Rodrigo Nogueira
R. Lotufo
22
252
0
23 Sep 2019
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with
  Contextualized Embeddings
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Gregor Wiedemann
Steffen Remus
Avi Chawla
Chris Biemann
43
175
0
23 Sep 2019
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
Eric Wallace
Jens Tuyls
Junlin Wang
Sanjay Subramanian
Matt Gardner
Sameer Singh
MILM
28
137
0
19 Sep 2019
Previous
123...182183184
Next