ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1907.11692
  4. Cited By
RoBERTa: A Robustly Optimized BERT Pretraining Approach

RoBERTa: A Robustly Optimized BERT Pretraining Approach

26 July 2019
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
    AIMat
ArXivPDFHTML

Papers citing "RoBERTa: A Robustly Optimized BERT Pretraining Approach"

40 / 4,740 papers shown
Title
Multi-hop Question Answering via Reasoning Chains
Multi-hop Question Answering via Reasoning Chains
Jifan Chen
Shih-Ting Lin
Greg Durrett
ReLM
LRM
19
74
0
07 Oct 2019
SlowMo: Improving Communication-Efficient Distributed SGD with Slow
  Momentum
SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Jianyu Wang
Vinayak Tantia
Nicolas Ballas
Michael G. Rabbat
17
200
0
01 Oct 2019
MMM: Multi-stage Multi-task Learning for Multi-choice Reading
  Comprehension
MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
Di Jin
Shuyang Gao
Jiun-Yu Kao
Tagyoung Chung
Dilek Z. Hakkani-Tür
32
69
0
01 Oct 2019
A Simple and Effective Model for Answering Multi-span Questions
A Simple and Effective Model for Answering Multi-span Questions
Elad Segal
Avia Efrat
Mor Shoham
Amir Globerson
Jonathan Berant
KELM
25
30
0
29 Sep 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language
  Representations
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
115
6,380
0
26 Sep 2019
Mixed Dimension Embeddings with Application to Memory-Efficient
  Recommendation Systems
Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems
Antonio A. Ginart
Maxim Naumov
Dheevatsa Mudigere
Jiyan Yang
James Zou
22
99
0
25 Sep 2019
Reducing Transformer Depth on Demand with Structured Dropout
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
43
585
0
25 Sep 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained
  Language Models
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee
Kyunghyun Cho
Wanmo Kang
MoE
249
208
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
255
928
0
24 Sep 2019
Portuguese Named Entity Recognition using BERT-CRF
Portuguese Named Entity Recognition using BERT-CRF
Fábio Souza
Rodrigo Nogueira
R. Lotufo
22
251
0
23 Sep 2019
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with
  Contextualized Embeddings
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Gregor Wiedemann
Steffen Remus
Avi Chawla
Chris Biemann
27
174
0
23 Sep 2019
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
Eric Wallace
Jens Tuyls
Junlin Wang
Sanjay Subramanian
Matt Gardner
Sameer Singh
MILM
28
137
0
19 Sep 2019
How Additional Knowledge can Improve Natural Language Commonsense
  Question Answering?
How Additional Knowledge can Improve Natural Language Commonsense Question Answering?
Arindam Mitra
Pratyay Banerjee
Kuntal Kumar Pal
Swaroop Mishra
Chitta Baral
KELM
27
31
0
19 Sep 2019
Language models and Automated Essay Scoring
Language models and Automated Essay Scoring
Pedro Uría Rodríguez
Amir Jafari
C. Ormerod
30
82
0
18 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using
  Model Parallelism
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,836
0
17 Sep 2019
Span-based Joint Entity and Relation Extraction with Transformer
  Pre-training
Span-based Joint Entity and Relation Extraction with Transformer Pre-training
Markus Eberts
A. Ulges
LRM
ViT
164
381
0
17 Sep 2019
K-BERT: Enabling Language Representation with Knowledge Graph
K-BERT: Enabling Language Representation with Knowledge Graph
Weijie Liu
Peng Zhou
Zhe Zhao
Zhiruo Wang
Qi Ju
Haotang Deng
Ping Wang
231
778
0
17 Sep 2019
Frustratingly Easy Natural Question Answering
Frustratingly Easy Natural Question Answering
Lin Pan
Rishav Chakravarti
Anthony Ferritto
Michael R. Glass
A. Gliozzo
Salim Roukos
Radu Florian
Avirup Sil
24
14
0
11 Sep 2019
Span Selection Pre-training for Question Answering
Span Selection Pre-training for Question Answering
Michael R. Glass
A. Gliozzo
Rishav Chakravarti
Anthony Ferritto
Lin Pan
G P Shrivatsa Bhargav
Dinesh Garg
Avirup Sil
RALM
38
70
0
09 Sep 2019
Pretrained Language Models for Sequential Sentence Classification
Pretrained Language Models for Sequential Sentence Classification
Arman Cohan
Iz Beltagy
Daniel King
Bhavana Dalvi
Daniel S. Weld
32
128
0
09 Sep 2019
Graph-Based Reasoning over Heterogeneous External Knowledge for
  Commonsense Question Answering
Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering
Shangwen Lv
Daya Guo
Jingjing Xu
Duyu Tang
Nan Duan
Ming Gong
Linjun Shou
Daxin Jiang
Guihong Cao
Songlin Hu
RALM
20
202
0
09 Sep 2019
Reasoning Over Semantic-Level Graph for Fact Checking
Reasoning Over Semantic-Level Graph for Fact Checking
Wanjun Zhong
Jingjing Xu
Duyu Tang
Zenan Xu
Nan Duan
M. Zhou
Jiahai Wang
Jian Yin
HILM
GNN
185
166
0
09 Sep 2019
Semantics-aware BERT for Language Understanding
Semantics-aware BERT for Language Understanding
ZhuoSheng Zhang
Yuwei Wu
Zhao Hai
Z. Li
Shuailiang Zhang
Xi Zhou
Xiang Zhou
21
365
0
05 Sep 2019
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
Bill Yuchen Lin
Xinyue Chen
Jamin Chen
Xiang Ren
24
460
0
04 Sep 2019
From 'F' to Á' on the N.Y. Regents Science Exams: An Overview of the
  Aristo Project
From 'F' to Á' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
Peter Clark
Oren Etzioni
Daniel Khashabi
Tushar Khot
Bhavana Dalvi
...
Niket Tandon
Sumithra Bhakthavatsalam
Dirk Groeneveld
Michal Guerquin
Michael Schmitz
ELM
23
99
0
04 Sep 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
93
11,789
0
27 Aug 2019
Patient Knowledge Distillation for BERT Model Compression
Patient Knowledge Distillation for BERT Model Compression
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu
78
832
0
25 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
85
1,651
0
22 Aug 2019
Align, Mask and Select: A Simple Method for Incorporating Commonsense
  Knowledge into Language Representation Models
Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
Zhiquan Ye
Qian Chen
Wen Wang
Zhenhua Ling
27
68
0
19 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal
  Pre-training
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
89
895
0
16 Aug 2019
Reasoning Over Paragraph Effects in Situations
Reasoning Over Paragraph Effects in Situations
Kevin Lin
Oyvind Tafjord
Peter Clark
Matt Gardner
15
115
0
16 Aug 2019
A Multi-Turn Emotionally Engaging Dialog Model
A Multi-Turn Emotionally Engaging Dialog Model
Yubo Xie
Ekaterina Svikhnushina
P. Pu
19
15
0
15 Aug 2019
On Identifiability in Transformers
On Identifiability in Transformers
Gino Brunner
Yang Liu
Damian Pascual
Oliver Richter
Massimiliano Ciaramita
Roger Wattenhofer
ViT
30
187
0
12 Aug 2019
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
S. Rothe
Shashi Narayan
Aliaksei Severyn
SILM
73
433
0
29 Jul 2019
SpanBERT: Improving Pre-training by Representing and Predicting Spans
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Mandar Joshi
Danqi Chen
Yinhan Liu
Daniel S. Weld
Luke Zettlemoyer
Omer Levy
33
1,945
0
24 Jul 2019
BERTphone: Phonetically-Aware Encoder Representations for
  Utterance-Level Speaker and Language Recognition
BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition
Shaoshi Ling
Julian Salazar
Yuzong Liu
Katrin Kirchhoff
SSL
30
28
0
30 Jun 2019
Taming Pretrained Transformers for Extreme Multi-label Text
  Classification
Taming Pretrained Transformers for Extreme Multi-label Text Classification
Wei-Cheng Chang
Hsiang-Fu Yu
Kai Zhong
Yiming Yang
Inderjit Dhillon
27
20
0
07 May 2019
Recent Advances in Natural Language Inference: A Survey of Benchmarks,
  Resources, and Approaches
Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
Shane Storks
Qiaozi Gao
J. Chai
21
128
0
02 Apr 2019
Sentence transition matrix: An efficient approach that preserves
  sentence semantics
Sentence transition matrix: An efficient approach that preserves sentence semantics
Myeongjun Jang
Pilsung Kang
19
2
0
16 Jan 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
304
7,005
0
20 Apr 2018
Previous
123...939495