ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

RoBERTa: A Robustly Optimized BERT Pretraining Approach
arXiv: 1907.11692

26 July 2019
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
    AIMat

Papers citing "RoBERTa: A Robustly Optimized BERT Pretraining Approach"

50 / 9,183 papers shown
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Zhangyin Feng
Daya Guo
Duyu Tang
Nan Duan
Xiaocheng Feng
...
Linjun Shou
Bing Qin
Ting Liu
Daxin Jiang
Ming Zhou
70
2,557
0
19 Feb 2020
LAMBERT: Layout-Aware (Language) Modeling for information extraction
Lukasz Garncarek
Rafal Powalski
Tomasz Stanislawek
Bartosz Topolski
Piotr Halama
M. Turski
Filip Graliński
15
87
0
19 Feb 2020
From English To Foreign Languages: Transferring Pre-trained Language Models
Ke M. Tran
30
49
0
18 Feb 2020
A Financial Service Chatbot based on Deep Bidirectional Transformers
S. Yu
Yuxin Chen
Hussain Zaidi
35
33
0
17 Feb 2020
Robustness Verification for Transformers
Zhouxing Shi
Huan Zhang
Kai-Wei Chang
Minlie Huang
Cho-Jui Hsieh
AAML
27
106
0
16 Feb 2020
Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Carlos Aspillaga
Andrés Carvallo
Vladimir Araujo
ELM
47
31
0
14 Feb 2020
FQuAD: French Question Answering Dataset
Martin d'Hoffschmidt
Wacim Belblidia
Tom Brendlé
Quentin Heinrich
Maxime Vidal
31
98
0
14 Feb 2020
LaProp: Separating Momentum and Adaptivity in Adam
Liu Ziyin
Zhikang T. Wang
Masahito Ueda
ODL
18
18
0
12 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
61
954
0
12 Feb 2020
Feature Importance Estimation with Self-Attention Networks
Blaž Škrlj
Jannis Brugger
Nada Lavrac
Matej Petković
FAtt
MILM
34
51
0
11 Feb 2020
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Weihao Yu
Zihang Jiang
Yanfei Dong
Jiashi Feng
LRM
25
246
0
11 Feb 2020
Adversarial Filters of Dataset Biases
Ronan Le Bras
Swabha Swayamdipta
Chandra Bhagavatula
Rowan Zellers
Matthew E. Peters
Ashish Sabharwal
Yejin Choi
41
220
0
10 Feb 2020
REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu
Kenton Lee
Zora Tung
Panupong Pasupat
Ming-Wei Chang
RALM
53
2,026
0
10 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin
Anton I. Gusev
FedML
32
48
0
10 Feb 2020
Pre-training Tasks for Embedding-based Large-scale Retrieval
Wei-Cheng Chang
Felix X. Yu
Yin-Wen Chang
Yiming Yang
Sanjiv Kumar
RALM
18
302
0
10 Feb 2020
Segmented Graph-Bert for Graph Instance Modeling
Jiawei Zhang
SSeg
33
6
0
09 Feb 2020
MA-DST: Multi-Attention Based Scalable Dialog State Tracking
Adarsh Kumar
Peter Ku
Anuj Kumar Goyal
A. Metallinou
Dilek Z. Hakkani-Tür
32
58
0
07 Feb 2020
perm2vec: Graph Permutation Selection for Decoding of Error Correction Codes using Self-Attention
Nir Raviv
Avi Caciularu
Tomer Raviv
Jacob Goldberger
Yair Be’ery
26
8
0
06 Feb 2020
K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
Ruize Wang
Duyu Tang
Nan Duan
Zhongyu Wei
Xuanjing Huang
Jianshu Ji
Guihong Cao
Daxin Jiang
Ming Zhou
KELM
53
545
0
05 Feb 2020
Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension
Max Bartolo
A. Roberts
Johannes Welbl
Sebastian Riedel
Pontus Stenetorp
AAML
42
168
0
02 Feb 2020
Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
Taeuk Kim
Jihun Choi
Daniel Edmiston
Sang-goo Lee
27
90
0
30 Jan 2020
Retrospective Reader for Machine Reading Comprehension
Zhuosheng Zhang
Junjie Yang
Hai Zhao
RALM
30
226
0
27 Jan 2020
DUMA: Reading Comprehension with Transposition Thinking
Pengfei Zhu
Hai Zhao
Xiaoguang Li
AI4CE
39
35
0
26 Jan 2020
Generating Representative Headlines for News Stories
Xiaotao Gu
Yuning Mao
Jiawei Han
Jialu Liu
Hongkun Yu
You Wu
Cong Yu
Daniel Finnie
Jiaqi Zhai
Nicholas Zukoski
30
70
0
26 Jan 2020
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
144
277
0
24 Jan 2020
Multilingual Denoising Pre-training for Neural Machine Translation
Yinhan Liu
Jiatao Gu
Naman Goyal
Xian Li
Sergey Edunov
Marjan Ghazvininejad
M. Lewis
Luke Zettlemoyer
AI4CE
AIMat
81
1,780
0
22 Jan 2020
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Darryl Hannan
Akshay Jain
Joey Tianyi Zhou
AAML
38
57
0
22 Jan 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
45
259
0
22 Jan 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
261
1,600
0
21 Jan 2020
RobBERT: a Dutch RoBERTa-based Language Model
Pieter Delobelle
Thomas Winters
Bettina Berendt
18
235
0
17 Jan 2020
CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese
Liang Xu
Yu Tong
Qianqian Dong
Yixuan Liao
Cong Yu
Yin Tian
Weitang Liu
Lu Li
Caiquan Liu
Xuanwei Zhang
37
48
0
13 Jan 2020
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Weizhen Qi
Yu Yan
Yeyun Gong
Dayiheng Liu
Nan Duan
Jiusheng Chen
Ruofei Zhang
Ming Zhou
AI4TS
32
448
0
13 Jan 2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
71
690
0
31 Dec 2019
oLMpics -- On what Language Model Pre-training Captures
Alon Talmor
Yanai Elazar
Yoav Goldberg
Jonathan Berant
LRM
39
301
0
31 Dec 2019
Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
28
339
0
20 Dec 2019
Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model
Wenhan Xiong
Jingfei Du
William Yang Wang
Veselin Stoyanov
SSL
KELM
52
201
0
20 Dec 2019
BERTje: A Dutch BERT Model
Wietse de Vries
Andreas van Cranenburgh
Arianna Bisazza
Tommaso Caselli
Gertjan van Noord
Malvina Nissim
VLM
SSeg
34
291
0
19 Dec 2019
Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Karthikeyan K
Zihan Wang
Stephen D. Mayhew
Dan Roth
LRM
41
333
0
17 Dec 2019
Multilingual is not enough: BERT for Finnish
Antti Virtanen
Jenna Kanerva
Rami Ilo
Jouni Luoma
Juhani Luotolahti
T. Salakoski
Filip Ginter
S. Pyysalo
41
278
0
15 Dec 2019
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le
Loïc Vial
Jibril Frej
Vincent Segonne
Maximin Coavoux
Benjamin Lecouteux
A. Allauzen
Benoît Crabbé
Laurent Besacier
D. Schwab
AI4CE
57
395
0
11 Dec 2019
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Vishvak Murahari
Dhruv Batra
Devi Parikh
Abhishek Das
VLM
32
115
0
05 Dec 2019
Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information
Seonwoo Min
Seunghyun Park
Siwon Kim
Hyun-Soo Choi
Byunghan Lee
Sungroh Yoon
SSL
22
62
0
25 Nov 2019
A Transformer-based approach to Irony and Sarcasm detection
Rolandos Alexandros Potamias
Georgios Siolas
A. Stafylopatis
33
206
0
23 Nov 2019
Global Greedy Dependency Parsing
Z. Li
Zhao Hai
Kevin Parnow
36
31
0
20 Nov 2019
The Eighth Dialog System Technology Challenge
Seokhwan Kim
Michel Galley
Chulaka Gunasekara
Sungjin Lee
Adam Atkinson
...
Tim K. Marks
Abhinav Rastogi
Xiaoxue Zang
Srinivas Sunkara
Raghav Gupta
VLM
27
65
0
14 Nov 2019
Sato: Contextual Semantic Type Detection in Tables
Dan Zhang
Yoshihiko Suhara
Jinfeng Li
Madelon Hulsebos
Çağatay Demiralp
W. Tan
LMTD
24
15
0
14 Nov 2019
What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Timothee Mickus
Denis Paperno
Mathieu Constant
Kees van Deemter
34
45
0
13 Nov 2019
Adapting and evaluating a deep learning language model for clinical why-question answering
Andrew Wen
Mohamed Y. Elwazir
Sungrim Moon
Jungwei Fan
LM&MA
24
31
0
13 Nov 2019
Neural Duplicate Question Detection without Labeled Training Data
Andreas Rücklé
N. Moosavi
Iryna Gurevych
OOD
AAML
19
11
0
13 Nov 2019
Attending to Entities for Better Text Understanding
Pengxiang Cheng
K. Erk
LRM
24
37
0
11 Nov 2019