RoBERTa: A Robustly Optimized BERT Pretraining Approach

26 July 2019
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
AIMat
arXiv:1907.11692 (abs · PDF · HTML)

Papers citing "RoBERTa: A Robustly Optimized BERT Pretraining Approach"

Showing 50 of 10,702 papers.
TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu · VLM · 28 Feb 2020

Few-shot Natural Language Generation for Task-Oriented Dialog
Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng, Jianfeng Gao · 27 Feb 2020

A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky · OffRL · 27 Feb 2020

Learning Representations by Predicting Bags of Visual Words
Spyros Gidaris, Andrei Bursuc, N. Komodakis, P. Pérez, Matthieu Cord · SSL · 27 Feb 2020

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett · AI4CE · 27 Feb 2020

Disentangling Adaptive Gradient Methods from Learning Rates
Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang · 26 Feb 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez · 26 Feb 2020

Multi-task Learning with Multi-head Attention for Multi-choice Reading Comprehension
H. Wan · 26 Feb 2020

On Feature Normalization and Data Augmentation
Boyi Li, Felix Wu, Ser-Nam Lim, Serge J. Belongie, Kilian Q. Weinberger · 25 Feb 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou · VLM · 25 Feb 2020

Low-Resource Knowledge-Grounded Dialogue Generation
Xueliang Zhao, Wei Wu, Chongyang Tao, Can Xu, Dongyan Zhao, Rui Yan · 24 Feb 2020

Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Yige Xu, Xipeng Qiu, L. Zhou, Xuanjing Huang · 24 Feb 2020

Training Question Answering Models From Synthetic Data
Raul Puri, Ryan Spring, M. Patwary, Mohammad Shoeybi, Bryan Catanzaro · ELM · 22 Feb 2020

Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD
Jianyu Wang, Hao Liang, Gauri Joshi · 21 Feb 2020

VQA-LOL: Visual Question Answering under the Lens of Logic
Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang · CoGe · 19 Feb 2020

Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Mitchell A. Gordon, Kevin Duh, Nicholas Andrews · VLM · 19 Feb 2020

CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, ..., Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou · 19 Feb 2020

LAMBERT: Layout-Aware (Language) Modeling for information extraction
Lukasz Garncarek, Rafal Powalski, Tomasz Stanislawek, Bartosz Topolski, Piotr Halama, M. Turski, Filip Graliński · 19 Feb 2020

Attacking Neural Text Detectors
Max Wolff, Stuart Wolff · AAML, DeLMO · 19 Feb 2020

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, ..., Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao · AI4CE · 19 Feb 2020

Learning by Semantic Similarity Makes Abstractive Summarization Better
Wonjin Yoon, Yoonsun Yeo, Minbyul Jeong, Bong-Jun Yi, Jaewoo Kang · 18 Feb 2020

From English To Foreign Languages: Transferring Pre-trained Language Models
Ke M. Tran · 18 Feb 2020

A Financial Service Chatbot based on Deep Bidirectional Transformers
S. Yu, Yuxin Chen, Hussain Zaidi · 17 Feb 2020

Incorporating BERT into Neural Machine Translation
Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wen-gang Zhou, Houqiang Li, Tie-Yan Liu · FedML, AIMat · 17 Feb 2020

SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
Bin Wang, C.-C. Jay Kuo · 16 Feb 2020

Towards Detection of Subjective Bias using Contextualized Word Embeddings
Tanvi Dadu, Kartikey Pant, R. Mamidi · 16 Feb 2020

Robustness Verification for Transformers
Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh · AAML · 16 Feb 2020

Undersensitivity in Neural Reading Comprehension
Johannes Welbl, Pasquale Minervini, Max Bartolo, Pontus Stenetorp, Sebastian Riedel · AAML · 15 Feb 2020

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo, Lei Ji, Botian Shi, Haoyang Huang, Nan Duan, Tianrui Li, Jason Li, Xilin Chen, Ming Zhou · VLM · 15 Feb 2020

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith · 15 Feb 2020

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu, Jian Jiao, Ruofei Zhang · 14 Feb 2020

Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Carlos Aspillaga, Andrés Carvallo, Vladimir Araujo · ELM · 14 Feb 2020

Transformer on a Diet
Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alex Smola · 14 Feb 2020

FQuAD: French Question Answering Dataset
Martin d'Hoffschmidt, Wacim Belblidia, Tom Brendlé, Quentin Heinrich, Maxime Vidal · 14 Feb 2020

Transformers as Soft Reasoners over Language
Peter Clark, Oyvind Tafjord, Kyle Richardson · ReLM, OffRL, LRM · 14 Feb 2020

HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing
Xiyou Zhou, Zhiyu Zoey Chen, Xiaoyong Jin, Wenjie Wang · 14 Feb 2020

Sentiment Analysis Using Averaged Weighted Word Vector Features
Ali Erkan, Tunga Güngör · 13 Feb 2020

LaProp: Separating Momentum and Adaptivity in Adam
Liu Ziyin, Zhikang T. Wang, Masahito Ueda · ODL · 12 Feb 2020

On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu · AI4CE · 12 Feb 2020

Feature Importance Estimation with Self-Attention Networks
Blaž Škrlj, Jannis Brugger, Nada Lavrač, Matej Petković · FAtt, MILM · 11 Feb 2020

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng · LRM · 11 Feb 2020

Adversarial Filters of Dataset Biases
Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, Yejin Choi · 10 Feb 2020

Exploring Chemical Space using Natural Language Processing Methodologies for Drug Discovery
Hakime Öztürk, Arzucan Özgür, P. Schwaller, Teodoro Laino, Elif Özkirimli · 10 Feb 2020

How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Adam Roberts, Colin Raffel, Noam M. Shazeer · KELM · 10 Feb 2020

REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang · RALM · 10 Feb 2020

Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin, Anton I. Gusev · FedML · 10 Feb 2020

Pre-training Tasks for Embedding-based Large-scale Retrieval
Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar · RALM · 10 Feb 2020

Segmented Graph-Bert for Graph Instance Modeling
Jiawei Zhang · SSeg · 09 Feb 2020

Blank Language Models
T. Shen, Victor Quach, Regina Barzilay, Tommi Jaakkola · 08 Feb 2020

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou · 07 Feb 2020