BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

11 October 2018
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
    VLM SSL SSeg

Papers citing "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"

50 / 23,639 papers shown
A Semantic-based Method for Unsupervised Commonsense Question Answering
Yilin Niu
Fei Huang
Jiaming Liang
Wenkai Chen
Xiaoyan Zhu
Minlie Huang
LRM
84
13
0
31 May 2021
Sketch and Refine: Towards Faithful and Informative Table-to-Text Generation
Peng Wang
Junyang Lin
An Yang
Chang Zhou
Yichang Zhang
Jingren Zhou
Hongxia Yang
71
21
0
31 May 2021
Dual-stream Network for Visual Recognition
Mingyuan Mao
Renrui Zhang
Honghui Zheng
Peng Gao
Teli Ma
Yan Peng
Errui Ding
Baochang Zhang
Shumin Han
ViT
78
66
0
31 May 2021
Zero-shot Fact Verification by Claim Generation
Liangming Pan
Wenhu Chen
Wenhan Xiong
Min-Yen Kan
Wenjie Wang
85
59
0
31 May 2021
On the Interplay Between Fine-tuning and Composition in Transformers
Lang-Chi Yu
Allyson Ettinger
79
14
0
31 May 2021
LEAP: Learnable Pruning for Transformer-based Models
Z. Yao
Xiaoxia Wu
Linjian Ma
Sheng Shen
Kurt Keutzer
Michael W. Mahoney
Yuxiong He
71
7
0
30 May 2021
HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation
Ayan Sengupta
S. Bhattacharjee
Tanmoy Chakraborty
Md. Shad Akhtar
50
14
0
30 May 2021
A Compression-Compilation Framework for On-mobile Real-time BERT Applications
Wei Niu
Zhenglun Kong
Geng Yuan
Weiwen Jiang
Jiexiong Guan
Caiwen Ding
Pu Zhao
Sijia Liu
Bin Ren
Yanzhi Wang
MQ
37
4
0
30 May 2021
StyTr²: Image Style Transfer with Transformers
Yingying Deng
Fan Tang
Weiming Dong
Chongyang Ma
Xingjia Pan
Lei Wang
Changsheng Xu
ViT
123
269
0
30 May 2021
Diversifying Dialog Generation via Adaptive Label Smoothing
Yida Wang
Yinhe Zheng
Yong Jiang
Minlie Huang
92
37
0
30 May 2021
Defending Pre-trained Language Models from Adversarial Word Substitutions Without Performance Sacrifice
Rongzhou Bao
Jiayi Wang
Hai Zhao
AAML
56
43
0
30 May 2021
Fast Nearest Neighbor Machine Translation
Yuxian Meng
Xiaoya Li
Xiayu Zheng
Leilei Gan
Xiaofei Sun
Tianwei Zhang
Jiwei Li
LRM
85
49
0
30 May 2021
Neural Models for Offensive Language Detection
Ehab Hamdy
34
4
0
30 May 2021
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning
Jiaqi Chen
Jianheng Tang
Jinghui Qin
Xiaodan Liang
Lingbo Liu
Eric Xing
Liang Lin
AIMat
121
188
0
30 May 2021
Structured Sentiment Analysis as Dependency Graph Parsing
Jeremy Barnes
Robin Kurtz
Stephan Oepen
Lilja Øvrelid
Erik Velldal
78
74
0
30 May 2021
Tesseract: Parallelize the Tensor Parallelism Efficiently
Boxiang Wang
Qifan Xu
Zhengda Bian
Yang You
VLM GNN
44
35
0
30 May 2021
CLEVE: Contrastive Pre-training for Event Extraction
Ziqi Wang
Xiaozhi Wang
Xu Han
Yankai Lin
Lei Hou
Zhiyuan Liu
Peng Li
Juan-Zi Li
Jie Zhou
84
118
0
30 May 2021
Pre-training Universal Language Representation
Yian Li
Hai Zhao
SSL
62
8
0
30 May 2021
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation
Zhiyong Wu
Lingpeng Kong
W. Bi
Xiang Li
B. Kao
LRM
74
81
0
30 May 2021
NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-Based Simulation
Sungdong Kim
Minsuk Chang
Sang-Woo Lee
117
19
0
30 May 2021
Maximizing Parallelism in Distributed Training for Huge Neural Networks
Zhengda Bian
Qifan Xu
Boxiang Wang
Yang You
MoE
63
48
0
30 May 2021
NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search
Jin Xu
Xu Tan
Renqian Luo
Kaitao Song
Jian Li
Tao Qin
Tie-Yan Liu
MQ
62
79
0
30 May 2021
Gaze Estimation using Transformer
Yihua Cheng
Feng Lu
ViT
80
94
0
30 May 2021
Re-evaluating Word Mover's Distance
Ryoma Sato
M. Yamada
H. Kashima
119
24
0
30 May 2021
Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking
Fangyu Liu
Ivan Vulić
Anna Korhonen
Nigel Collier
86
49
0
30 May 2021
Sentiment analysis in tweets: an assessment study from classical to modern text representation models
Sérgio Barreto
Ricardo Moura
Jonnathan Carvalho
A. Paes
A. Plastino
84
14
0
29 May 2021
Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning
Zhiyuan Zeng
Keqing He
Yuanmeng Yan
Zijun Liu
Yanan Wu
Hong Xu
Huixing Jiang
Weiran Xu
63
68
0
29 May 2021
Grammar Accuracy Evaluation (GAE): Quantifiable Quantitative Evaluation of Machine Translation Models
Dojun Park
Youngjin Jang
Harksoo Kim
ELM
54
1
0
29 May 2021
CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model
Tae-Hwan Jung
VLM
61
31
0
29 May 2021
Quotation Recommendation and Interpretation Based on Transformation from Queries to Quotations
Lingzhi Wang
Xingshan Zeng
Kam-Fai Wong
126
8
0
29 May 2021
A Novel Framework Integrating AI Model and Enzymological Experiments Promotes Identification of SARS-CoV-2 3CL Protease Inhibitors and Activity-based Probe
F. Hu
Lei Wang
Yishen Hu
Dongqi Wang
Weijie Wang
Jiaqiang Jiang
Nan Li
P. Yin
27
12
0
29 May 2021
CoDesc: A Large Code-Description Parallel Dataset
Masum Hasan
Tanveer Muttaqueen
Abdullah Al Ishtiaq
Kazi Sajeed Mehrab
Md. Mahim Anjum Haque
Tahmid Hasan
Wasi Uddin Ahmad
Anindya Iqbal
Rifat Shahriyar
74
32
0
29 May 2021
Less is More: Pay Less Attention in Vision Transformers
Zizheng Pan
Bohan Zhuang
Haoyu He
Jing Liu
Jianfei Cai
ViT
141
87
0
29 May 2021
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis via Non-Autoregressive Generative Transformers
Zhu Zhang
Jianxin Ma
Chang Zhou
Rui Men
Zhikang Li
Ming Ding
Jie Tang
Jingren Zhou
Hongxia Yang
105
47
0
29 May 2021
Exploiting Position Bias for Robust Aspect Sentiment Classification
Fangjie Ma
Chen Zhang
D. Song
52
18
0
29 May 2021
Multi-Label Few-Shot Learning for Aspect Category Detection
Mengting Hu
Shiwan Zhao
Honglei Guo
Chao Xue
H. Gao
Tiegang Gao
Renhong Cheng
Zhong Su
67
40
0
29 May 2021
NeuralLog: Natural Language Inference with Joint Neural and Logical Reasoning
Zeming Chen
Qiyue Gao
Lawrence S. Moss
FedML NAI
86
42
0
29 May 2021
A Query-Driven Topic Model
Zheng Fang
Yulan He
Rob Procter
55
10
0
28 May 2021
Towards More Equitable Question Answering Systems: How Much More Data Do You Need?
Arnab Debnath
Navid Rajabi
F. Alam
Antonios Anastasopoulos
69
11
0
28 May 2021
An Attention Free Transformer
Shuangfei Zhai
Walter A. Talbott
Nitish Srivastava
Chen Huang
Hanlin Goh
Ruixiang Zhang
J. Susskind
ViT
94
132
0
28 May 2021
Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning
Nan Ding
Xi Chen
Tomer Levinboim
Sebastian Goodman
Radu Soricut
77
33
0
28 May 2021
Weighted Training for Cross-Task Learning
Shuxiao Chen
K. Crammer
Han He
Dan Roth
Weijie J. Su
87
28
0
28 May 2021
UCPhrase: Unsupervised Context-aware Quality Phrase Tagging
Xiaotao Gu
Zihan Wang
Zhenyu Bi
Yu Meng
Liyuan Liu
Jiawei Han
Jingbo Shang
184
36
0
28 May 2021
On the Bias Against Inductive Biases
George Cazenavette
Simon Lucey
SSL
37
2
0
28 May 2021
Controllable Abstractive Dialogue Summarization with Sketch Supervision
Chien-Sheng Wu
Linqing Liu
Wenhao Liu
Pontus Stenetorp
Caiming Xiong
83
52
0
28 May 2021
Learning to Extend Program Graphs to Work-in-Progress Code
Xuechen Li
Chris J. Maddison
Daniel Tarlow
52
2
0
28 May 2021
Online Hate: Behavioural Dynamics and Relationship with Misinformation
Matteo Cinelli
Andraz Pelicon
I. Mozetič
Walter Quattrociocchi
Petra Kralj Novak
Fabiana Zollo
51
8
0
28 May 2021
What if This Modified That? Syntactic Interventions via Counterfactual Embeddings
Mycal Tucker
Peng Qian
R. Levy
81
40
0
28 May 2021
SemEval-2021 Task 9: Fact Verification and Evidence Finding for Tabular Data in Scientific Documents (SEM-TAB-FACTS)
N. Wang
Diwakar Mahajan
Marina Danilevsky
Sara Rosenthal
LMTD
93
56
0
28 May 2021
Linguistic Structures as Weak Supervision for Visual Scene Graph Generation
Keren Ye
Adriana Kovashka
69
54
0
28 May 2021