ResearchTrend.AI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

11 October 2018
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
    VLM
    SSL
    SSeg

Papers citing "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"

50 / 18,690 papers shown
Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks
Pavel Blinov
Manvel Avetisian
V. Kokh
Dmitry Umerenkov
Alexander Tuzhilin
37
19
0
15 Jul 2020
Emoji Prediction: Extensions and Benchmarking
Weicheng Ma
Ruibo Liu
Lili Wang
Soroush Vosoughi
19
19
0
14 Jul 2020
Deep learning models for representing out-of-vocabulary words
Johannes V. Lochter
Renato M. Silva
Tiago A. Almeida
22
15
0
14 Jul 2020
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
Shauharda Khadka
Estelle Aflalo
Mattias Marder
Avrech Ben-David
Santiago Miret
Shie Mannor
Tamir Hazan
Hanlin Tang
Somdeb Majumdar
GNN
32
11
0
14 Jul 2020
CoreGen: Contextualized Code Representation Learning for Commit Message Generation
L. Nie
Cuiyun Gao
Zhicong Zhong
Wai Lam
Yang Liu
Zenglin Xu
29
46
0
14 Jul 2020
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
37
45
0
14 Jul 2020
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models
Lifu Tu
Garima Lalwani
Spandana Gella
He He
LRM
33
184
0
14 Jul 2020
Can neural networks acquire a structural bias from raw linguistic data?
Alex Warstadt
Samuel R. Bowman
AI4CE
20
53
0
14 Jul 2020
T-Basis: a Compact Representation for Neural Networks
Anton Obukhov
M. Rakhuba
Stamatios Georgoulis
Menelaos Kanakis
Dengxin Dai
Luc Van Gool
41
27
0
13 Jul 2020
Learning Reasoning Strategies in End-to-End Differentiable Proving
Pasquale Minervini
Sebastian Riedel
Pontus Stenetorp
Edward Grefenstette
Tim Rocktäschel
LRM
45
96
0
13 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
52
78
0
13 Jul 2020
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
Andy T. Liu
Shang-Wen Li
Hung-yi Lee
SSL
67
356
0
12 Jul 2020
Stance Detection in Web and Social Media: A Comparative Study
Shalmoli Ghosh
Prajwal Singhania
Siddharth Singh
Koustav Rudra
Saptarshi Ghosh
11
76
0
12 Jul 2020
Is Machine Learning Speaking my Language? A Critical Look at the NLP-Pipeline Across 8 Human Languages
Esma Wali
Yan Chen
Christopher Mahoney
Thomas Middleton
M. Babaeianjelodar
Mariama Njie
Jeanna Neefe Matthews
19
9
0
11 Jul 2020
Transformer-XL Based Music Generation with Multiple Sequences of Time-valued Notes
Xianchao Wu
Chengyuan Wang
Qinying Lei
22
19
0
11 Jul 2020
Neural Knowledge Extraction From Cloud Service Incidents
Manish Shetty
Chetan Bansal
Sumit Kumar
Nikitha Rao
Nachiappan Nagappan
Thomas Zimmermann
31
17
0
10 Jul 2020
One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control
Wenlong Huang
Igor Mordatch
Deepak Pathak
51
167
0
09 Jul 2020
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation
Yongqin Xian
Bruno Korbar
Matthijs Douze
Lorenzo Torresani
Bernt Schiele
Zeynep Akata
VGen
18
18
0
09 Jul 2020
Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision
Abhinav Shukla
Stavros Petridis
Maja Pantic
SSL
32
16
0
08 Jul 2020
Remix: Rebalanced Mixup
Hsin-Ping Chou
Shih-Chieh Chang
Jia-Yu Pan
Wei Wei
Da-Cheng Juan
41
233
0
08 Jul 2020
Targeting the Benchmark: On Methodology in Current Natural Language Processing Research
David Schlangen
33
57
0
07 Jul 2020
Continual BERT: Continual Learning for Adaptive Extractive Summarization of COVID-19 Literature
Jongjin Park
CLL
33
16
0
07 Jul 2020
DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training
Weiyan Wang
Cengguang Zhang
Liu Yang
Kai Chen
Kun Tan
34
12
0
07 Jul 2020
Deep Contextual Embeddings for Address Classification in E-commerce
Shreyas Mangalgi
Lakshya Kumar
Ravindra Babu Tallamraju
25
8
0
06 Jul 2020
DART: Open-Domain Structured Data Record to Text Generation
Linyong Nan
Dragomir R. Radev
Rui Zhang
Amrit Rau
Abhinand Sivaprasad
...
Y. Tan
Xi Lin
Caiming Xiong
R. Socher
Nazneen Rajani
17
199
0
06 Jul 2020
Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs
Meng Qu
Tianyu Gao
Louis-Pascal Xhonneux
Jian Tang
BDL
36
106
0
05 Jul 2020
Robust Prediction of Punctuation and Truecasing for Medical ASR
Monica Sunkara
S. Ronanki
Kalpit Dixit
S. Bodapati
Katrin Kirchhoff
17
33
0
04 Jul 2020
TICO-19: the Translation Initiative for Covid-19
Antonios Anastasopoulos
A. Cattelan
Zi-Yi Dou
Marcello Federico
C. Federman
...
Mengmeng Niu
A. Oktem
Eric Paquin
G. Tang
Sylwia Tur
24
90
0
03 Jul 2020
Generating Informative Dialogue Responses with Keywords-Guided Networks
Heng-Da Xu
Xian-Ling Mao
Zewen Chi
Jing-Jing Zhu
Fanshu Sun
Heyan Huang
BDL
14
5
0
03 Jul 2020
MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks
Yusi Zhang
Chuanjie Liu
Angen Luo
Hui Xue
Xuan Shan
Y. Luo
Yiqian Xia
Yuanchi Yan
Haidong Wang
13
6
0
03 Jul 2020
Detecting Ongoing Events Using Contextual Word and Sentence Embeddings
Mariano Maisonnave
Fernando Delbianco
F. Tohmé
Ana Gabriela Maguitman
E. Milios
ObjD
19
3
0
02 Jul 2020
Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning
Zhongzheng Ren
Raymond A. Yeh
Alex Schwing
52
95
0
02 Jul 2020
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Gautier Izacard
Edouard Grave
RALM
70
1,119
0
02 Jul 2020
Sequential Domain Adaptation through Elastic Weight Consolidation for Sentiment Analysis
Avinash Madasu
Anvesh Rao Vijjini
CLL
12
14
0
02 Jul 2020
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Shiqing Fan
Yi Rong
Chen Meng
Zongyan Cao
Siyu Wang
...
Jun Yang
Lixue Xia
Lansong Diao
Xiaoyong Liu
Wei Lin
23
233
0
02 Jul 2020
The Impact of Explanations on AI Competency Prediction in VQA
Kamran Alipour
Arijit Ray
Xiaoyu Lin
J. Schulze
Yi Yao
Giedrius Burachas
30
9
0
02 Jul 2020
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
Shail Dave
Riyadh Baghdadi
Tony Nowatzki
Sasikanth Avancha
Aviral Shrivastava
Baoxin Li
64
82
0
02 Jul 2020
Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge
Pat Verga
Haitian Sun
Livio Baldini Soares
William W. Cohen
KELM
35
50
0
02 Jul 2020
Computing Conceptual Distances between Breast Cancer Screening Guidelines: An Implementation of a Near-Peer Epistemic Model of Medical Disagreement
Hossein Hematialam
Luciana D. Garbayo
Seethalakshmi Gopalakrishnan
Wlodek Zadrozny
18
1
0
01 Jul 2020
Measuring Robustness to Natural Distribution Shifts in Image Classification
Rohan Taori
Achal Dave
Vaishaal Shankar
Nicholas Carlini
Benjamin Recht
Ludwig Schmidt
OOD
53
537
0
01 Jul 2020
Unbiased Loss Functions for Extreme Classification With Missing Labels
Erik Schultheis
Mohammadreza Qaraei
Priyanshu Gupta
Rohit Babbar
23
6
0
01 Jul 2020
SemEval-2020 Task 4: Commonsense Validation and Explanation
Cunxiang Wang
Shuailong Liang
Yili Jin
Yilong Wang
Xiao-Dan Zhu
Yue Zhang
LRM
25
98
0
01 Jul 2020
Transferability of Natural Language Inference to Biomedical Question Answering
Minbyul Jeong
Mujeen Sung
Gangwoo Kim
Donghyeon Kim
Wonjin Yoon
J. Yoo
Jaewoo Kang
21
38
0
01 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
36
131
0
30 Jun 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
31
377
0
30 Jun 2020
PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
Wenquan Wu
Zhen Guo
Zhibin Liu
Xinchao Xu
30
137
0
30 Jun 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhehuai Chen
MoE
43
1,118
0
30 Jun 2020
Classification of cancer pathology reports: a large-scale comparative study
S. Martina
L. Ventura
P. Frasconi
27
11
0
29 Jun 2020
Multi-Head Attention: Collaborate Instead of Concatenate
Jean-Baptiste Cordonnier
Andreas Loukas
Martin Jaggi
6
108
0
29 Jun 2020
Learning Sparse Prototypes for Text Generation
Junxian He
Taylor Berg-Kirkpatrick
Graham Neubig
27
23
0
29 Jun 2020