Visualizing and Understanding the Effectiveness of BERT

Yaru Hao, Li Dong, Furu Wei, Ke Xu
arXiv:1908.05620 · 15 August 2019

Papers citing "Visualizing and Understanding the Effectiveness of BERT"

50 / 59 papers shown
HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
Adrian Chan, Anupam Mijar, Mehreen Saeed, Chau-Wai Wong, Akram Khater · 03 Oct 2024

Multi-Task Domain Adaptation for Language Grounding with 3D Objects
Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang-qiang Wang, Zhixu Li, Tiefeng Li, Xiaowen Chu · 03 Jul 2024

Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Yichu Xu, Xin-Chun Li, Lan Li, De-Chuan Zhan · 21 May 2024

Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang, Yu Bowen, Haiyang Yu, Yangyu Lv, Tingwen Liu, Fei Huang, Hongbo Xu, Yongbin Li · 03 Aug 2023 [ALM]

KL Regularized Normalization Framework for Low Resource Tasks
Neeraj Kumar, Ankur Narang, Brejesh Lall · 21 Dec 2022

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
Peng Lu, I. Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, A. Ghodsi, Philippe Langlais · 12 Dec 2022 [MoMe]

Exploring Mode Connectivity for Pre-trained Language Models
Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou · 25 Oct 2022

Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models
Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, Dacheng Tao · 11 Oct 2022 [AAML]

Visualizing high-dimensional loss landscapes with Hessian directions
Lucas Böttcher, Gregory R. Wheeler · 28 Aug 2022

Near-optimal control of dynamical systems with neural ordinary differential equations
Lucas Böttcher, Thomas Asikis · 22 Jun 2022 [AI4CE]

Perspectives of Non-Expert Users on Cyber Security and Privacy: An Analysis of Online Discussions on Twitter
Nandita Pattnaik, Shujun Li, Jason R. C. Nurse · 05 Jun 2022

Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR
Qiu-shi Zhu, Jie Zhang, Zitian Zhang, Lirong Dai · 26 May 2022

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na, Sanket Vaibhav Mehta, Emma Strubell · 25 May 2022

Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Chiyu Feng, Po-Chun Hsu, Hung-yi Lee · 08 May 2022 [SSL]

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, ..., Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei · 27 Apr 2022 [SSL]

simCrossTrans: A Simple Cross-Modality Transfer Learning for Object Detection with ConvNets or Vision Transformers
Xiaoke Shen, I. Stamos · 20 Mar 2022 [ViT]

DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei · 01 Mar 2022 [MoE, AI4CE]

A Survey of Pretraining on Graphs: Taxonomy, Methods, and Applications
Jun Xia, Yanqiao Zhu, Yuanqi Du, Stan Z. Li · 16 Feb 2022 [VLM]

An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell · 16 Dec 2021 [CLL]

Interpreting Language Models Through Knowledge Graph Extraction
Vinitra Swamy, Angelika Romanou, Martin Jaggi · 16 Nov 2021

Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li · 20 Oct 2021

How Does Adversarial Fine-Tuning Benefit BERT?
J. Ebrahimi, Hao Yang, Wei Zhang · 31 Aug 2021 [AAML]

T3-Vis: a visual analytic framework for Training and fine-Tuning Transformers in NLP
Raymond Li, Wen Xiao, Lanjun Wang, Hyeju Jang, Giuseppe Carenini · 31 Aug 2021 [ViT]

What can linear interpolation of neural network loss landscapes tell us?
Tiffany J. Vlaar, Jonathan Frankle · 30 Jun 2021 [MoMe]

On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation
Ruidan He, Linlin Liu, Hai Ye, Qingyu Tan, Bosheng Ding, Liying Cheng, Jia-Wei Low, Lidong Bing, Luo Si · 06 Jun 2021

Inspecting the concept knowledge graph encoded by modern language models
Carlos Aspillaga, Marcelo Mendoza, Alvaro Soto · 27 May 2021

Probing Across Time: What Does RoBERTa Know and When?
Leo Z. Liu, Yizhong Wang, Jungo Kasai, Hannaneh Hajishirzi, Noah A. Smith · 16 Apr 2021 [KELM]

HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
Song Liu, Haoqi Fan, Shengsheng Qian, Yiru Chen, Wenkui Ding, Zhongyuan Wang · 28 Mar 2021

The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT
Madhura Pande, Aakriti Budhraja, Preksha Nema, Pratyush Kumar, Mitesh M. Khapra · 22 Jan 2021

BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King · 31 Dec 2020 [MQ]

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, ..., Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei · 31 Dec 2020 [LRM]

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta · 22 Dec 2020

Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning
Daniel Grießhaber, J. Maucher, Ngoc Thang Vu · 04 Dec 2020

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding
Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen · 16 Oct 2020

Neural Databases
James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, A. Halevy · 14 Oct 2020 [NAI]

Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training
Hai Ye, Qingyu Tan, Ruidan He, Juntao Li, Hwee Tou Ng, Lidong Bing · 24 Sep 2020 [VLM]

Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models
Joseph F. DeRose, Jiayao Wang, M. Berger · 03 Sep 2020

Contrastive Code Representation Learning
Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica · 09 Jul 2020 [SSL, DRL]

PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models
Eyal Ben-David, Carmel Rabinovitz, Roi Reichart · 16 Jun 2020 [SSL]

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow · 08 Jun 2020

Understanding Self-Attention of Self-Supervised Audio Transformers
Shu-Wen Yang, Andy T. Liu, Hung-yi Lee · 05 Jun 2020

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu · 15 May 2020 [VLM]

Similarity Analysis of Contextual Word Representation Models
John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James R. Glass · 03 May 2020

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao, H. Trivedi, A. Balasubramanian, Niranjan Balasubramanian · 02 May 2020

When BERT Plays the Lottery, All Tickets Are Winning
Sai Prasanna, Anna Rogers, Anna Rumshisky · 01 May 2020 [MILM]

How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking
Nicola De Cao, Michael Schlichtkrull, Wilker Aziz, Ivan Titov · 30 Apr 2020

What Happens To BERT Embeddings During Fine-tuning?
Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, Ian Tenney · 29 Apr 2020

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
Mengjie Zhao, Tao R. Lin, Fei Mi, Martin Jaggi, Hinrich Schütze · 26 Apr 2020

Quantifying the Contextualization of Word Representations with Semantic Class Probing
Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze · 25 Apr 2020

Generative Data Augmentation for Commonsense Reasoning
Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Swabha Swayamdipta, Ronan Le Bras, Ji-ping Wang, Chandra Bhagavatula, Yejin Choi, Doug Downey · 24 Apr 2020 [LRM]