ResearchTrend.AI
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

26 September 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL AIMat
ArXiv (abs) · PDF · HTML · GitHub (3,271★)

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

50 / 2,935 papers shown
HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
Yi Tay
Zhe Zhao
Dara Bahri
Donald Metzler
Da-Cheng Juan
66
9
0
12 Jul 2020
Deep or Simple Models for Semantic Tagging? It Depends on your Data [Experiments]
Jinfeng Li
Yuliang Li
Xiaolan Wang
W. Tan
VLM
36
9
0
11 Jul 2020
Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network
Tadashi Ogura
A. Magassouba
K. Sugiura
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Hisashi Kawai
48
11
0
09 Jul 2020
IQ-VQA: Intelligent Visual Question Answering
Vatsal Goel
Mohit Chandak
A. Anand
Prithwijit Guha
64
5
0
08 Jul 2020
Remix: Rebalanced Mixup
Hsin-Ping Chou
Shih-Chieh Chang
Jia-Yu Pan
Wei Wei
Da-Cheng Juan
112
237
0
08 Jul 2020
Pre-Trained Models for Heterogeneous Information Networks
Yang Fang
Xiang Zhao
Yifan Chen
W. Xiao
Maarten de Rijke
SSL
47
1
0
07 Jul 2020
Efficient Conformal Prediction via Cascaded Inference with Expanded Admission
Adam Fisch
Tal Schuster
Tommi Jaakkola
Regina Barzilay
50
1
0
06 Jul 2020
LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model
Shilei Liu
Yu Guo
Bochao Li
Feiliang Ren
LRM
81
4
0
06 Jul 2020
Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer
K. Macková
Milan Straka
55
13
0
03 Jul 2020
SemEval-2020 Task 4: Commonsense Validation and Explanation
Cunxiang Wang
Shuailong Liang
Yili Jin
Yilong Wang
Xiao-Dan Zhu
Yue Zhang
LRM
141
99
0
01 Jul 2020
Transferability of Natural Language Inference to Biomedical Question Answering
Minbyul Jeong
Mujeen Sung
Gangwoo Kim
Donghyeon Kim
Wonjin Yoon
J. Yoo
Jaewoo Kang
80
40
0
01 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
142
135
0
30 Jun 2020
SE3M: A Model for Software Effort Estimation Using Pre-trained Embedding Models
E. M. D. B. Fávero
Dalcimar Casanova
Andrey R. Pimentel
34
12
0
30 Jun 2020
Multi-Head Attention: Collaborate Instead of Concatenate
Jean-Baptiste Cordonnier
Andreas Loukas
Martin Jaggi
82
115
0
29 Jun 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
218
1,798
0
29 Jun 2020
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
Chen Liang
Yue Yu
Haoming Jiang
Siawpeng Er
Ruijia Wang
T. Zhao
Chao Zhang
OffRL
78
240
0
28 Jun 2020
Video-Grounded Dialogues with Pretrained Generation Language Models
Hung Le
Guosheng Lin
82
28
0
27 Jun 2020
BERTology Meets Biology: Interpreting Attention in Protein Language Models
Jesse Vig
Ali Madani
Lav Varshney
Caiming Xiong
R. Socher
Nazneen Rajani
110
295
0
26 Jun 2020
Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings
Mayee F. Chen
Daniel Y. Fu
Frederic Sala
Sen Wu
Ravi Teja Mullapudi
Fait Poms
Kayvon Fatahalian
Christopher Ré
61
10
0
26 Jun 2020
The Depth-to-Width Interplay in Self-Attention
Yoav Levine
Noam Wies
Or Sharir
Hofit Bata
Amnon Shashua
137
46
0
22 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
109
2
0
21 Jun 2020
SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
F. Iandola
Albert Eaton Shaw
Ravi Krishna
Kurt Keutzer
VLM
90
128
0
19 Jun 2020
New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles
Kiet Van Nguyen
Tin Van Huynh
Duc-Vu Nguyen
A. Nguyen
Ngan Luu-Thuy Nguyen
75
41
0
19 Jun 2020
Neural Parameter Allocation Search
Bryan A. Plummer
Nikoli Dryden
Julius Frost
Torsten Hoefler
Kate Saenko
120
16
0
18 Jun 2020
Self-supervised Learning for Speech Enhancement
Yuchun Wang
Shrikant Venkataramani
Paris Smaragdis
SSL
96
31
0
18 Jun 2020
I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths
Hyoungwook Nam
S. Seo
Vikram Sharma Malithody
Noor Michael
Lang Li
26
1
0
18 Jun 2020
PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models
Eyal Ben-David
Carmel Rabinovitz
Roi Reichart
SSL
118
63
0
16 Jun 2020
Self-supervised Learning: Generative or Contrastive
Xiao Liu
Fanjin Zhang
Zhenyu Hou
Zhaoyu Wang
Li Mian
Jing Zhang
Jie Tang
SSL
211
1,645
0
15 Jun 2020
How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds
Prithviraj Ammanabrolu
Ethan Tien
Matthew J. Hausknecht
Mark O. Riedl
LLMAG
83
50
0
12 Jun 2020
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
Marcos Zampieri
Preslav Nakov
Sara Rosenthal
Pepa Atanasova
Georgi Karadzhov
Hamdy Mubarak
Leon Derczynski
Zeses Pitenis
Çağrı Çöltekin
81
488
0
12 Jun 2020
A Practical Sparse Approximation for Real Time Recurrent Learning
Jacob Menick
Erich Elsen
Utku Evci
Simon Osindero
Karen Simonyan
Alex Graves
92
32
0
12 Jun 2020
NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity
Sang-gil Lee
Sungwon Kim
Sungroh Yoon
77
17
0
11 Jun 2020
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages
Pedro Ortiz Suarez
Laurent Romary
Benoît Sagot
73
234
0
11 Jun 2020
Revisiting Few-sample BERT Fine-tuning
Tianyi Zhang
Felix Wu
Arzoo Katiyar
Kilian Q. Weinberger
Yoav Artzi
180
446
0
10 Jun 2020
MC-BERT: Efficient Language Pre-Training via a Meta Controller
Zhenhui Xu
Linyuan Gong
Guolin Ke
Di He
Shuxin Zheng
Liwei Wang
Jiang Bian
Tie-Yan Liu
BDL
65
18
0
10 Jun 2020
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach
Maksym Andriushchenko
Dietrich Klakow
187
363
0
08 Jun 2020
Pre-training Polish Transformer-based Language Models at Scale
Slawomir Dadas
Michal Perelkiewicz
Rafal Poswiata
98
39
0
07 Jun 2020
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou
Canwen Xu
Tao Ge
Julian McAuley
Ke Xu
Furu Wei
79
343
0
07 Jun 2020
An Overview of Neural Network Compression
James O'Neill
AI4CE
160
99
0
05 Jun 2020
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
John Giorgi
Osvald Nitski
Bo Wang
Gary D. Bader
SSL
151
499
0
05 Jun 2020
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
AAML
187
2,770
0
05 Jun 2020
GMAT: Global Memory Augmentation for Transformers
Ankit Gupta
Jonathan Berant
RALM
81
50
0
05 Jun 2020
Understanding Self-Attention of Self-Supervised Audio Transformers
Shu-Wen Yang
Andy T. Liu
Hung-yi Lee
55
27
0
05 Jun 2020
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai
Guokun Lai
Yiming Yang
Quoc V. Le
109
236
0
05 Jun 2020
Position Masking for Language Models
Andy Wagner
T. Mitra
Mrinal Iyer
Godfrey Da Costa
Marc Tremblay
22
5
0
02 Jun 2020
Subjective Question Answering: Deciphering the inner workings of Transformers in the realm of subjectivity
Lukas Muttenthaler
41
3
0
02 Jun 2020
WikiBERT models: deep transfer learning for many languages
S. Pyysalo
Jenna Kanerva
Antti Virtanen
Filip Ginter
KELM
89
38
0
02 Jun 2020
Question Answering on Scholarly Knowledge Graphs
M. Y. Jaradeh
M. Stocker
Sören Auer
LMTD RALM
41
13
0
02 Jun 2020
Careful analysis of XRD patterns with Attention
Koichi Kano
T. Segi
H. Ozono
27
0
0
02 Jun 2020
A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
Jie Cai
Zhengzhou Zhu
Ping Nie
Qian Liu
AAML
26
7
0
02 Jun 2020