ResearchTrend.AI

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

26 September 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
    SSL, AIMat
ArXiv (abs) | PDF | HTML | GitHub (3271★)

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

50 / 2,935 papers shown
BERT Lost Patience Won't Be Robust to Adversarial Slowdown
Zachary Coalson
Gabriel Ritter
Rakesh Bobba
Sanghyun Hong
AAML
47
2
0
29 Oct 2023
Stacking the Odds: Transformer-Based Ensemble for AI-Generated Text Detection
Duke Nguyen
Khaing Myat Noe Naing
Aditya Joshi
61
7
0
29 Oct 2023
Multi-grained Evidence Inference for Multi-choice Reading Comprehension
Yilin Zhao
Hai Zhao
Sufeng Duan
62
2
0
27 Oct 2023
Outlier Dimensions Encode Task-Specific Knowledge
William Rudman
Catherine Chen
Carsten Eickhoff
65
5
0
26 Oct 2023
PETA: Evaluating the Impact of Protein Transfer Learning with Sub-word Tokenization on Downstream Applications
Yang Tan
Mingchen Li
P. Tan
Ziyi Zhou
Huiqun Yu
Guisheng Fan
Liang Hong
65
0
0
26 Oct 2023
Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?
Ahmed Alajrami
Katerina Margatina
Nikolaos Aletras
AAML
65
1
0
26 Oct 2023
Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks
Zhaohui Yan
Songlin Yang
Wei Liu
Kewei Tu
109
13
0
26 Oct 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
68
0
0
25 Oct 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Morris Alper
Hadar Averbuch-Elor
110
10
0
25 Oct 2023
FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning
Jaemin Shin
Hyungjun Yoon
Seungjoo Lee
Sungjoon Park
Yunxin Liu
Jinho D. Choi
Sung-Ju Lee
70
6
0
25 Oct 2023
Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training
Max Müller-Eberstein
Rob van der Goot
Barbara Plank
Ivan Titov
129
10
0
25 Oct 2023
URL-BERT: Training Webpage Representations via Social Media Engagements
A. Qamar
Chetan Verma
Ahmed El-Kishky
Sumit Binnani
Sneha Mehta
Taylor Berg-Kirkpatrick
60
0
0
25 Oct 2023
CR-COPEC: Causal Rationale of Corporate Performance Changes to Learn from Financial Reports
Ye Eun Chun
Sunjae Kwon
Kyung-Woo Sohn
Nakwon Sung
Junyoup Lee
Byungki Seo
Kevin Compher
Seung-won Hwang
Jaesik Choi
76
1
0
24 Oct 2023
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Jiduan Liu
Jiahao Liu
Qifan Wang
Jingang Wang
Xunliang Cai
Dongyan Zhao
Ran Wang
Rui Yan
61
4
0
24 Oct 2023
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Haofei Yu
Cunxiang Wang
Yue Zhang
Wei Bi
RALM
100
6
0
24 Oct 2023
CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Kaiyan Zhang
Ning Ding
Biqing Qi
Xuekai Zhu
Xinwei Long
Bowen Zhou
95
5
0
24 Oct 2023
PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng
Bei Li
Huiwen Bao
Jiale Wang
Weiqiao Shan
Tong Xiao
Jingbo Zhu
MoE, AI4CE
45
0
0
23 Oct 2023
Unveiling the Multi-Annotation Process: Examining the Influence of Annotation Quantity and Instance Difficulty on Model Performance
Pritam Kadasi
Mayank Singh
57
3
0
23 Oct 2023
PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain
Wei-wei Zhu
Xiaoling Wang
Huanran Zheng
Mosha Chen
Buzhou Tang
ELM, LM&MA
69
36
0
22 Oct 2023
Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models
Pierre Colombo
Victor Pellegrain
Malik Boudiaf
Victor Storchan
Myriam Tami
Ismail Ben Ayed
Céline Hudelot
Pablo Piantanida
99
8
0
21 Oct 2023
A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification
Pierre Colombo
Nathan Noiry
Guillaume Staerman
Pablo Piantanida
FaML, DRL
78
1
0
21 Oct 2023
Plausibility Processing in Transformer Language Models: Focusing on the Role of Attention Heads in GPT
Soo Hyun Ryu
45
0
0
20 Oct 2023
The Less the Merrier? Investigating Language Representation in Multilingual Models
H. Nigatu
A. Tonja
Jugal Kalita
76
1
0
20 Oct 2023
Unsupervised Candidate Answer Extraction through Differentiable Masker-Reconstructor Model
Zhuoer Wang
Yicheng Wang
Ziwei Zhu
James Caverlee
80
0
0
19 Oct 2023
A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models
Yi Zhou
Jose Camacho-Collados
Danushka Bollegala
153
6
0
19 Oct 2023
Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models
Weize Chen
Xiaoyue Xu
Xu Han
Yankai Lin
Ruobing Xie
Zhiyuan Liu
Maosong Sun
Jie Zhou
39
0
0
19 Oct 2023
Character-level Chinese Backpack Language Models
Hao Sun
John Hewitt
59
0
0
19 Oct 2023
Time-Aware Representation Learning for Time-Sensitive Question Answering
Jungbin Son
Alice Oh
70
6
0
19 Oct 2023
Pretraining Language Models with Text-Attributed Heterogeneous Graphs
Tao Zou
Le Yu
Yifei Huang
Leilei Sun
Bo Du
AI4CE
62
17
0
19 Oct 2023
DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text
Shuaiyi Li
Yang Deng
Wai Lam
93
2
0
19 Oct 2023
SPEED: Speculative Pipelined Execution for Efficient Decoding
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Hasan Genç
Kurt Keutzer
A. Gholami
Y. Shao
77
41
0
18 Oct 2023
DesignQuizzer: A Community-Powered Conversational Agent for Learning Visual Design
Zhenhui Peng
Qiaoyi Chen
Zhiyu Shen
Xiaojuan Ma
Antti Oulasvirta
54
5
0
18 Oct 2023
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling
Hai Yu
Chong Deng
Qinglin Zhang
Jiaqing Liu
Qian Chen
Wen Wang
AI4TS
90
10
0
18 Oct 2023
Chain-of-Thought Tuning: Masked Language Models can also Think Step By Step in Natural Language Understanding
Caoyun Fan
Jidong Tian
Yitian Li
Wenqing Chen
Hao He
Yaohui Jin
LRM
71
4
0
18 Oct 2023
Disentangling the Linguistic Competence of Privacy-Preserving BERT
Stefan Arnold
Nils Kemmerzell
Annika Schreiner
77
0
0
17 Oct 2023
QADYNAMICS: Training Dynamics-Driven Synthetic QA Diagnostic for Zero-Shot Commonsense Question Answering
Haochen Shi
Weiqi Wang
Tianqing Fang
Baixuan Xu
Wenxuan Ding
Xin Liu
Yangqiu Song
113
7
0
17 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
238
163
0
16 Oct 2023
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo
Guangzhi Wang
Mohan S. Kankanhalli
38
3
0
16 Oct 2023
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu
Liang Ding
Li Shen
Keqin Peng
Yu Cao
Dazhao Cheng
Dacheng Tao
MoE
80
9
0
15 Oct 2023
CarExpert: Leveraging Large Language Models for In-Car Conversational Question Answering
Md. Rony
Christian Suess
Sinchana Ramakanth Bhat
Viju Sudhi
Julia Schneider
Maximilian Vogel
Roman Teucher
Ken E. Friedl
S. Sahoo
71
11
0
14 Oct 2023
Low-Resource Clickbait Spoiling for Indonesian via Question Answering
Ni Putu Intan Maharani
Ayu Purwarianti
Alham Fikri Aji
65
2
0
12 Oct 2023
To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer
Md. Mushfiqur Rahman
Fardin Ahsan Sakib
Fahim Faisal
Antonios Anastasopoulos
60
3
0
12 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue
Nikolaos Aletras
102
0
0
11 Oct 2023
On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models
Thilini Wijesiriwardene
Ruwan Wickramarachchi
Aishwarya N. Reganti
Vinija Jain
Aman Chadha
Amit P. Sheth
Amitava Das
56
1
0
11 Oct 2023
The Temporal Structure of Language Processing in the Human Brain Corresponds to The Layered Hierarchy of Deep Language Models
Ariel Goldstein
Eric Ham
Mariano Schain
Samuel A. Nastase
Zaid Zada
...
Avinatan Hassidim
O. Devinsky
A. Flinker
Omer Levy
Uri Hasson
AI4CE
68
10
0
11 Oct 2023
Sparse Universal Transformer
Shawn Tan
Yikang Shen
Zhenfang Chen
Aaron Courville
Chuang Gan
MoE
82
15
0
11 Oct 2023
A Comparative Study of Transformer-based Neural Text Representation Techniques on Bug Triaging
Atish Kumar Dipongkor
Kevin Moran
23
8
0
10 Oct 2023
P5: Plug-and-Play Persona Prompting for Personalized Response Selection
Joosung Lee
Min Sik Oh
Donghun Lee
62
3
0
10 Oct 2023
Model Tuning or Prompt Tuning? A Study of Large Language Models for Clinical Concept and Relation Extraction
C.A.I. Peng
Xi Yang
Kaleb E. Smith
Zehao Yu
Aokun Chen
Jiang Bian
Yonghui Wu
VLM, LRM
85
32
0
10 Oct 2023
Evolution of Natural Language Processing Technology: Not Just Language Processing Towards General Purpose AI
Masahiro Yamamoto
54
1
0
10 Oct 2023