Small and Practical BERT Models for Sequence Labeling

31 August 2019
Henry Tsai, Jason Riesa, Melvin Johnson, N. Arivazhagan, Xin Li, Amelia Archer
VLM
ArXiv (abs) · PDF · HTML

Papers citing "Small and Practical BERT Models for Sequence Labeling"

50 / 66 papers shown

Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
  Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen
  ALM, KELM · 101 · 32 · 0 · 02 Jul 2024

mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
  Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
  LRM · 92 · 5 · 0 · 06 Jun 2024

Multi-granular Adversarial Attacks against Black-box Neural Ranking Models
  Yuansan Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
  AAML · 120 · 15 · 0 · 02 Apr 2024

TAMS: Translation-Assisted Morphological Segmentation
  Enora Rice, Ali Marashian, Luke Gessler, Alexis Palmer, Katharina von der Wense
  57 · 0 · 0 · 21 Mar 2024

A Survey on Transformer Compression
  Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao
  150 · 35 · 0 · 05 Feb 2024

E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
  Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu
  VLM · 149 · 0 · 0 · 28 Nov 2023

Co-training and Co-distillation for Quality Improvement and Compression of Language Models
  Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Hongbo Zhang, Sung Ju Hwang, Alexander Min
  55 · 0 · 0 · 06 Nov 2023

Automatic Disfluency Detection from Untranscribed Speech
  Amrit Romana, K. Koishida, E. Provost
  70 · 8 · 0 · 01 Nov 2023

Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models
  Himmet Toprak Kesgin, M. K. Yuce, M. Amasyalı
  22 · 7 · 0 · 26 Jul 2023

Unsupervised Dense Retrieval Training with Web Anchors
  Yiqing Xie, X. Liu, Chenyan Xiong
  RALM · 42 · 3 · 0 · 10 May 2023

Processing Natural Language on Embedded Devices: How Well Do Transformer Models Perform?
  Souvik Sarkar, Mohammad Fakhruddin Babar, Md. Mahadi Hassan, M. Hasan, Shubhra (Santu) Karmaker
  32 · 1 · 0 · 23 Apr 2023

Swing Distillation: A Privacy-Preserving Knowledge Distillation Framework
  Junzhuo Li, Xinwei Wu, Weilong Dong, Shuangzhi Wu, Chao Bian, Deyi Xiong
  113 · 4 · 0 · 16 Dec 2022

Intriguing Properties of Compression on Multilingual Models
  Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer
  71 · 14 · 0 · 04 Nov 2022

You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models
  Tomasz Limisiewicz, Daniel Malkin, Gabriel Stanovsky
  76 · 4 · 0 · 13 Oct 2022

AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models
  Xiaofan Zhang, Zongwei Zhou, Deming Chen, Yu Emma Wang
  81 · 11 · 0 · 21 Jan 2022

Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models
  Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji
  76 · 6 · 0 · 03 Jan 2022

Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages
  Tejas I. Dhamecha, V. Rudramurthy, Samarth Bharadwaj, Karthik Sankaranarayanan, P. Bhattacharyya
  93 · 26 · 0 · 22 Sep 2021

Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications
  Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia
  64 · 1 · 0 · 17 Sep 2021

General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings
  Lukas Galke, Isabelle Cuber, Christophe Meyer, Henrik Ferdinand Nolscher, Angelina Sonderecker, A. Scherp
  140 · 2 · 0 · 17 Sep 2021

Frequency Effects on Syntactic Rule Learning in Transformers
  Jason W. Wei, Dan Garrette, Tal Linzen, Ellie Pavlick
  147 · 67 · 0 · 14 Sep 2021

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models
  Yichun Yin, Cheng Chen, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
  VLM · 69 · 50 · 0 · 29 Jul 2021

Using Machine Translation to Localize Task Oriented NLG Output
  Scott Roy, Clifford Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, G. Bansal, Sidharth Mudgal, Chris Varano
  32 · 1 · 0 · 09 Jul 2021

Why Can You Lay Off Heads? Investigating How BERT Heads Transfer
  Ting-Rui Chiang, Yun-Nung Chen
  38 · 0 · 0 · 14 Jun 2021

RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models
  Xin Guo, Jianlei Yang, Haoyi Zhou, Xucheng Ye, Jianxin Li
  52 · 1 · 0 · 07 Jun 2021

MergeDistill: Merging Pre-trained Language Models using Distillation
  Simran Khanuja, Melvin Johnson, Partha P. Talukdar
  84 · 16 · 0 · 05 Jun 2021

MOROCCO: Model Resource Comparison Framework
  Valentin Malykh, Alexander Kukushkin, Ekaterina Artemova, Vladislav Mikhailov, Maria Tikhonova, Tatiana Shavrina
  55 · 0 · 0 · 29 Apr 2021

Morph Call: Probing Morphosyntactic Content of Multilingual Transformers
  Vladislav Mikhailov, O. Serikov, Ekaterina Artemova
  82 · 9 · 0 · 26 Apr 2021

Zero-Resource Multi-Dialectal Arabic Natural Language Understanding
  Muhammad Khalifa, Hesham A. Hassan, A. Fahmy
  56 · 7 · 0 · 14 Apr 2021

One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks
  Atish Agarwala, Abhimanyu Das, Brendan Juba, Rina Panigrahy, Vatsal Sharan, Xin Wang, Qiuyi Zhang
  MoMe · 41 · 11 · 0 · 29 Mar 2021

A Practical Survey on Faster and Lighter Transformers
  Quentin Fournier, G. Caron, Daniel Aloise
  137 · 103 · 0 · 26 Mar 2021

LightMBERT: A Simple Yet Effective Method for Multilingual BERT Distillation
  Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
  65 · 9 · 0 · 11 Mar 2021

BERT-based knowledge extraction method of unstructured domain text
  Wang Zijia, L. Ye, Zhongkai Zhu
  29 · 1 · 0 · 01 Mar 2021

RuSentEval: Linguistic Source, Encoder Force!
  Vladislav Mikhailov, Ekaterina Taktasheva, Elina Sigdel, Ekaterina Artemova
  VLM · 36 · 6 · 0 · 28 Feb 2021

Distilling Large Language Models into Tiny and Effective Students using pQRNN
  P. Kaliamoorthi, Aditya Siddhant, Edward Li, Melvin Johnson
  MQ · 60 · 17 · 0 · 21 Jan 2021

Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling
  Muhammad Khalifa, Muhammad Abdul-Mageed, Khaled Shaalan
  104 · 17 · 0 · 12 Jan 2021

Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
  Minh Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh, Thien Huu Nguyen
  124 · 137 · 0 · 09 Jan 2021

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
  Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei
  MQ · 124 · 274 · 0 · 31 Dec 2020

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
  Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li
  119 · 60 · 0 · 14 Dec 2020

Improving Task-Agnostic BERT Distillation with Layer Mapping Search
  Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
  49 · 12 · 0 · 11 Dec 2020

Deep Clustering of Text Representations for Supervision-free Probing of Syntax
  Vikram Gupta, Haoyue Shi, Kevin Gimpel, Mrinmaya Sachan
  89 · 9 · 0 · 24 Oct 2020

Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking
  Elliot Schumacher, J. Mayfield, Mark Dredze
  19 · 6 · 0 · 19 Oct 2020

Probing Pretrained Language Models for Lexical Semantics
  Ivan Vulić, Edoardo Ponti, Robert Litschko, Goran Glavaš, Anna Korhonen
  KELM · 86 · 246 · 0 · 12 Oct 2020

Load What You Need: Smaller Versions of Multilingual BERT
  Amine Abdaoui, Camille Pradel, Grégoire Sigel
  96 · 74 · 0 · 12 Oct 2020

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)
  Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman
  SSL, AI4CE · 78 · 21 · 0 · 11 Oct 2020

Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
  Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
  84 · 10 · 0 · 10 Oct 2020

Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank
  Ethan C. Chau, Lucy H. Lin, Noah A. Smith
  96 · 15 · 0 · 29 Sep 2020

AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network
  Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
  BDL · 41 · 3 · 0 · 17 Sep 2020

Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition
  Henry Tsai, Jayden Ooi, Chun-Sung Ferng, Hyung Won Chung, Jason Riesa
  ViT · 75 · 21 · 0 · 15 Aug 2020

Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks
  Shreyas Padhy, Zachary Nado, Jie Jessie Ren, J. Liu, Jasper Snoek, Balaji Lakshminarayanan
  UQCV · 88 · 47 · 0 · 10 Jul 2020

Adversarial Alignment of Multilingual Models for Extracting Temporal Expressions from Text
  Lukas Lange, Anastasiia Iurshina, Heike Adel, Jannik Strötgen
  71 · 29 · 0 · 19 May 2020