ResearchTrend.AI
Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor

10 October 2020
Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

Papers citing "Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor"

34 / 34 papers shown

Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning
  Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu · 08 May 2021 · 75 / 147 / 0

Automated Concatenation of Embeddings for Structured Prediction
  Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu · 10 Oct 2020 · 93 / 177 / 0

Second-Order Neural Dependency Parsing with Message Passing and End-to-End Training
  Xinyu Wang, Kewei Tu · [3DV] · 10 Oct 2020 · 83 / 37 / 0

More Embeddings, Better Sequence Labelers?
  Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu · 17 Sep 2020 · 47 / 10 / 0

AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network
  Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu · [BDL] · 17 Sep 2020 · 41 / 3 / 0

Enhanced Universal Dependency Parsing with Second-Order Inference and Mixture of Training Data
  Xinyu Wang, Yong Jiang, Kewei Tu · 02 Jun 2020 · 63 / 11 / 0

Distilling Neural Networks for Greener and Faster Dependency Parsing
  Mark Anderson, Carlos Gómez-Rodríguez · 01 Jun 2020 · 44 / 18 / 0

Named Entity Recognition as Dependency Parsing
  Juntao Yu, Bernd Bohnet, Massimo Poesio · 14 May 2020 · 76 / 419 / 0

Efficient Second-Order TreeCRF for Neural Dependency Parsing
  Yu Zhang, Zhenghua Li, Min Zhang · 03 May 2020 · 52 / 105 / 0

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
  Subhabrata Mukherjee, Ahmed Hassan Awadallah · 12 Apr 2020 · 67 / 59 / 0

Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
  Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu · 08 Apr 2020 · 84 / 36 / 0

Unsupervised Cross-lingual Representation Learning at Scale
  Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov · 05 Nov 2019 · 228 / 6,593 / 0

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf · 02 Oct 2019 · 269 / 7,554 / 0

Small and Practical BERT Models for Sequence Labeling
  Henry Tsai, Jason Riesa, Melvin Johnson, N. Arivazhagan, Xin Li, Amelia Archer · [VLM] · 31 Aug 2019 · 74 / 121 / 0

BAM! Born-Again Multi-Task Networks for Natural Language Understanding
  Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le · 10 Jul 2019 · 72 / 230 / 0

Second-Order Semantic Dependency Parsing with End-to-End Neural Networks
  Xinyu Wang, Jingxian Huang, Kewei Tu · [3DV] · 19 Jun 2019 · 51 / 66 / 0

GCDT: A Global Context Enhanced Deep Transition Architecture for Sequence Labeling
  Yanjun Liu, Fandong Meng, Jinchao Zhang, Jinan Xu, Jie Zhou · 06 Jun 2019 · 56 / 90 / 0

How multilingual is Multilingual BERT?
  Telmo Pires, Eva Schlinger, Dan Garrette · [LRM, VLM] · 04 Jun 2019 · 164 / 1,415 / 0

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
  Shijie Wu, Mark Dredze · [VLM, SSeg] · 19 Apr 2019 · 114 / 681 / 0

Benchmarking Approximate Inference Methods for Neural Structured Prediction
  Lifu Tu, Kevin Gimpel · [BDL] · 01 Apr 2019 · 85 / 17 / 0

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
  Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin · 28 Mar 2019 · 75 / 421 / 0

Structured Knowledge Distillation for Dense Prediction
  Yifan Liu, Chris Liu, Jingdong Wang, Zhenbo Luo · 11 Mar 2019 · 104 / 585 / 0

Viable Dependency Parsing as Sequence Labeling
  Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez · 27 Feb 2019 · 63 / 69 / 0

Multilingual Neural Machine Translation with Knowledge Distillation
  Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu · 27 Feb 2019 · 98 / 250 / 0

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · [VLM, SSL, SSeg] · 11 Oct 2018 · 1.8K / 95,324 / 0

Design Challenges and Misconceptions in Neural Sequence Labeling
  Jie Yang, Shuailong Liang, Yue Zhang · 12 Jun 2018 · 177 / 164 / 0

Stack-Pointer Networks for Dependency Parsing
  Xuezhe Ma, Zecong Hu, J. Liu, Nanyun Peng, Graham Neubig, Eduard H. Hovy · [GNN] · 03 May 2018 · 83 / 167 / 0

Deep Biaffine Attention for Neural Dependency Parsing
  Timothy Dozat, Christopher D. Manning · 06 Nov 2016 · 116 / 1,224 / 0

Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser
  A. Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Noah A. Smith · [MoE] · 24 Sep 2016 · 84 / 77 / 0

Enriching Word Vectors with Subword Information
  Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov · [NAI, SSL, VLM] · 15 Jul 2016 · 234 / 9,986 / 0

Sequence-Level Knowledge Distillation
  Yoon Kim, Alexander M. Rush · 25 Jun 2016 · 132 / 1,123 / 0

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
  Xuezhe Ma, Eduard H. Hovy · 04 Mar 2016 · 120 / 2,659 / 0

Distilling the Knowledge in a Neural Network
  Geoffrey E. Hinton, Oriol Vinyals, J. Dean · [FedML] · 09 Mar 2015 · 367 / 19,745 / 0

Do Deep Nets Really Need to be Deep?
  Lei Jimmy Ba, R. Caruana · 21 Dec 2013 · 188 / 2,120 / 0