BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le
arXiv: 1907.04829, 10 July 2019
Papers citing "BAM! Born-Again Multi-Task Networks for Natural Language Understanding" (50 of 55 shown)
The Effect of Optimal Self-Distillation in Noisy Gaussian Mixture Model
Kaito Takanami, Takashi Takahashi, Ayaka Sakata (27 Jan 2025)

Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data
Eun Som Jeon, Hongjun Choi, A. Shukla, Yuan Wang, Hyunglae Lee, M. Buman, Pavan Turaga (07 Jul 2024)

Knowledge Fusion of Large Language Models
Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi (19 Jan 2024) [MoMe]

Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation
Chengming Hu, Haolun Wu, Xuan Li, Chen Ma, Xi Chen, Jun Yan, Boyu Wang, Xue Liu (22 Dec 2023)

Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection
Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li (03 Nov 2023) [ViT]

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning
Zhen Wang, Yikang Shen, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim (06 Mar 2023) [VLM, VPVLM]

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han, Xiatian Zhu, Licheng Yu, Li Zhang, Yi-Zhe Song, Tao Xiang (04 Mar 2023) [VLM]

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
Xiwen Liang, Minzhe Niu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang (03 Mar 2023) [VLM]

Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks
Sudipta Kar, Giuseppe Castellucci, Simone Filice, S. Malmasi, Oleg Rokhlenko (22 Feb 2023) [CLL, KELM]

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization
A. Jafari, I. Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, A. Ghodsi (12 Dec 2022) [VLM]

Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
Neeraj Varshney, Man Luo, Chitta Baral (23 Nov 2022) [RALM]

Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token
Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz (09 Nov 2022)

Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis
Shuai Fan, Chen Lin, Haonan Li, Zheng-Wen Lin, Jinsong Su, Hang Zhang, Yeyun Gong, Jian Guo, Nan Duan (18 Oct 2022) [VLM]

Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems
Neeraj Varshney, Chitta Baral (11 Oct 2022)

FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification
Yunqing Zhao, Ngai-man Cheung (23 Aug 2022) [BDL]

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai (09 Jun 2022) [MoMe, MoE]

Nearest Neighbor Knowledge Distillation for Neural Machine Translation
Zhixian Yang, Renliang Sun, Xiaojun Wan (01 May 2022)

Universal Representations: A Unified Look at Multiple Task and Domain Learning
Wei-Hong Li, Xialei Liu, Hakan Bilen (06 Apr 2022) [SSL, OOD]

MetaV: A Meta-Verifier Approach to Task-Agnostic Model Fingerprinting
Xudong Pan, Yifan Yan, Mi Zhang, Min Yang (19 Jan 2022)

Leveraging Sentiment Analysis Knowledge to Solve Emotion Detection Tasks
Maude Nguyen-The, Guillaume-Alexandre Bilodeau, Jan Rockemann (05 Nov 2021)

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi (16 Oct 2021)

Language Modelling via Learning to Rank
A. Frydenlund, Gagandeep Singh, Frank Rudzicz (13 Oct 2021)

Object DGCNN: 3D Object Detection using Dynamic Graphs
Yue Wang, Justin Solomon (13 Oct 2021) [3DPC]

Improving Question Answering Performance Using Knowledge Distillation and Active Learning
Yasaman Boreshban, Seyed Morteza Mirbostani, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel, Shahin Amiriparian (26 Sep 2021)

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat (24 Sep 2021) [MoE]

The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders
Han He, Jinho Choi (14 Sep 2021)

Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le (03 Sep 2021) [ALM, UQCV]

Student Surpasses Teacher: Imitation Attack for Black-Box NLP APIs
Qiongkai Xu, Xuanli He, Lingjuan Lyu, Lizhen Qu, Gholamreza Haffari (29 Aug 2021) [MLAU]

Multi-Task Self-Training for Learning General Representations
Golnaz Ghiasi, Barret Zoph, E. D. Cubuk, Quoc V. Le, Nayeon Lee (25 Aug 2021) [SSL]

PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval
Ruiyang Ren, Shangwen Lv, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, Ji-Rong Wen (13 Aug 2021)

Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Cameron R. Wolfe, Keld T. Lundgaard (27 Jul 2021) [VLM]

Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau, Noah A. Smith (16 Jun 2021)

MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation
Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh (12 May 2021) [AAML]

Latent-Optimized Adversarial Neural Transfer for Sarcasm Detection
Xu Guo, Boyang Albert Li, Han Yu, Chunyan Miao (19 Apr 2021) [AAML]

What's in your Head? Emergent Behaviour in Multi-Task Transformer Models
Mor Geva, Uri Katz, Aviv Ben-Arie, Jonathan Berant (13 Apr 2021) [LRM]

Universal Representation Learning from Multiple Domains for Few-shot Classification
Weihong Li, Xialei Liu, Hakan Bilen (25 Mar 2021) [SSL, OOD, VLM]

UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu, Amanpreet Singh (22 Feb 2021) [ViT]

Deep Multi-Task Learning for Joint Localization, Perception, and Prediction
John Phillips, Julieta Martinez, Ioan Andrei Bârsan, Sergio Casas, Abbas Sadat, R. Urtasun (17 Jan 2021)

Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo, Alexander M. Rush, Yoon Kim (14 Dec 2020)

MixKD: Towards Efficient Distillation of Large-scale Language Models
Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin (01 Nov 2020)

Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
Xinyu Wang, Yong-jia Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu (10 Oct 2020)

Lifelong Language Knowledge Distillation
Yung-Sung Chuang, Shang-Yu Su, Yun-Nung Chen (05 Oct 2020) [KELM, CLL]

N-LTP: An Open-source Neural Language Technology Platform for Chinese
Wanxiang Che, Yunlong Feng, Libo Qin, Ting Liu (24 Sep 2020) [VLM]

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Jonathan Pilault, Amine Elhattami, C. Pal (19 Sep 2020) [CLL, MoE]

Multi-Task Learning with Deep Neural Networks: A Survey
M. Crawshaw (10 Sep 2020) [CVBM]

Learning Functions to Study the Benefit of Multitask Learning
Gabriele Bettgenhauser, Michael A. Hedderich, Dietrich Klakow (09 Jun 2020)

Knowledge Distillation: A Survey
Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao (09 Jun 2020) [VLM]

IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren (02 May 2020)

UnifiedQA: Crossing Format Boundaries With a Single QA System
Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi (02 May 2020)

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance
Prasetya Ajie Utama, N. Moosavi, Iryna Gurevych (01 May 2020) [OODD]