BAM! Born-Again Multi-Task Networks for Natural Language Understanding

10 July 2019
Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le

Papers citing "BAM! Born-Again Multi-Task Networks for Natural Language Understanding"

Showing 50 of 55 citing papers.
The Effect of Optimal Self-Distillation in Noisy Gaussian Mixture Model
Kaito Takanami, Takashi Takahashi, Ayaka Sakata
27 Jan 2025

Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data
Eun Som Jeon, Hongjun Choi, A. Shukla, Yuan Wang, Hyunglae Lee, M. Buman, Pavan Turaga
07 Jul 2024

Knowledge Fusion of Large Language Models
Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi
19 Jan 2024 · MoMe

Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation
Chengming Hu, Haolun Wu, Xuan Li, Chen Ma, Xi Chen, Jun Yan, Boyu Wang, Xue Liu
22 Dec 2023

Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection
Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li
03 Nov 2023 · ViT

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning
Zhen Wang, Yikang Shen, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim
06 Mar 2023 · VLM, VPVLM

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han, Xiatian Zhu, Licheng Yu, Li Zhang, Yi-Zhe Song, Tao Xiang
04 Mar 2023 · VLM

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
Xiwen Liang, Minzhe Niu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang
03 Mar 2023 · VLM

Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks
Sudipta Kar, Giuseppe Castellucci, Simone Filice, S. Malmasi, Oleg Rokhlenko
22 Feb 2023 · CLL, KELM

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization
A. Jafari, I. Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, A. Ghodsi
12 Dec 2022 · VLM

Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
Neeraj Varshney, Man Luo, Chitta Baral
23 Nov 2022 · RALM

Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token
Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz
09 Nov 2022

Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis
Shuai Fan, Chen Lin, Haonan Li, Zheng-Wen Lin, Jinsong Su, Hang Zhang, Yeyun Gong, Jian Guo, Nan Duan
18 Oct 2022 · VLM

Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems
Neeraj Varshney, Chitta Baral
11 Oct 2022

FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification
Yunqing Zhao, Ngai-man Cheung
23 Aug 2022 · BDL

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai
09 Jun 2022 · MoMe, MoE

Nearest Neighbor Knowledge Distillation for Neural Machine Translation
Zhixian Yang, Renliang Sun, Xiaojun Wan
01 May 2022

Universal Representations: A Unified Look at Multiple Task and Domain Learning
Wei-Hong Li, Xialei Liu, Hakan Bilen
06 Apr 2022 · SSL, OOD

MetaV: A Meta-Verifier Approach to Task-Agnostic Model Fingerprinting
Xudong Pan, Yifan Yan, Mi Zhang, Min Yang
19 Jan 2022

Leveraging Sentiment Analysis Knowledge to Solve Emotion Detection Tasks
Maude Nguyen-The, Guillaume-Alexandre Bilodeau, Jan Rockemann
05 Nov 2021

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi
16 Oct 2021

Language Modelling via Learning to Rank
A. Frydenlund, Gagandeep Singh, Frank Rudzicz
13 Oct 2021

Object DGCNN: 3D Object Detection using Dynamic Graphs
Yue Wang, Justin Solomon
13 Oct 2021 · 3DPC

Improving Question Answering Performance Using Knowledge Distillation and Active Learning
Yasaman Boreshban, Seyed Morteza Mirbostani, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel, Shahin Amiriparian
26 Sep 2021

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
24 Sep 2021 · MoE

The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders
Han He, Jinho Choi
14 Sep 2021

Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
03 Sep 2021 · ALM, UQCV

Student Surpasses Teacher: Imitation Attack for Black-Box NLP APIs
Qiongkai Xu, Xuanli He, Lingjuan Lyu, Lizhen Qu, Gholamreza Haffari
29 Aug 2021 · MLAU

Multi-Task Self-Training for Learning General Representations
Golnaz Ghiasi, Barret Zoph, E. D. Cubuk, Quoc V. Le, Nayeon Lee
25 Aug 2021 · SSL

PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval
Ruiyang Ren, Shangwen Lv, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, Ji-Rong Wen
13 Aug 2021

Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Cameron R. Wolfe, Keld T. Lundgaard
27 Jul 2021 · VLM

Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau, Noah A. Smith
16 Jun 2021

MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation
Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh
12 May 2021 · AAML

Latent-Optimized Adversarial Neural Transfer for Sarcasm Detection
Xu Guo, Boyang Albert Li, Han Yu, Chunyan Miao
19 Apr 2021 · AAML

What's in your Head? Emergent Behaviour in Multi-Task Transformer Models
Mor Geva, Uri Katz, Aviv Ben-Arie, Jonathan Berant
13 Apr 2021 · LRM

Universal Representation Learning from Multiple Domains for Few-shot Classification
Weihong Li, Xialei Liu, Hakan Bilen
25 Mar 2021 · SSL, OOD, VLM

UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu, Amanpreet Singh
22 Feb 2021 · ViT

Deep Multi-Task Learning for Joint Localization, Perception, and Prediction
John Phillips, Julieta Martinez, Ioan Andrei Bârsan, Sergio Casas, Abbas Sadat, R. Urtasun
17 Jan 2021

Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo, Alexander M. Rush, Yoon Kim
14 Dec 2020

MixKD: Towards Efficient Distillation of Large-scale Language Models
Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin
01 Nov 2020

Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
Xinyu Wang, Yong-jia Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
10 Oct 2020

Lifelong Language Knowledge Distillation
Yung-Sung Chuang, Shang-Yu Su, Yun-Nung Chen
05 Oct 2020 · KELM, CLL

N-LTP: An Open-source Neural Language Technology Platform for Chinese
Wanxiang Che, Yunlong Feng, Libo Qin, Ting Liu
24 Sep 2020 · VLM

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Jonathan Pilault, Amine Elhattami, C. Pal
19 Sep 2020 · CLL, MoE

Multi-Task Learning with Deep Neural Networks: A Survey
M. Crawshaw
10 Sep 2020 · CVBM

Learning Functions to Study the Benefit of Multitask Learning
Gabriele Bettgenhauser, Michael A. Hedderich, Dietrich Klakow
09 Jun 2020

Knowledge Distillation: A Survey
Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao
09 Jun 2020 · VLM

IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren
02 May 2020

UnifiedQA: Crossing Format Boundaries With a Single QA System
Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi
02 May 2020

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance
Prasetya Ajie Utama, N. Moosavi, Iryna Gurevych
01 May 2020 · OODD