ResearchTrend.AI
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models

12 October 2020
Zirui Wang
Yulia Tsvetkov
Orhan Firat
Yuan Cao

Papers citing "Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models"

49 / 49 papers shown
BoundarySeg: An Embarrassingly Simple Method To Boost Medical Image Segmentation Performance for Low Data Regimes
Tushar Kataria
Shireen Y. Elhabian
29
0
0
14 May 2025
CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li
Thuy-Trang Vu
Christian Herold
Amirhossein Tebbifakhr
Shahram Khadivi
Gholamreza Haffari
33
0
0
31 Mar 2025
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
Hanwen Zhong
Jiaxin Chen
Yutong Zhang
Di Huang
Yunhong Wang
MoE
42
0
0
12 Jan 2025
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
Zhiqi Bu
Xiaomeng Jin
Bhanukiran Vinzamuri
Anil Ramakrishna
Kai-Wei Chang
V. Cevher
Mingyi Hong
MU
88
6
0
29 Oct 2024
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu
Shengcao Cao
Yu-xiong Wang
55
1
0
18 Oct 2024
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
Tianjian Li
Haoran Xu
Weiting Tan
Kenton Murray
Daniel Khashabi
35
1
0
06 Oct 2024
X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
Haoran Xu
Kenton W. Murray
Philipp Koehn
Hieu T. Hoang
Akiko Eriguchi
Huda Khayrallah
34
8
0
04 Oct 2024
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
David Grangier
Simin Fan
Skyler Seto
Pierre Ablin
44
3
0
30 Sep 2024
Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller
Mark Dredze
Nicholas Andrews
61
1
0
26 Aug 2024
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation
Xiaoyu Kong
Jiancan Wu
An Zhang
Leheng Sheng
Hui Lin
Xiang Wang
Xiangnan He
AI4TS
58
7
0
19 Aug 2024
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Nikolaos Dimitriadis
Pascal Frossard
F. Fleuret
MoE
67
6
0
10 Jul 2024
Towards Modular LLMs by Building and Reusing a Library of LoRAs
O. Ostapenko
Zhan Su
E. Ponti
Laurent Charlin
Nicolas Le Roux
Matheus Pereira
Lucas Caccia
Alessandro Sordoni
MoMe
44
31
0
18 May 2024
Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement
Dayou Mao
Yuhao Chen
Yifan Wu
Maximilian Gilles
Alexander Wong
AAML
41
0
0
05 Feb 2024
Careful with that Scalpel: Improving Gradient Surgery with an EMA
Yu-Guan Hsieh
James Thornton
Eugène Ndiaye
Michal Klein
Marco Cuturi
Pierre Ablin
MedIm
39
0
0
05 Feb 2024
A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization
Feiyang Ye
Baijiong Lin
Xiao-Qun Cao
Yu Zhang
Ivor Tsang
50
6
0
17 Jan 2024
GradSim: Gradient-Based Language Grouping for Effective Multilingual Training
Mingyang Wang
Heike Adel
Lukas Lange
Jannik Strötgen
Hinrich Schütze
30
3
0
23 Oct 2023
Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems
Yunli Wang
Zhiqiang Wang
Jian Yang
Shiyang Wen
Dongying Kong
Han Li
Kun Gai
32
10
0
16 Oct 2023
Transformer-based Multimodal Change Detection with Multitask Consistency Constraints
Biyuan Liu
Huaixin Chen
Kun Li
Michael Ying Yang
38
14
0
13 Oct 2023
Deep Task-specific Bottom Representation Network for Multi-Task Recommendation
Qi Liu
Zhilong Zhou
Gangwei Jiang
T. Ge
Defu Lian
20
12
0
11 Aug 2023
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
Hanrong Ye
Dan Xu
MoE
42
26
0
28 Jul 2023
Fairness in Multi-Task Learning via Wasserstein Barycenters
François Hu
Philipp Ratz
Arthur Charpentier
37
10
0
16 Jun 2023
Addressing Negative Transfer in Diffusion Models
Hyojun Go
Jinyoung Kim
Yunsung Lee
Seunghyun Lee
Shinhyeok Oh
Hyeongdon Moon
Seungtaek Choi
DiffM
VLM
32
24
0
01 Jun 2023
Exploring Representational Disparities Between Multilingual and Bilingual Translation Models
Neha Verma
Kenton W. Murray
Kevin Duh
14
0
0
23 May 2023
FedAds: A Benchmark for Privacy-Preserving CVR Estimation with Vertical Federated Learning
Penghui Wei
Hongjian Dou
Shaoguo Liu
Rong Tang
Li Liu
Liangji Wang
Bo Zheng
FedML
24
12
0
15 May 2023
KINLP at SemEval-2023 Task 12: Kinyarwanda Tweet Sentiment Analysis
Antoine Nzeyimana
20
3
0
25 Apr 2023
UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining
Hyung Won Chung
Noah Constant
Xavier Garcia
Adam Roberts
Yi Tay
Sharan Narang
Orhan Firat
26
50
0
18 Apr 2023
On the Pareto Front of Multilingual Neural Machine Translation
Liang Chen
Shuming Ma
Dongdong Zhang
Furu Wei
Baobao Chang
MoE
23
5
0
06 Apr 2023
Modular Deep Learning
Jonas Pfeiffer
Sebastian Ruder
Ivan Vulić
E. Ponti
MoMe
OOD
32
73
0
22 Feb 2023
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes
Behrooz Ghorbani
Xavier Garcia
Markus Freitag
Orhan Firat
38
29
0
19 Feb 2023
GAT: Guided Adversarial Training with Pareto-optimal Auxiliary Tasks
Salah Ghamizi
Jingfeng Zhang
Maxime Cordy
Mike Papadakis
Masashi Sugiyama
Yves Le Traon
AAML
28
2
0
06 Feb 2023
FairRoad: Achieving Fairness for Recommender Systems with Optimized Antidote Data
Minghong Fang
Jia-Wei Liu
Michinari Momma
Yi Sun
30
4
0
13 Dec 2022
M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zhangyang Wang
MoE
42
81
0
26 Oct 2022
PaCo: Parameter-Compositional Multi-Task Reinforcement Learning
Lingfeng Sun
Haichao Zhang
Wei-ping Xu
Masayoshi Tomizuka
MoE
30
36
0
21 Oct 2022
Personalizing Intervened Network for Long-tailed Sequential User Behavior Modeling
Zheqi Lv
Feng Wang
Shengyu Zhang
Kun Kuang
Hongxia Yang
Fei Wu
37
8
0
19 Aug 2022
LibMTL: A Python Library for Multi-Task Learning
Baijiong Lin
Yu Zhang
OffRL
AI4CE
25
37
0
27 Mar 2022
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
Yinan He
Gengshi Huang
Siyu Chen
Jianing Teng
Wang Kun
Zhen-fei Yin
Lu Sheng
Ziwei Liu
Yu Qiao
Jing Shao
VLM
SSL
ViT
43
7
0
16 Mar 2022
Combining Modular Skills in Multitask Learning
E. Ponti
Alessandro Sordoni
Yoshua Bengio
Siva Reddy
MoE
12
37
0
28 Feb 2022
Structured Multi-task Learning for Molecular Property Prediction
Shengchao Liu
Meng Qu
Zuobai Zhang
Huiyu Cai
Jian Tang
17
24
0
22 Feb 2022
mSLAM: Massively multilingual joint pre-training for speech and text
Ankur Bapna
Colin Cherry
Yu Zhang
Ye Jia
Melvin Johnson
Yong Cheng
Simran Khanuja
Jason Riesa
Alexis Conneau
VLM
24
111
0
03 Feb 2022
In Defense of the Unitary Scalarization for Deep Multi-Task Learning
Vitaly Kurin
Alessandro De Palma
Ilya Kostrikov
Shimon Whiteson
M. P. Kumar
39
73
0
11 Jan 2022
Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning
Yi-Chen Chen
Shu-Wen Yang
Cheng-Kuang Lee
Simon See
Hung-yi Lee
SSL
19
12
0
18 Oct 2021
Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning
Seanie Lee
Haebeom Lee
Juho Lee
Sung Ju Hwang
MoMe
CLL
45
16
0
06 Oct 2021
A Conditional Generative Matching Model for Multi-lingual Reply Suggestion
Budhaditya Deb
Guoqing Zheng
Milad Shokouhi
Ahmed Hassan Awadallah
31
1
0
15 Sep 2021
Domain Generalization via Gradient Surgery
Lucas Mansilla
Rodrigo Echeveste
Diego H. Milone
Enzo Ferrante
OOD
24
78
0
03 Aug 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Bo-wen Li
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Yifan Jiang
Min Ma
Junwen Bai
CLL
34
76
0
30 Apr 2021
RotoGrad: Gradient Homogenization in Multitask Learning
Adrián Javaloy
Isabel Valera
21
86
0
03 Mar 2021
Measuring and Harnessing Transference in Multi-Task Learning
Christopher Fifty
Ehsan Amid
Zhe Zhao
Tianhe Yu
Rohan Anil
Chelsea Finn
28
15
0
29 Oct 2020
Investigating Multilingual NMT Representations at Scale
Sneha Kudugunta
Ankur Bapna
Isaac Caswell
N. Arivazhagan
Orhan Firat
LRM
144
120
0
05 Sep 2019
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
Orhan Firat
Kyunghyun Cho
Yoshua Bengio
LRM
AIMat
231
623
0
06 Jan 2016