ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning

2 December 2022
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen
Communities: MoMe

Papers citing "ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning"

Showing 50 of 58 citing papers.
  • Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora. Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell. 10 Apr 2025.
  • Ensembling Diffusion Models via Adaptive Feature Aggregation. Cong Wang, Kuan Tian, Yonghang Guan, Jun Zhang, Zhiwei Jiang, Fei Shen, Xiao Han. 27 May 2024.
  • Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning. Zhen Wang, Yikang Shen, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim. 06 Mar 2023. Communities: VLM, VPVLM.
  • Knowledge is a Region in Weight Space for Fine-tuned Language Models. Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen. 09 Feb 2023.
  • Exploring the Benefits of Training Expert Language Models over Instruction Tuning. Joel Jang, Seungone Kim, Seonghyeon Ye, Doyoung Kim, Lajanugen Logeswaran, Moontae Lee, Kyungjae Lee, Minjoon Seo. 07 Feb 2023. Communities: LRM, ALM.
  • Dataless Knowledge Fusion by Merging Weights of Language Models. Xisen Jin, Xiang Ren, Daniel Preoţiuc-Pietro, Pengxiang Cheng. 19 Dec 2022. Communities: FedML, MoMe.
  • Editing Models with Task Arithmetic. Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi. 08 Dec 2022. Communities: KELM, MoMe, MU.
  • Mechanistic Mode Connectivity. Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka. 15 Nov 2022.
  • REPAIR: REnormalizing Permuted Activations for Interpolation Repair. Keller Jordan, Hanie Sedghi, O. Saukh, R. Entezari, Behnam Neyshabur. 15 Nov 2022. Communities: MoMe.
  • Where to start? Analyzing the potential value of intermediate models. Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim, Yoav Katz. 31 Oct 2022. Communities: MoMe.
  • Exploring Mode Connectivity for Pre-trained Language Models. Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou. 25 Oct 2022.
  • Scaling Instruction-Finetuned Language Models. Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, ..., Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason W. Wei. 20 Oct 2022. Communities: ReLM, LRM.
  • lo-fi: distributed fine-tuning without communication. Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael G. Rabbat, Ari S. Morcos. 19 Oct 2022.
  • Git Re-Basin: Merging Models modulo Permutation Symmetries. Samuel K. Ainsworth, J. Hayase, S. Srinivasa. 11 Sep 2022. Communities: MoMe.
  • Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models. Margaret Li, Suchin Gururangan, Tim Dettmers, M. Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer. 05 Aug 2022. Communities: MoMe.
  • Linear Connectivity Reveals Generalization Strategies. Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra. 24 May 2022.
  • Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning. Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Joey Tianyi Zhou, Colin Raffel. 11 May 2022.
  • Fusing finetuned models for better pretraining. Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz. 06 Apr 2022. Communities: FedML, AI4CE, MoMe.
  • Training Compute-Optimal Large Language Models. Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre. 29 Mar 2022. Communities: AI4TS.
  • Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. Mitchell Wortsman, Gabriel Ilharco, S. Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, ..., Hongseok Namkoong, Ali Farhadi, Y. Carmon, Simon Kornblith, Ludwig Schmidt. 10 Mar 2022. Communities: MoMe.
  • Revisiting Parameter-Efficient Tuning: Are We Really There Yet? Guanzheng Chen, Fangyu Liu, Zaiqiao Meng, Shangsong Liang. 16 Feb 2022.
  • Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments. Maor Ivgi, Y. Carmon, Jonathan Berant. 13 Feb 2022.
  • ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning. V. Aribandi, Yi Tay, Tal Schuster, J. Rao, H. Zheng, ..., Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler. 22 Nov 2021. Communities: MoE.
  • Merging Models with Fisher-Weighted Averaging. Michael Matena, Colin Raffel. 18 Nov 2021. Communities: FedML, MoMe.
  • Multitask Prompted Training Enables Zero-Shot Task Generalization. Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush. 15 Oct 2021. Communities: LRM.
  • The Grammar-Learning Trajectories of Neural Language Models. Leshem Choshen, Guy Hacohen, D. Weinshall, Omri Abend. 13 Sep 2021.
  • RotoGrad: Gradient Homogenization in Multitask Learning. Adrián Javaloy, Isabel Valera. 03 Mar 2021.
  • Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling. Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, A. Wilson. 25 Feb 2021. Communities: UQCV.
  • Muppet: Massive Multi-task Representations with Pre-Finetuning. Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta. 26 Jan 2021.
  • Investigating Societal Biases in a Poetry Composition System. Emily Sheng, David C. Uthus. 05 Nov 2020.
  • Linear Mode Connectivity in Multitask and Continual Learning. Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Görür, Razvan Pascanu, H. Ghasemzadeh. 09 Oct 2020. Communities: CLL.
  • Meta-Learning in Neural Networks: A Survey. Timothy M. Hospedales, Antreas Antoniou, P. Micaelli, Amos Storkey. 11 Apr 2020. Communities: OOD.
  • Gradient Surgery for Multi-Task Learning. Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn. 19 Jan 2020.
  • Linear Mode Connectivity and the Lottery Ticket Hypothesis. Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin. 11 Dec 2019. Communities: MoMe.
  • SemEval-2017 Task 4: Sentiment Analysis in Twitter. Sara Rosenthal, N. Farra, Preslav Nakov. 02 Dec 2019. Communities: VLM.
  • Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks. Trapit Bansal, Rishikesh Jha, Andrew McCallum. 10 Nov 2019. Communities: SSL.
  • Adversarial NLI: A New Benchmark for Natural Language Understanding. Yixin Nie, Adina Williams, Emily Dinan, Joey Tianyi Zhou, Jason Weston, Douwe Kiela. 31 Oct 2019.
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 23 Oct 2019. Communities: AIMat.
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov. 26 Jul 2019. Communities: AIMat.
  • BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova. 24 May 2019.
  • On The Power of Curriculum Learning in Training Deep Networks. Guy Hacohen, D. Weinshall. 07 Apr 2019. Communities: ODL.
  • Predicting the Type and Target of Offensive Posts in Social Media. Marcos Zampieri, S. Malmasi, Preslav Nakov, Sara Rosenthal, N. Farra, Ritesh Kumar. 25 Feb 2019.
  • e-SNLI: Natural Language Inference with Natural Language Explanations. Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom. 04 Dec 2018. Communities: LRM.
  • Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks. Jason Phang, Thibault Févry, Samuel R. Bowman. 02 Nov 2018.
  • WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations. Mohammad Taher Pilehvar, Jose Camacho-Collados. 28 Aug 2018.
  • Neural Network Acceptability Judgments. Alex Warstadt, Amanpreet Singh, Samuel R. Bowman. 31 May 2018.
  • Averaging Weights Leads to Wider Optima and Better Generalization. Pavel Izmailov, Dmitrii Podoprikhin, T. Garipov, Dmitry Vetrov, A. Wilson. 14 Mar 2018. Communities: FedML, MoMe.
  • On First-Order Meta-Learning Algorithms. Alex Nichol, Joshua Achiam, John Schulman. 08 Mar 2018.
  • Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs. T. Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, A. Wilson. 27 Feb 2018. Communities: UQCV.
  • GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich. 07 Nov 2017. Communities: ODL.