Merging Models with Fisher-Weighted Averaging
Michael Matena, Colin Raffel · 18 November 2021 · arXiv:2111.09832
Tags: FedML, MoMe

Papers citing "Merging Models with Fisher-Weighted Averaging" (33 of 283 shown)
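For context on the method the listed papers cite: Fisher-weighted averaging merges fine-tuned models parameter-wise, weighting each coordinate by an estimate of its (diagonal) Fisher information. The following is a minimal sketch, not the paper's implementation; the function and variable names are hypothetical, and it assumes per-parameter diagonal Fisher estimates are already available.

```python
import numpy as np

def fisher_weighted_average(params_a, params_b, fisher_a, fisher_b, eps=1e-8):
    """Merge two models coordinate-wise, weighting each parameter by its
    diagonal Fisher estimate. Inputs are dicts of same-shape arrays keyed
    by parameter name; eps guards against division by zero."""
    merged = {}
    for name in params_a:
        fa, fb = fisher_a[name], fisher_b[name]
        merged[name] = (fa * params_a[name] + fb * params_b[name]) / (fa + fb + eps)
    return merged

# Toy example with a single two-coordinate parameter per model.
pa = {"w": np.array([1.0, 0.0])}
pb = {"w": np.array([0.0, 1.0])}
fa = {"w": np.array([3.0, 1.0])}   # model A has high Fisher mass on coord 0
fb = {"w": np.array([1.0, 3.0])}   # model B has high Fisher mass on coord 1
print(fisher_weighted_average(pa, pb, fa, fb)["w"])  # ~[0.75, 0.75]
```

With uniform Fisher estimates this reduces to a plain parameter average; the Fisher weights are what let each model dominate on the coordinates it is most "certain" about.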
- Elastic Weight Removal for Faithful and Abstractive Dialogue Generation
  Nico Daheim, Nouha Dziri, Mrinmaya Sachan, Iryna Gurevych, E. Ponti · MoMe · 30 Mar 2023
- Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
  Daniel Lawson, A. H. Qureshi · MoMe, OffRL · 14 Mar 2023
- Modular Deep Learning
  Jonas Pfeiffer, Sebastian Ruder, Ivan Vulić, E. Ponti · MoMe, OOD · 22 Feb 2023
- AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
  Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse Dodge · MoMe · 14 Feb 2023
- Knowledge is a Region in Weight Space for Fine-tuned Language Models
  Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen · 09 Feb 2023
- Exploring the Benefits of Training Expert Language Models over Instruction Tuning
  Joel Jang, Seungone Kim, Seonghyeon Ye, Doyoung Kim, Lajanugen Logeswaran, Moontae Lee, Kyungjae Lee, Minjoon Seo · LRM, ALM · 07 Feb 2023
- Projected Subnetworks Scale Adaptation
  Siddhartha Datta, N. Shadbolt · VLM, CLL · 27 Jan 2023
- Backward Compatibility During Data Updates by Weight Interpolation
  Raphael Schumann, Elman Mansimov, Yi-An Lai, Nikolaos Pappas, Xibin Gao, Yi Zhang · 25 Jan 2023
- Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
  Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz · MoMe, OODD · 20 Dec 2022
- Dataless Knowledge Fusion by Merging Weights of Language Models
  Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng · FedML, MoMe · 19 Dec 2022
- PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
  Chen Zhang, L. F. D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li · 18 Dec 2022
- Editing Models with Task Arithmetic
  Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi · KELM, MoMe, MU · 08 Dec 2022
- ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
  Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen · MoMe · 02 Dec 2022
- Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation
  Cody Blakeney, Jessica Zosa Forde, Jonathan Frankle, Ziliang Zong, Matthew L. Leavitt · VLM · 01 Nov 2022
- Where to start? Analyzing the potential value of intermediate models
  Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim, Yoav Katz · MoMe · 31 Oct 2022
- AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
  Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao · MoE · 31 Oct 2022
- Exploring Mode Connectivity for Pre-trained Language Models
  Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou · 25 Oct 2022
- Revisiting Checkpoint Averaging for Neural Machine Translation
  Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney · MoMe · 21 Oct 2022
- lo-fi: distributed fine-tuning without communication
  Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael G. Rabbat, Ari S. Morcos · 19 Oct 2022
- Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters
  Hongyu Zhao, Hao Tan, Hongyuan Mei · MoE · 18 Oct 2022
- Git Re-Basin: Merging Models modulo Permutation Symmetries
  Samuel K. Ainsworth, J. Hayase, S. Srinivasa · MoMe · 11 Sep 2022
- Patching open-vocabulary models by interpolating weights
  Gabriel Ilharco, Mitchell Wortsman, S. Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt · VLM, KELM · 10 Aug 2022
- Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
  Margaret Li, Suchin Gururangan, Tim Dettmers, M. Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer · MoMe · 05 Aug 2022
- Diverse Weight Averaging for Out-of-Distribution Generalization
  Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, A. Rakotomamonjy, Patrick Gallinari, Matthieu Cord · OOD · 19 May 2022
- VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
  Katherine Crowson, Stella Biderman, Daniel Kornis, Dashiell Stander, Eric Hallahan, Louis Castricato, Edward Raff · CLIP · 18 Apr 2022
- Fusing finetuned models for better pretraining
  Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz · FedML, AI4CE, MoMe · 06 Apr 2022
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Mitchell Wortsman, Gabriel Ilharco, S. Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, ..., Hongseok Namkoong, Ali Farhadi, Y. Carmon, Simon Kornblith, Ludwig Schmidt · MoMe · 10 Mar 2022
- Out-of-Distribution Detection Without Class Labels
  Niv Cohen, Ron Abutbul, Yedid Hoshen · OODD · 14 Dec 2021
- Exploiting all samples in low-resource sentence classification: early stopping and initialization parameters
  Hongseok Choi, Hyunju Lee · 12 Nov 2021
- Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora
  Xisen Jin, Dejiao Zhang, Henghui Zhu, Wei Xiao, Shang-Wen Li, Xiaokai Wei, Andrew O. Arnold, Xiang Ren · KELM, CLL · 16 Oct 2021
- Robust fine-tuning of zero-shot models
  Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, ..., Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt · VLM · 04 Sep 2021
- Scaling Laws for Neural Language Models
  Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei · 23 Jan 2020
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 20 Apr 2018