2310.05719
Cited By
Transformer Fusion with Optimal Transport
9 October 2023
Moritz Imfeld
Jacopo Graldi
Marco Giordano
Thomas Hofmann
Sotiris Anagnostidis
Sidak Pal Singh
ViT
MoMe
Papers citing "Transformer Fusion with Optimal Transport"
19 / 19 papers shown
Title
Embedding Empirical Distributions for Computing Optimal Transport Maps
Mingchen Jiang
Peng Xu
Xichen Ye
Xiaohui Chen
Yun Yang
Yifan Chen
OT
56
0
0
24 Apr 2025
Fusion of Graph Neural Networks via Optimal Transport
Weronika Ormaniec
Michael Vollenweider
Elisa Hoskovec
MoMe
FedML
OT
70
0
0
27 Mar 2025
Model Assembly Learning with Heterogeneous Layer Weight Merging
Yi-Kai Zhang
Jin Wang
Xu-Xiang Zhong
De-Chuan Zhan
Han-Jia Ye
MoMe
47
0
0
27 Mar 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
92
0
0
27 Feb 2025
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang
Zaiyi Zheng
Zhengzhang Chen
Wenlin Yao
61
0
0
01 Feb 2025
Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization
Valentio Iverson
Stephen Vavasis
48
0
0
20 Jan 2025
Training-free Heterogeneous Model Merging
Zhengqi Xu
Han Zheng
Jie Song
Li Sun
Mingli Song
MoMe
72
1
0
03 Jan 2025
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
Chaeyun Jang
Hyungi Lee
Jungtaek Kim
Juho Lee
MoMe
45
0
0
11 Nov 2024
A Lipschitz spaces view of infinitely wide shallow neural networks
Francesca Bartolucci
Marcello Carioni
José A. Iglesias
Yury Korolev
Emanuele Naldi
S. Vigogna
23
0
0
18 Oct 2024
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
Xinyu Zhao
Guoheng Sun
Ruisi Cai
Yukun Zhou
Pingzhi Li
...
Binhang Yuan
Hongyi Wang
Ang Li
Zhangyang Wang
Tianlong Chen
MoMe
ALM
28
3
0
07 Oct 2024
Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks
Edan Kinderman
Itay Hubara
Haggai Maron
Daniel Soudry
MoMe
49
1
0
02 Oct 2024
The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
Derek Lim
Moe Putterman
Robin Walters
Haggai Maron
Stefanie Jegelka
43
5
0
30 May 2024
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
Peng Wang
Zexi Li
Ningyu Zhang
Ziwen Xu
Yunzhi Yao
Yong-jia Jiang
Pengjun Xie
Fei Huang
Huajun Chen
KELM
CLL
47
20
0
23 May 2024
PopulAtion Parameter Averaging (PAPA)
Alexia Jolicoeur-Martineau
Emy Gervais
Kilian Fatras
Yan Zhang
Simon Lacoste-Julien
MoMe
40
17
0
06 Apr 2023
Git Re-Basin: Merging Models modulo Permutation Symmetries
Samuel K. Ainsworth
J. Hayase
S. Srinivasa
MoMe
255
314
0
11 Sep 2022
Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja
Rachit Bansal
Kyunghyun Cho
João Sedoc
Naomi Saphra
242
45
0
24 May 2022
Optimizing Mode Connectivity via Neuron Alignment
N. Joseph Tatro
Pin-Yu Chen
Payel Das
Igor Melnyk
P. Sattigeri
Rongjie Lai
MoMe
223
80
0
05 Sep 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
249
4,489
0
23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018