1803.05407
Averaging Weights Leads to Wider Optima and Better Generalization
14 March 2018
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML
MoMe
Papers citing
"Averaging Weights Leads to Wider Optima and Better Generalization"
50 / 1,040 papers shown
Title
Variational Low-Rank Adaptation Using IVON
Bai Cong
Nico Daheim
Yuesong Shen
Daniel Cremers
Rio Yokota
Mohammad Emtiyaz Khan
Thomas Möllenhoff
94
4
0
07 Nov 2024
Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation
Ayan Sengupta
Vaibhav Seth
Arinjay Pathak
Natraj Raman
Sriram Gopalakrishnan
Tanmoy Chakraborty
BDL
53
2
0
07 Nov 2024
Leveraging Transformer-Based Models for Predicting Inflection Classes of Words in an Endangered Sami Language
Khalid Alnajjar
Mika Hämäläinen
Jack Rueter
36
1
0
04 Nov 2024
Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning
Minghui Chen
Meirui Jiang
Xin Zhang
Qi Dou
Zehua Wang
Xiaoxiao Li
MoMe
FedML
128
3
0
31 Oct 2024
Where Do Large Learning Rates Lead Us?
Ildus Sadrtdinov
M. Kodryan
Eduard Pokonechny
E. Lobacheva
Dmitry Vetrov
AI4CE
104
1
0
29 Oct 2024
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
Li Shen
Anke Tang
Enneng Yang
G. Guo
Yong Luo
Lefei Zhang
Xiaochun Cao
Di Lin
Dacheng Tao
MoMe
83
9
0
29 Oct 2024
Subgraph Aggregation for Out-of-Distribution Generalization on Graphs
Bowen Liu
Haoyang Li
Shuning Wang
Shuo Nie
Shanghang Zhang
OODD
CML
170
0
0
29 Oct 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
187
18
0
29 Oct 2024
Large Language Models for Cross-lingual Emotion Detection
Ram Mohan Rao Kadiyala
15
1
0
21 Oct 2024
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
Clara Na
Ian H. Magnusson
A. Jha
Tom Sherborne
Emma Strubell
Jesse Dodge
Pradeep Dasigi
MoMe
75
5
0
21 Oct 2024
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems
Bingcong Li
Liang Zhang
Niao He
93
8
0
18 Oct 2024
SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery
Enneng Yang
Li Shen
Zhenyi Wang
G. Guo
Xingwei Wang
Xiaochun Cao
Jie Zhang
Dacheng Tao
MoMe
93
7
0
18 Oct 2024
Attuned to Change: Causal Fine-Tuning under Latent-Confounded Shifts
Jialin Yu
Yuxiang Zhou
Yulan He
Nevin L. Zhang
Ricardo Silva
Philip Torr
93
0
0
18 Oct 2024
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Jinluan Yang
Anke Tang
Didi Zhu
Zhengyu Chen
Li Shen
Leilei Gan
MoMe
AAML
164
7
0
17 Oct 2024
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Akshara Prabhakar
Yuanzhi Li
Karthik Narasimhan
Sham Kakade
Eran Malach
Samy Jelassi
MoMe
101
18
0
16 Oct 2024
DAT: Improving Adversarial Robustness via Generative Amplitude Mix-up in Frequency Domain
Fengpeng Li
Kemou Li
Haiwei Wu
Jinyu Tian
Jiantao Zhou
AAML
101
1
0
16 Oct 2024
Exploring Model Kinship for Merging Large Language Models
Yedi Hu
Yunzhi Yao
N. Zhang
Shumin Deng
Ningyu Zhang
MoMe
144
1
0
16 Oct 2024
Xeno-learning: knowledge transfer across species in deep learning-based spectral image analysis
Jan Sellner
Alexander Studier-Fischer
Ahmad Bin Qasim
Siyang Song
Nicholas Schreck
...
Maximilian Dietrich
Kemal Kurniawan
Felix Nickel
Karl-Friedrich Kowalewski
Lena Maier-Hein
MedIm
48
0
0
15 Oct 2024
The Epochal Sawtooth Phenomenon: Unveiling Training Loss Oscillations in Adam and Other Optimizers
Qi Liu
Wanjing Ma
86
0
0
14 Oct 2024
Sampling from Bayesian Neural Network Posteriors with Symmetric Minibatch Splitting Langevin Dynamics
Daniel Paulin
Peter Whalley
Neil K. Chada
Benedict Leimkuhler
BDL
115
4
0
14 Oct 2024
Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
Rui Min
Zeyu Qin
Nevin L. Zhang
Li Shen
Minhao Cheng
AAML
88
4
0
13 Oct 2024
PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency
Preferred Elements:
Kenshin Abe
Kaizaburo Chubachi
Yasuhiro Fujita
...
Yoshihiko Ozaki
Shotaro Sano
Shuji Suzuki
Tianqi Xu
Toshihiko Yanase
87
0
0
10 Oct 2024
Wolf2Pack: The AutoFusion Framework for Dynamic Parameter Fusion
Bowen Tian
Songning Lai
Yutao Yue
MoMe
74
0
0
08 Oct 2024
QT-DoG: Quantization-aware Training for Domain Generalization
Saqib Javed
Hieu Le
Mathieu Salzmann
OOD
MQ
121
2
0
08 Oct 2024
Improving Generalization with Flat Hilbert Bayesian Inference
Tuan Truong
Quyen Tran
Quan Pham-Ngoc
Nhat Ho
Dinh Q. Phung
T. Le
71
1
0
05 Oct 2024
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
Changdae Oh
Yixuan Li
Kyungwoo Song
Sangdoo Yun
Dongyoon Han
OOD
MoMe
91
10
0
03 Oct 2024
Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks
Edan Kinderman
Itay Hubara
Haggai Maron
Daniel Soudry
MoMe
98
2
0
02 Oct 2024
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Lucas Bandarkar
Benjamin Muller
Pritish Yuvraj
Rui Hou
Nayan Singhal
Hongjiang Lv
Bing-Quan Liu
KELM
LRM
MoMe
144
5
0
02 Oct 2024
Characterizing Model Robustness via Natural Input Gradients
Adrian Rodriguez-Munoz
Tongzhou Wang
Antonio Torralba
AAML
87
1
0
30 Sep 2024
Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks
Roberto Alcover-Couso
Juan C. Sanmiguel
Marcos Escudero-Viñolo
Jose M. Martínez
FedML
MoMe
64
1
0
24 Sep 2024
Mastering Chess with a Transformer Model
Daniel Monroe
The Leela Chess Zero Team
69
3
0
18 Sep 2024
Diminishing Domain Mismatch for DNN-Based Acoustic Distance Estimation via Stochastic Room Reverberation Models
Tobias Gburrek
Adrian Meise
Joerg Schmalenstroeer
Reinhold Haeb-Umbach
64
0
0
26 Aug 2024
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
T. Dao
Thuan Hoang Nguyen
T. Le
D. Vu
Khoi Nguyen
Cuong Pham
Anh Tran
DiffM
121
19
0
26 Aug 2024
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
Anke Tang
Li Shen
Yong Luo
Shuai Xie
Han Hu
Lefei Zhang
Di Lin
Dacheng Tao
MoMe
104
4
0
19 Aug 2024
The First Competition on Resource-Limited Infrared Small Target Detection Challenge: Methods and Results
Boyang Li
Xinyi Ying
Ruojing Li
Yongxian Liu
Yangsi Shi
Miao Li
89
1
0
18 Aug 2024
Activated Parameter Locating via Causal Intervention for Model Merging
Fanshuang Kong
Richong Zhang
Ziqiao Wang
MoMe
42
2
0
18 Aug 2024
Narrowing the Focus: Learned Optimizers for Pretrained Models
Gus Kristiansen
Mark Sandler
A. Zhmoginov
Nolan Miller
Anirudh Goyal
Jihwan Lee
Max Vladymyrov
86
1
0
17 Aug 2024
ADformer: A Multi-Granularity Transformer for EEG-Based Alzheimer's Disease Assessment
Yihe Wang
Nadia Mammone
Darina Petrovsky
Alexandros T. Tzallas
Francesco C. Morabito
Xiang Zhang
MedIm
75
3
0
17 Aug 2024
Enhancing Sharpness-Aware Minimization by Learning Perturbation Radius
Xuehao Wang
Weisen Jiang
Shuai Fu
Yu Zhang
AAML
81
0
0
15 Aug 2024
BadMerging: Backdoor Attacks Against Model Merging
Jinghuai Zhang
Jianfeng Chi
Zheng Li
Kunlin Cai
Yang Zhang
Yuan Tian
MoMe
FedML
AAML
112
18
0
14 Aug 2024
Efficient and Versatile Robust Fine-Tuning of Zero-shot Models
Sungyeon Kim
Boseung Jeong
Donghyun Kim
Suha Kwak
VLM
88
3
0
11 Aug 2024
What comes after transformers? -- A selective survey connecting ideas in deep learning
Johannes Schneider
AI4CE
112
2
0
01 Aug 2024
SalNAS: Efficient Saliency-prediction Neural Architecture Search with self-knowledge distillation
C. Termritthikun
Ayaz Umer
Suwichaya Suwanwimolkul
Xiwei Xu
Ivan Lee
71
5
0
29 Jul 2024
Deep Companion Learning: Enhancing Generalization Through Historical Consistency
Ruizhao Zhu
Venkatesh Saligrama
FedML
87
0
0
26 Jul 2024
Training-Free Model Merging for Multi-target Domain Adaptation
Wenyi Li
Huan-ang Gao
Mingju Gao
Beiwen Tian
Rong Zhi
Hao Zhao
MoMe
94
8
0
18 Jul 2024
Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance
Haiquan Lu
Xiaotian Liu
Yefan Zhou
Qunli Li
Kurt Keutzer
Michael W. Mahoney
Yujun Yan
Huanrui Yang
Yaoqing Yang
64
1
0
17 Jul 2024
ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment
Xinyi Wang
Angeliki V. Katsenou
David Bull
142
1
0
16 Jul 2024
PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition
Xiao-Li Li
Yining Liu
Na Dong
Sitian Qin
Xiaolin Hu
85
4
0
15 Jul 2024
Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models
Rining Wu
Feixiang Zhou
Ziwei Yin
Jian K. Liu
70
0
0
15 Jul 2024
Antibody DomainBed: Out-of-Distribution Generalization in Therapeutic Protein Design
Nataša Tagasovska
Ji Won Park
Matthieu Kirchmeyer
Nathan C. Frey
Andrew Watkins
...
Arian R. Jamasb
Edith Lee
Tyler Bryson
Stephen Ra
Kyunghyun Cho
OOD
127
6
0
15 Jul 2024