2312.00968
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
1 December 2023
Jialin Wu
Xia Hu
Yaqing Wang
Bo Pang
Radu Soricut
MoE
Papers citing
"Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts"
11 / 11 papers shown
Title
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
Yimu Wang
Mozhgan Nasr Azadani
Sean Sedwards
Krzysztof Czarnecki
MLLM
MoE
52
0
0
07 Apr 2025
Where is this coming from? Making groundedness count in the evaluation of Document VQA models
Armineh Nourbakhsh
Siddharth Parekh
Pranav Shetty
Zhao Jin
Sameena Shah
Carolyn Rose
48
0
0
24 Mar 2025
CAMEx: Curvature-aware Merging of Experts
Dung V. Nguyen
Minh H. Nguyen
Luc Q. Nguyen
R. Teo
T. Nguyen
Linh Duy Tran
MoMe
104
2
0
26 Feb 2025
VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chunbai Zhang
Chao Wang
Yang Zhou
Yan Peng
LRM
ReLM
62
0
0
02 Feb 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
102
48
0
03 Jan 2025
Revisiting Multi-Modal LLM Evaluation
Jian Lu
Shikhar Srivastava
Junyu Chen
Robik Shrestha
Manoj Acharya
Kushal Kafle
Christopher Kanan
30
3
0
09 Aug 2024
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Victor Carbune
Hassan Mansoor
Fangyu Liu
Rahul Aralikatte
Gilles Baechler
Jindong Chen
Abhanshu Sharma
ReLM
LRM
144
12
0
19 Mar 2024
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Gilles Baechler
Srinivas Sunkara
Maria Wang
Fedir Zubach
Hassan Mansoor
Vincent Etter
Victor Carbune
Jason Lin
Jindong Chen
Abhanshu Sharma
123
47
0
07 Feb 2024
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
287
4,261
0
30 Jan 2023
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,858
0
18 Apr 2021