2407.12709
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
17 July 2024
Leyang Shen
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie
MoE
Papers citing
"MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models"
12 / 12 papers shown
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
90
33
0
10 Mar 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Wei Li
Bing Hu
Rui Shao
Leyang Shen
Liqiang Nie
41
2
0
05 Mar 2025
FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration
Hao Li
Xiang Chen
Jiangxin Dong
Jinhui Tang
Jinshan Pan
73
2
0
02 Dec 2024
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
Zaijing Li
Yuquan Xie
Rui Shao
Gongwei Chen
Dongmei Jiang
Liqiang Nie
54
18
0
07 Aug 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Guoxing Yang
Zhiwu Lu
55
3
0
07 Mar 2024
Multimodal Instruction Tuning with Conditional Mixture of LoRA
Ying Shen
Zhiyang Xu
Qifan Wang
Yu Cheng
Wenpeng Yin
Lifu Huang
42
13
0
24 Feb 2024
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen
Leyang Shen
Rui Shao
Xiang Deng
Liqiang Nie
VLM
MLLM
67
42
0
20 Nov 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
160
440
0
14 Oct 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Ming Yan
...
Ji Zhang
Qin Jin
Liang He
Xin Lin
Fei Huang
VLM
MLLM
126
84
0
08 Oct 2023
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
169
263
0
07 Oct 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
149
638
0
26 May 2022
Categorical Reparameterization with Gumbel-Softmax
Eric Jang
S. Gu
Ben Poole
BDL
87
5,284
0
03 Nov 2016