arXiv:2305.04160, v3 (latest)
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
7 May 2023
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
Papers citing "X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages"
45 / 95 papers shown
Model Composition for Multimodal Large Language Models
Chi Chen
Yiyang Du
Zheng Fang
Ziyue Wang
Ziyue Wang
...
Ming Yan
Ji Zhang
Fei Huang
Maosong Sun
Yang Liu
MoMe
80
3
0
20 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
137
64
0
19 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
133
18
0
19 Feb 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
99
22
0
08 Feb 2024
User Intent Recognition and Satisfaction with Large Language Models: A User Study with ChatGPT
Anna Bodonhelyi
Efe Bozkir
Shuo Yang
Enkelejda Kasneci
Gjergji Kasneci
ELM
AI4MH
67
19
0
03 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
164
217
0
24 Jan 2024
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Ruizhe Li
Chao Zhang
Pin-Yu Chen
Ensiong Chng
97
25
0
19 Jan 2024
GroundingGPT: Language Enhanced Multi-modal Grounding Model
Zhaowei Li
Qi Xu
Dong Zhang
Hang Song
Yiqing Cai
...
Junting Pan
Zefeng Li
Van Tu Vu
Zhida Huang
Tao Wang
130
44
0
11 Jan 2024
Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education
Arne Bewersdorff
Christian Hartmann
Marie Hornberger
Kathrin Seßler
Maria Bannert
Enkelejda Kasneci
Gjergji Kasneci
Xiaoming Zhai
Claudia Nerdel
121
37
0
01 Jan 2024
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Erik Cambria
Fukun Yin
Gang Yu
Tao Chen
88
26
0
17 Dec 2023
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Yukiya Hono
Koh Mitsuda
Tianyu Zhao
Kentaro Mitsui
Toshiaki Wakatsuki
Kei Sawada
AuLLM
84
8
0
06 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
151
61
0
30 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
83
66
0
27 Nov 2023
InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery
He Cao
Zijing Liu
Xingyu Lu
Yuan Yao
Yu-Feng Li
112
68
0
27 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
385
711
0
16 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
90
12
0
14 Nov 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
148
249
0
14 Nov 2023
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li
Biao Yang
Qiang Liu
Zhiyin Ma
Shuo Zhang
Jingxu Yang
Yabo Sun
Yuliang Liu
Xiang Bai
MLLM
135
278
0
11 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
83
3
0
08 Nov 2023
Entity Embeddings: Perspectives Towards an Omni-Modality Era for Large Language Models
Eren Unlu
Unver Ciftci
64
0
0
27 Oct 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
115
264
0
20 Oct 2023
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen
He Huang
A. Andrusenko
Oleksii Hrinchuk
Krishna C. Puvvada
Jason Chun Lok Li
Subhankar Ghosh
Jagadeesh Balam
Boris Ginsburg
LRM
91
58
0
13 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
105
13
0
09 Oct 2023
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
89
12
0
05 Oct 2023
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
Yue Huang
Jiawen Shi
Yuan Li
Chenrui Fan
Siyuan Wu
...
Yixin Liu
Pan Zhou
Yao Wan
Neil Zhenqiang Gong
Lichao Sun
LLMAG
121
96
0
04 Oct 2023
Tuning Large language model for End-to-end Speech Translation
Hao Zhang
Nianwen Si
Yaqi Chen
Wenlin Zhang
Xu Yang
Dan Qu
Xiaolin Jiao
99
8
0
03 Oct 2023
Connecting Speech Encoder and Large Language Model for ASR
Wenyi Yu
Changli Tang
Guangzhi Sun
Xianzhao Chen
T. Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
AuLLM
77
77
0
25 Sep 2023
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jinliang Lu
Junhong Wu
Yuchen Liu
Chengqing Zong
Jiajun Zhang
AuLLM
131
45
0
02 Sep 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
120
56
0
23 Aug 2023
Federated Learning in Big Model Era: Domain-Specific Multimodal Large Models
Zengxiang Li
Zhaoxiang Hou
Hui Liu
Ying Wang
Tongzhi Li
...
Chao Shi
Che-Sheng Yang
Weishan Zhang
Zelei Liu
Liang Xu
FedML
49
2
0
22 Aug 2023
Tackling Vision Language Tasks Through Learning Inner Monologues
Diji Yang
Kezhen Chen
Jinmeng Rao
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Yize Zhang
MLLM
99
11
0
19 Aug 2023
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text
Peng Liu
Yiming Ren
Jun Tao
Zhixiang Ren
AI4CE
108
85
0
14 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
84
27
0
03 Aug 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
84
112
0
17 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
104
31
0
13 Jul 2023
On decoder-only architecture for speech-to-text and large language model integration
Jian Wu
Yashesh Gaur
Zhuo Chen
Long Zhou
Yilun Zhu
...
Jinyu Li
Shujie Liu
Bo Ren
Linquan Liu
Yu-Huan Wu
AuLLM
132
136
0
08 Jul 2023
Embodied Task Planning with Large Language Models
Zhenyu Wu
Ziwei Wang
Xiuwei Xu
Jiwen Lu
Haibin Yan
LM&Ro
LLMAG
81
76
0
04 Jul 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
155
653
0
27 Jun 2023
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Chunyuan Li
MLLM
VLM
106
20
0
26 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
140
615
0
23 Jun 2023
Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost
Juexiao Zhou
Preslav Nakov
Xin Gao
LM&MA
AI4CE
139
12
0
19 Jun 2023
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu
Minghao Wu
Longyue Wang
Xinting Huang
Bingshuai Liu
Zefeng Du
Shuming Shi
Zhaopeng Tu
MLLM
AuLLM
86
173
0
15 Jun 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Bin Wang
Jifeng Dai
Yu Qiao
Ping Luo
LM&Ro
LRM
114
245
0
24 May 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
AuLLM
MLLM
136
344
0
18 May 2023
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
Minglun Han
Feilong Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
100
13
0
30 Jan 2023