2308.12038
Cited By
v1
v2
v3 (latest)
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
23 August 2023
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
Qi-An Chen
Tianyu Yu
Han Wu
Yue Zhao
Haoye Zhang
Xu Han
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
ArXiv (abs)
PDF
HTML
Github (1058★)
Papers citing
"Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages"
34 / 34 papers shown
Title
TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries
Jinze Lv
Jian Chen
Zi Long
Xianghua Fu
Yin Chen
VGen
132
0
0
09 May 2025
CAMeL: Cross-modality Adaptive Meta-Learning for Text-based Person Retrieval
Hang Yu
Jiahao Wen
Zhedong Zheng
94
1
0
26 Apr 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Yiming Lei
Chenkai Zhang
Zeming Liu
Qingjie Liu
Yansen Wang
147
2
0
28 Mar 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
203
1
0
28 Mar 2025
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
Yunkai Dang
Mengxi Gao
Yibo Yan
Xin Zou
Yanggan Gu
Aiwei Liu
Xuming Hu
92
6
0
05 Nov 2024
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
Shota Onohara
Atsuyuki Miyai
Yuki Imajuku
Kazuki Egashira
Jeonghun Baek
Xiang Yue
Graham Neubig
Kiyoharu Aizawa
OSLM
258
6
0
22 Oct 2024
IPO: Interpretable Prompt Optimization for Vision-Language Models
Yingjun Du
Wenfang Sun
Cees G. M. Snoek
VLM
72
3
0
20 Oct 2024
CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text
Jun Hirako
Ryohei Sasano
Koichi Takeda
107
3
0
06 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
219
37
0
04 Oct 2024
Towards Comprehensive Detection of Chinese Harmful Memes
Junyu Lu
Bo Xu
Xiaokun Zhang
Hongbo Wang
Yuanyuan Sun
Dongyu Zhang
Liang Yang
Hongfei Lin
62
0
0
03 Oct 2024
T3: A Novel Zero-shot Transfer Learning Framework Iteratively Training on an Assistant Task for a Target Task
Xindi Tong
Yujin Zhu
Shijian Fan
Liang Xu
130
1
0
26 Sep 2024
READoc: A Unified Benchmark for Realistic Document Structured Extraction
Zichao Li
Aizier Abulaiti
Yaojie Lu
Xuanang Chen
Jia Zheng
Hongyu Lin
Xianpei Han
Le Sun
75
5
0
08 Sep 2024
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Murun Yang
...
Chunliang Zhang
Tongran Liu
Quan Du
Di Yang
Jingbo Zhu
VLM
173
6
0
22 Aug 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
175
102
0
17 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
107
17
0
08 Jul 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao
Qiuna Tan
Guanting Dong
Minhui Wu
Chong Sun
...
Yida Xu
Muxi Diao
Zhimin Bao
Chen Li
Honggang Zhang
VLM
LRM
111
56
0
01 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
247
18
0
01 Jul 2024
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Jinsheng Huang
Liang Chen
Taian Guo
Fu Zeng
Yusheng Zhao
...
Wei Ju
Luchen Liu
Tianyu Liu
Baobao Chang
Ming Zhang
176
7
0
29 Jun 2024
CELLO: Causal Evaluation of Large Vision-Language Models
Meiqi Chen
Bo Peng
Yan Zhang
Chaochao Lu
LRM
ELM
77
0
0
27 Jun 2024
GUICourse: From General Vision Language Models to Versatile GUI Agents
Wentong Chen
Junbo Cui
Jinyi Hu
Yujia Qin
Junjie Fang
...
Yupeng Huo
Yuan Yao
Yankai Lin
Zhiyuan Liu
Maosong Sun
LLMAG
160
41
0
17 Jun 2024
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
Yuhang Wu
Wenmeng Yu
Yean Cheng
Yan Wang
Xiaohan Zhang
Jiazheng Xu
Ming Ding
Yuxiao Dong
102
2
0
13 Jun 2024
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
Bang Liu
Yoshua Bengio
110
5
0
10 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Yangfu Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
142
12
0
04 Jun 2024
Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners
Shimao Zhang
Changjiang Gao
Wenhao Zhu
Jiajun Chen
Xin Huang
Xue Han
Junlan Feng
Chao Deng
Shujian Huang
64
9
0
22 May 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi-dong Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Hao Liu
Xiang Bai
Can Huang
185
28
0
20 May 2024
LEGENT: Open Platform for Embodied Agents
Zhili Cheng
Zhitong Wang
Jinyi Hu
Shengding Hu
An Liu
Yuge Tu
Pengkai Li
Lei Shi
Zhiyuan Liu
Maosong Sun
VLM
62
7
0
28 Apr 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
159
644
0
25 Apr 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Hai-Tao Zheng
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
164
38
0
07 Apr 2024
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Meiqi Chen
Yixin Cao
Yan Zhang
Chaochao Lu
107
16
0
27 Mar 2024
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
Yusu Qian
Haotian Zhang
Yinfei Yang
Zhe Gan
198
30
0
20 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
164
216
0
24 Jan 2024
Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare
Junling Liu
Ziming Wang
Qichen Ye
Dading Chong
Peilin Zhou
Yining Hua
VLM
LM&MA
80
54
0
27 Oct 2023
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Junyu Lu
Di Zhang
Xiaojun Wu
Xinyu Gao
Ruyi Gan
Jiaxing Zhang
Yan Song
Pingjian Zhang
VLM
MLLM
55
7
0
12 Oct 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
138
611
0
23 Jun 2023