Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.16470
Cited By
Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering
22 May 2025
Kuicai Dong
Yujing Chang
Shijie Huang
Yasheng Wang
Ruiming Tang
Yong Liu
Author Contacts:
kuicai.dong@huawei.com
liu.yong6@huawei.com
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering"
15 / 15 papers shown
Title
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
158
89
1
14 Apr 2025
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Jaemin Cho
Debanjan Mahata
Ozan Irsoy
Yujie He
Joey Tianyi Zhou
VLM
75
15
0
07 Nov 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia
Siwei Han
Shi Qiu
Yiyang Zhou
Zhaoyang Wang
...
Chenhang Cui
Mingyu Ding
Linjie Li
Lijuan Wang
Huaxiu Yao
97
14
0
14 Oct 2024
MuRAR: A Simple and Effective Multimodal Retrieval and Answer Refinement Framework for Multimodal Question Answering
Zhengyuan Zhu
Daniel Lee
Hong Zhang
Sai Sree Harsha
Loic Feujio
Akash Maharaj
Yunyao Li
58
3
0
16 Aug 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
103
131
0
22 Apr 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
138
1,119
0
05 Feb 2024
Hierarchical multimodal transformers for Multi-Page DocVQA
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
66
59
0
07 Dec 2022
Text Embeddings by Weakly-Supervised Contrastive Pre-training
Liang Wang
Nan Yang
Xiaolong Huang
Binxing Jiao
Linjun Yang
Daxin Jiang
Rangan Majumder
Furu Wei
VLM
241
601
0
07 Dec 2022
RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder
Shitao Xiao
Zheng Liu
Yingxia Shao
Bo Zhao
RALM
264
124
0
24 May 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
92
454
0
18 Apr 2022
Unsupervised Dense Information Retrieval with Contrastive Learning
Gautier Izacard
Mathilde Caron
Lucas Hosseini
Sebastian Riedel
Piotr Bojanowski
Armand Joulin
Edouard Grave
RALM
195
907
0
16 Dec 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
98
231
0
26 Apr 2021
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
139
718
0
01 Jul 2020
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Omar Khattab
Matei A. Zaharia
138
1,370
0
27 Apr 2020
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
1