SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement

16 June 2025

arXiv

Papers cited by "SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement"

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents. Ryota Tanaka, Taichi Iki, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito, Jun Suzuki. 14 Apr 2025.
HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation. Pei Liu, Xin Liu, Ruoyu Yao, Junming Liu, Siyuan Meng, Ding Wang, Jun Ma. 13 Apr 2025.
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding. S. Han, Peng Xia, Ruiyi Zhang, Tong Sun, Yun Li, Hongtu Zhu, Huaxiu Yao. 18 Mar 2025.
From Local to Global: A Graph RAG Approach to Query-Focused Summarization. Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, Jonathan Larson. 20 Feb 2025.
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation. Manan Suri, Puneet Mathur, Franck Dernoncourt, Kanika Goswami, Ryan Rossi, Dinesh Manocha. 14 Dec 2024.
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations. Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, ..., Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun. 01 Jul 2024.
Multilingual E5 Text Embeddings: A Technical Report. Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei. 08 Feb 2024.
Hierarchical multimodal transformers for Multi-Page DocVQA. Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny. 07 Dec 2022.
V-Doc : Visual questions answers with Documents. Yihao Ding, Zhe Huang, Runlin Wang, Yanhang Zhang, Xianru Chen, Yuzhong Ma, Hyunsuk Chung, S. Han. 27 May 2022.
Unified Pretraining Framework for Document Understanding. Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, R. Jain, A. Nenkova, Tong Sun. 22 Apr 2022.
DocFormer: End-to-End Transformer for Document Understanding. Srikar Appalaraju, Bhavan A. Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha. 22 Jun 2021.
DocVQA: A Dataset for VQA on Document Images. Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar. 01 Jul 2020.
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. Omar Khattab, Matei A. Zaharia. 27 Apr 2020.
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning. 25 Sep 2018.