arXiv:2409.03420
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
5 September 2024
Anwen Hu
Haiyang Xu
Liang Zhang
Jiabo Ye
Ming Yan
Ji Zhang
Qin Jin
Fei Huang
Jingren Zhou
VLM
Papers citing "mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding" (11 papers)
Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?
Haibin He
Maoyuan Ye
Jing Zhang
Xiantao Cai
Juhua Liu
Bo Du
Dacheng Tao
LRM
4
0
0
19 May 2025
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang
Wenliang Zheng
Aashrith Madasu
Peng Shi
Ryo Kamoi
...
Ranran Haoran Zhang
Avitej Iyer
Renze Lou
Wenpeng Yin
Rui Zhang
68
0
0
25 Apr 2025
Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts
Xiangnan Chen
Yuancheng Fang
Qian Xiao
Juncheng Billy Li
J. Lin
Siliang Tang
Yi Yang
Yueting Zhuang
70
0
0
06 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei-Ming Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
45
1
0
04 Mar 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
105
4
0
12 Feb 2025
Vision-centric Token Compression in Large Language Model
Ling Xing
Alex Jinpeng Wang
Rui Yan
Xiangbo Shu
Jinhui Tang
VLM
62
0
0
02 Feb 2025
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Linke Ouyang
Yuan Qu
Hongbin Zhou
Jiawei Zhu
Rui Zhang
...
Chao Xu
Bo Zhang
Botian Shi
Zhongying Tu
Zeang Sheng
101
5
0
10 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
M. Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
90
4
0
08 Dec 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
S. Yu
C. Tang
Bokai Xu
Junbo Cui
Junhao Ran
...
Zhenghao Liu
Shuo Wang
Xu Han
Zhiyuan Liu
Maosong Sun
VLM
39
23
0
14 Oct 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
126
379
0
07 Nov 2023
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
169
263
0
07 Oct 2022