2305.07498
Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution
12 May 2023
Jianfeng Kuang
Wei Hua
Dingkang Liang
Mingkun Yang
Deqiang Jiang
Bo Ren
Xiang Bai
Papers citing
"Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution"
14 / 14 papers shown
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
156
2
0
20 Dec 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
84
26
0
10 Oct 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yanjie Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Han Wang
Hao Liu
Can Huang
98
23
0
02 Jul 2024
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Jingqun Tang
Chunhui Lin
Zhen Zhao
Shubo Wei
Binghong Wu
...
Yuliang Liu
Hao Liu
Yuan Xie
Xiang Bai
Can Huang
LRM
VLM
MLLM
101
30
0
19 Apr 2024
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Wenjin Wang
Zhengjie Huang
Bin Luo
Qianglong Chen
Qiming Peng
...
Weichong Yin
Shi Feng
Yu Sun
Dianhai Yu
Yin Zhang
ViT
45
12
0
18 Sep 2022
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
438
2,340
0
02 Sep 2021
Scene Text Retrieval via Joint Text Detection and Similarity Learning
Hao Wang
X. Bai
Mingkun Yang
Shenggao Zhu
Jing Wang
Wenyu Liu
3DV
28
35
0
04 Apr 2021
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik
Zongze Wu
Eli Shechtman
Daniel Cohen-Or
Dani Lischinski
CLIP
VLM
62
1,204
0
31 Mar 2021
Spatial Dual-Modality Graph Reasoning for Key Information Extraction
Hongbin Sun
Zhanghui Kuang
Xiaoyu Yue
Chenhao Lin
Wayne Zhang
47
37
0
26 Mar 2021
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
Minghui Liao
Guan Pang
Jing Huang
Tal Hassner
X. Bai
35
182
0
18 Jul 2020
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks
Wenwen Yu
Ning Lu
Xianbiao Qi
Ping Gong
Rong Xiao
46
136
0
16 Apr 2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
103
694
0
31 Dec 2019
EATEN: Entity-aware Attention for Single Shot Visual Text Extraction
He Guo
Xiameng Qin
Jiaming Liu
Junyu Han
Jingtuo Liu
Errui Ding
43
45
0
20 Sep 2019
Graph Convolution for Multimodal Information Extraction from Visually Rich Documents
Xiaojing Liu
Feiyu Gao
Qiong Zhang
Huasha Zhao
56
183
0
27 Mar 2019