Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

12 May 2023
Jianfeng Kuang
Wei Hua
Dingkang Liang
Mingkun Yang
Deqiang Jiang
Bo Ren
Xiang Bai

Papers citing "Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution"

14 / 14 papers shown
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
156
2
0
20 Dec 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
84
26
0
10 Oct 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yanjie Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Han Wang
Hao Liu
Can Huang
98
23
0
02 Jul 2024
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Jingqun Tang
Chunhui Lin
Zhen Zhao
Shubo Wei
Binghong Wu
...
Yuliang Liu
Hao Liu
Yuan Xie
Xiang Bai
Can Huang
LRM
VLM
MLLM
101
30
0
19 Apr 2024
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Wenjin Wang
Zhengjie Huang
Bin Luo
Qianglong Chen
Qiming Peng
...
Weichong Yin
Shi Feng
Yu Sun
Dianhai Yu
Yin Zhang
ViT
45
12
0
18 Sep 2022
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
438
2,340
0
02 Sep 2021
Scene Text Retrieval via Joint Text Detection and Similarity Learning
Hao Wang
X. Bai
Mingkun Yang
Shenggao Zhu
Jing Wang
Wenyu Liu
3DV
28
35
0
04 Apr 2021
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik
Zongze Wu
Eli Shechtman
Daniel Cohen-Or
Dani Lischinski
CLIP
VLM
62
1,204
0
31 Mar 2021
Spatial Dual-Modality Graph Reasoning for Key Information Extraction
Hongbin Sun
Zhanghui Kuang
Xiaoyu Yue
Chenhao Lin
Wayne Zhang
47
37
0
26 Mar 2021
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
Minghui Liao
Guan Pang
Jing Huang
Tal Hassner
X. Bai
35
182
0
18 Jul 2020
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks
Wenwen Yu
Ning Lu
Xianbiao Qi
Ping Gong
Rong Xiao
46
136
0
16 Apr 2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
103
694
0
31 Dec 2019
EATEN: Entity-aware Attention for Single Shot Visual Text Extraction
He Guo
Xiameng Qin
Jiaming Liu
Junyu Han
Jingtuo Liu
Errui Ding
43
45
0
20 Sep 2019
Graph Convolution for Multimodal Information Extraction from Visually Rich Documents
Xiaojing Liu
Feiyu Gao
Qiong Zhang
Huasha Zhao
56
183
0
27 Mar 2019