2503.02304
A Token-level Text Image Foundation Model for Document Understanding
4 March 2025
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei Shen
Kai Zhou
Tiezhu Yue
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
Papers citing
"A Token-level Text Image Foundation Model for Document Understanding"
44 / 94 papers shown
Title
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
371
7,405
0
05 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
251
1,205
0
27 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
191
2,023
0
09 Mar 2023
Turning a CLIP Model into a Scene Text Detector
Wenwen Yu
Yuliang Liu
Wei Hua
Deqiang Jiang
Bo Ren
Xiang Bai
VLM
CLIP
MLLM
94
58
0
28 Feb 2023
Self-supervised Character-to-Character Distillation for Text Recognition
Tongkun Guan
Wei Shen
Xuehang Yang
Qi Feng
Zekun Jiang
Xiaokang Yang
94
25
0
01 Nov 2022
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Pan Lu
Liang Qiu
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Tanmay Rajpurohit
Peter Clark
Ashwin Kalyan
ReLM
LRM
174
297
0
29 Sep 2022
A Survey on Label-efficient Deep Image Segmentation: Bridging the Gap between Weak Supervision and Dense Prediction
Wei Shen
Zelin Peng
Xuehui Wang
Huayu Wang
Jiazhong Cen
Dongsheng Jiang
Lingxi Xie
Xiaokang Yang
Qi Tian
VLM
75
84
0
04 Jul 2022
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Shangbang Long
Siyang Qin
Dmitry Panteleev
Alessandro Bissacco
Yasuhisa Fujii
Michalis Raptis
72
97
0
28 Mar 2022
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
Ahmed Masry
Do Xuan Long
J. Tan
Shafiq Joty
Enamul Hoque
AIMat
134
685
0
19 Mar 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
...
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
VLM
91
95
0
14 Feb 2022
OCR-free Document Understanding Transformer
Geewook Kim
Teakgyu Hong
Moonbin Yim
Jeongyeon Nam
Jinyoung Park
Jinyeong Yim
Wonseok Hwang
Sangdoo Yun
Dongyoon Han
Seunghyun Park
ViT
103
279
0
30 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
243
1,444
0
03 Nov 2021
Industrial Scene Text Detection with Refined Feature-attentive Network
Tongkun Guan
Chaochen Gu
Changsheng Lu
Jingzheng Tu
Qi Feng
Kaijie Wu
Xinping Guan
61
31
0
25 Oct 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
205
160
0
07 Aug 2021
Open Images V5 Text Annotation and Yet Another Mask Text Spotter
Ilya Krylov
S. Nosov
V. Sovrasov
VLM
68
54
0
23 Jun 2021
Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts
Tomasz Stanislawek
Filip Graliński
Anna Wróblewska
Dawid Lipiński
Agnieszka Kaliska
Paulina Rosalska
Bartosz Topolski
P. Biecek
77
95
0
12 May 2021
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
Amanpreet Singh
Guan Pang
Mandy Toh
Jing Huang
Wojciech Galuba
Tal Hassner
68
174
0
12 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
730
6,135
0
29 Apr 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
102
242
0
26 Apr 2021
Scene Text Retrieval via Joint Text Detection and Similarity Learning
Hao Wang
X. Bai
Mingkun Yang
Shenggao Zhu
Jing Wang
Wenyu Liu
3DV
69
36
0
04 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
993
29,871
0
26 Feb 2021
VisualMRC: Machine Reading Comprehension on Document Images
Ryota Tanaka
Kyosuke Nishida
Sen Yoshida
91
145
0
27 Jan 2021
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
151
747
0
01 Jul 2020
TableQA: a Large-Scale Chinese Text-to-SQL Dataset for Table-Aware SQL Generation
Ningyuan Sun
Xuefeng Yang
Yunfeng Liu
LMTD
68
34
0
10 Jun 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
Oleksii Sidorov
Ronghang Hu
Marcus Rohrbach
Amanpreet Singh
90
418
0
24 Mar 2020
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Yuliang Liu
Hao Chen
Chunhua Shen
Tong He
Lianwen Jin
Liangwei Wang
99
335
0
24 Feb 2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
137
712
0
31 Dec 2019
ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard
Xi Liu
Rui Zhang
Yongsheng Zhou
Qianyi Jiang
Qi Song
...
X. Bai
Baoguang Shi
Dimosthenis Karatzas
Shijian Lu
C. V. Jawahar
3DV
52
160
0
20 Dec 2019
Image-based table recognition: data, model, and evaluation
Xu Zhong
Elaheh Shafieibavani
Antonio Jimeno Yepes
LMTD
92
223
0
25 Nov 2019
ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling -- RRC-LSVT
Yipeng Sun
Zihan Ni
Chee-Kheng Chng
Yuliang Liu
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
100
158
0
17 Sep 2019
ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)
Chee-Kheng Chng
Yuliang Liu
Yipeng Sun
Chun Chet Ng
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
95
215
0
16 Sep 2019
TabFact: A Large-scale Dataset for Table-based Fact Verification
Wenhu Chen
Hongmin Wang
Jianshu Chen
Yunkai Zhang
Hong Wang
Shiyang Li
Xiyou Zhou
William Yang Wang
LMTD
109
514
0
05 Sep 2019
PlotQA: Reasoning over Scientific Plots
Nitesh Methani
Pritha Ganguly
Mitesh M. Khapra
Pratyush Kumar
111
236
0
03 Sep 2019
PubLayNet: largest dataset ever for document layout analysis
Xu Zhong
Jianbin Tang
Antonio Jimeno Yepes
52
461
0
16 Aug 2019
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
111
360
0
31 May 2019
DVQA: Understanding Data Visualizations via Question Answering
Kushal Kafle
Brian L. Price
Scott D. Cohen
Christopher Kanan
AIMat
85
397
0
24 Jan 2018
Detecting Curve Text in the Wild: New Dataset and New Solution
Liu Yuliang
Jin Lianwen
Shuaitao Zhang
Sheng Zhang
80
254
0
06 Dec 2017
Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
Chee-Kheng Chng
Chee Seng Chan
70
462
0
28 Oct 2017
FigureQA: An Annotated Figure Dataset for Visual Reasoning
Samira Ebrahimi Kahou
Vincent Michalski
Adam Atkinson
Ákos Kádár
Adam Trischler
Yoshua Bengio
ReLM
AIMat
57
331
0
19 Oct 2017
Focusing Attention: Towards Accurate Text Recognition in Natural Images
Zhanzhan Cheng
Fan Bai
Yunlu Xu
Gang Zheng
Shiliang Pu
Shuigeng Zhou
58
449
0
07 Sep 2017
ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)
Baoguang Shi
Cong Yao
Minghui Liao
Mingkun Yang
Pei Xu
Linyan Cui
Serge J. Belongie
Shijian Lu
X. Bai
53
215
0
31 Aug 2017
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukáš Neumann
Jiří Matas
Serge J. Belongie
281
530
0
26 Jan 2016
Compositional Semantic Parsing on Semi-Structured Tables
Panupong Pasupat
Percy Liang
CoGe
LMTD
125
793
0
03 Aug 2015
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
Adam W. Harley
Alex Ufkes
Konstantinos G. Derpanis
120
401
0
25 Feb 2015