Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.09297
Cited By
Transferring General Multimodal Pretrained Models to Text Recognition
19 December 2022
Junyang Lin
Xuancheng Ren
Yichang Zhang
Gao Liu
Peng Wang
An Yang
Chang Zhou
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Transferring General Multimodal Pretrained Models to Text Recognition"
28 / 28 papers shown
Title
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
131
3,355
0
16 Oct 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
133
636
0
22 Aug 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
129
397
0
17 Jun 2022
MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining
Pengyuan Lyu
Chengquan Zhang
Shanshan Liu
Meina Qiao
Yangliu Xu
Liang Wu
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
70
43
0
01 Jun 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
130
865
0
07 Feb 2022
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
Haiyang Yu
Jingye Chen
Bin Li
Jianqi Ma
Mengnan Guan
Xixi Xu
Xiaocong Wang
Shaobo Qu
Xiangyang Xue
39
55
0
30 Dec 2021
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Minghao Li
Tengchao Lv
Jingye Chen
Lei Cui
Yijuan Lu
D. Florêncio
Cha Zhang
Zhoujun Li
Furu Wei
ViT
189
351
0
21 Sep 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
108
789
0
24 Aug 2021
Vision Transformer for Fast and Efficient Scene Text Recognition
Rowel Atienza
ViT
53
146
0
18 May 2021
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Shancheng Fang
Hongtao Xie
Yuxin Wang
Zhendong Mao
Yongdong Zhang
53
302
0
11 Mar 2021
M6: A Chinese Multimodal Pretrainer
Junyang Lin
Rui Men
An Yang
Chan Zhou
Ming Ding
...
Yong Li
Wei Lin
Jingren Zhou
J. Tang
Hongxia Yang
VLM
MoE
62
133
0
01 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
763
28,659
0
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
460
40,217
0
22 Oct 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
318
12,906
0
26 May 2020
Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
Deli Yu
Xuan Li
Chengquan Zhang
Junyu Han
Jingtuo Liu
Errui Ding
75
286
0
27 Mar 2020
ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard
Xi Liu
Rui Zhang
Yongsheng Zhou
Qianyi Jiang
Qi Song
...
X. Bai
Baoguang Shi
Dimosthenis Karatzas
Shijian Lu
C. V. Jawahar
3DV
48
153
0
20 Dec 2019
ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling -- RRC-LSVT
Yipeng Sun
Zihan Ni
Chee-Kheng Chng
Yuliang Liu
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
87
155
0
17 Sep 2019
ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)
Chee-Kheng Chng
Yuliang Liu
Yipeng Sun
Chun Chet Ng
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
74
211
0
16 Sep 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
210
3,659
0
06 Aug 2019
Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition
Hui Li
Peng Wang
Chunhua Shen
Guyu Zhang
58
376
0
02 Nov 2018
ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)
Baoguang Shi
Cong Yao
Minghui Liao
Mingkun Yang
Pei Xu
Linyan Cui
Serge J. Belongie
Shijian Lu
X. Bai
32
210
0
31 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
519
129,831
0
12 Jun 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
309
3,187
0
02 Dec 2016
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
119
1,250
0
31 Jul 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
190
5,706
0
23 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
1.6K
192,638
0
10 Dec 2015
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Baoguang Shi
X. Bai
Cong Yao
VLM
170
2,473
0
21 Jul 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
180
2,461
0
01 Apr 2015
1