ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.14014
  4. Cited By
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained
  Vision-Language Model
v1v2v3 (latest)

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

23 May 2023
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
    CLIPVLM
ArXiv (abs)PDFHTML

Papers citing "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model"

50 / 57 papers shown
Title
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding
Feng Xiao
Hongbin Xu
Guocan Zhao
Wenxiong Kang
214
0
0
07 May 2025
Scene Text Recognition Models Explainability Using Local Features
Scene Text Recognition Models Explainability Using Local Features
M. Ty
Rowel Atienza
63
1
0
14 Oct 2023
Data Filtering Networks
Data Filtering Networks
Alex Fang
Albin Madappally Jose
Amit Jain
Ludwig Schmidt
Alexander Toshev
Vaishaal Shankar
CLIP
94
144
0
29 Sep 2023
LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition
LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition
Changxu Cheng
Peng Wang
Cheng Da
Qi Zheng
Cong Yao
73
15
0
24 Aug 2023
LAION-5B: An open large-scale dataset for training next generation
  image-text models
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLMMLLMCLIP
200
3,493
0
16 Oct 2022
Scene Text Recognition with Permuted Autoregressive Sequence Models
Scene Text Recognition with Permuted Autoregressive Sequence Models
Darwin Bautista
Rowel Atienza
102
173
0
14 Jul 2022
Reading and Writing: Discriminative and Generative Modeling for
  Self-Supervised Text Recognition
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
Mingkun Yang
Minghui Liao
Pu Lu
Jing Wang
Shenggao Zhu
Hualin Luo
Qingzhen Tian
X. Bai
SSL
98
59
0
01 Jul 2022
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer
  Learning
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
97
244
0
13 Jun 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLMCLIPOffRL
169
1,307
0
04 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
418
3,602
0
29 Apr 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression
  Comprehension
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Sanjay Subramanian
William Merrill
Trevor Darrell
Matt Gardner
Sameer Singh
Anna Rohrbach
ObjD
105
128
0
12 Apr 2022
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual
  Entailment
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment
Haoyu Song
Li Dong
Weinan Zhang
Ting Liu
Furu Wei
VLMCLIP
81
139
0
14 Mar 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple
  Sequence-to-Sequence Learning Framework
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLMObjD
154
880
0
07 Feb 2022
Language-driven Semantic Segmentation
Language-driven Semantic Segmentation
Boyi Li
Kilian Q. Weinberger
Serge Belongie
V. Koltun
René Ranftl
VLM
124
625
0
10 Jan 2022
VL-Adapter: Parameter-Efficient Transfer Learning for
  Vision-and-Language Tasks
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLMVPVLM
112
356
0
13 Dec 2021
Florence: A New Foundation Model for Computer Vision
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
141
908
0
22 Nov 2021
FILIP: Fine-grained Interactive Language-Image Pre-Training
FILIP: Fine-grained Interactive Language-Image Pre-Training
Lewei Yao
Runhu Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLMCLIP
108
642
0
09 Nov 2021
Towards artificial general intelligence via a multimodal foundation
  model
Towards artificial general intelligence via a multimodal foundation model
Nanyi Fei
Zhiwu Lu
Yizhao Gao
Guoxing Yang
Yuqi Huo
...
Ruihua Song
Xin Gao
Tao Xiang
Haoran Sun
Jiling Wen
AI4CELRM
84
227
0
27 Oct 2021
Supervision Exists Everywhere: A Data Efficient Contrastive
  Language-Image Pre-training Paradigm
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLMCLIP
152
458
0
11 Oct 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLMCLIP
309
1,045
0
09 Oct 2021
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image
  Manipulation
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
Gwanghyun Kim
Taesung Kwon
Jong Chul Ye
DiffM
200
655
0
06 Oct 2021
TrOCR: Transformer-based Optical Character Recognition with Pre-trained
  Models
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Minghao Li
Tengchao Lv
Jingye Chen
Lei Cui
Yijuan Lu
D. Florêncio
Cha Zhang
Zhoujun Li
Furu Wei
ViT
240
370
0
21 Sep 2021
Learning to Prompt for Vision-Language Models
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLMCLIPVLM
505
2,409
0
02 Sep 2021
From Two to One: A New Scene Text Recognizer with Visual Language
  Modeling Network
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang
Hongtao Xie
Shancheng Fang
Jing Wang
Shenggao Zhu
Yongdong Zhang
VLM
85
154
0
22 Aug 2021
Align before Fuse: Vision and Language Representation Learning with
  Momentum Distillation
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
221
1,972
0
16 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIPVLMMLLM
257
410
0
13 Jul 2021
Open Images V5 Text Annotation and Yet Another Mask Text Spotter
Open Images V5 Text Annotation and Yet Another Mask Text Spotter
Ilya Krylov
S. Nosov
V. Sovrasov
VLM
68
54
0
23 Jun 2021
BEiT: BERT Pre-Training of Image Transformers
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
289
2,841
0
15 Jun 2021
Representation and Correlation Enhanced Encoder-Decoder Framework for
  Scene Text Recognition
Representation and Correlation Enhanced Encoder-Decoder Framework for Scene Text Recognition
Meng Cui
Wei Wang
Jinjin Zhang
Liang Wang
3DV
76
12
0
13 Jun 2021
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped
  scene text
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
Amanpreet Singh
Guan Pang
Mandy Toh
Jing Huang
Wojciech Galuba
Tal Hassner
64
174
0
12 May 2021
ImageNet-21K Pretraining for the Masses
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSegVLMCLIP
324
711
0
22 Apr 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
150
1,584
0
18 Apr 2021
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik
Zongze Wu
Eli Shechtman
Daniel Cohen-Or
Dani Lischinski
CLIPVLM
129
1,209
0
31 Mar 2021
Read Like Humans: Autonomous, Bidirectional and Iterative Language
  Modeling for Scene Text Recognition
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Shancheng Fang
Hongtao Xie
Yuxin Wang
Zhendong Mao
Yongdong Zhang
78
306
0
11 Mar 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLMCLIP
459
3,893
0
11 Feb 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region
  Supervision
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLMCLIP
131
1,757
0
05 Feb 2021
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text
  Recognition
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition
Zhi Qiao
Yu Zhou
Dongbao Yang
Yucan Zhou
Weiping Wang
71
228
0
22 May 2020
Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
Deli Yu
Xuan Li
Chengquan Zhang
Junyu Han
Jingtuo Liu
Errui Ding
95
287
0
27 Mar 2020
TextScanner: Reading Characters in Order for Robust Scene Text
  Recognition
TextScanner: Reading Characters in Order for Robust Scene Text Recognition
Zhaoyi Wan
Minghang He
Haoran Chen
X. Bai
Cong Yao
67
139
0
28 Dec 2019
ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard
ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard
Xi Liu
Rui Zhang
Yongsheng Zhou
Qianyi Jiang
Qi Song
...
X. Bai
Baoguang Shi
Dimosthenis Karatzas
Shijian Lu
C. V. Jawahar
3DV
52
160
0
20 Dec 2019
RandAugment: Practical automated data augmentation with a reduced search
  space
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
258
3,502
0
30 Sep 2019
ICDAR 2019 Competition on Large-scale Street View Text with Partial
  Labeling -- RRC-LSVT
ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling -- RRC-LSVT
Yipeng Sun
Zihan Ni
Chee-Kheng Chng
Yuliang Liu
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
100
158
0
17 Sep 2019
ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)
ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)
Chee-Kheng Chng
Yuliang Liu
Yipeng Sun
Chun Chet Ng
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
92
215
0
16 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
677
24,541
0
26 Jul 2019
ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection
  and Recognition -- RRC-MLT-2019
ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition -- RRC-MLT-2019
Nibal Nayef
Yash J. Patel
M. Busta
Pinaki Nath Chowdhury
Dimosthenis Karatzas
...
Jirí Matas
Umapada Pal
J. Burie
Cheng-Lin Liu
J. Ogier
3DV
78
251
0
01 Jul 2019
Scene Text Detection and Recognition: The Deep Learning Era
Scene Text Detection and Recognition: The Deep Learning Era
Shangbang Long
Xin He
Cong Yao
VLM
113
398
0
10 Nov 2018
Ray: A Distributed Framework for Emerging AI Applications
Ray: A Distributed Framework for Emerging AI Applications
Philipp Moritz
Robert Nishihara
Stephanie Wang
Alexey Tumanov
Richard Liaw
...
Melih Elibol
Zongheng Yang
William Paul
Michael I. Jordan
Ion Stoica
GNN
107
1,267
0
16 Dec 2017
Focusing Attention: Towards Accurate Text Recognition in Natural Images
Focusing Attention: Towards Accurate Text Recognition in Natural Images
Zhanzhan Cheng
Fan Bai
Yunlu Xu
Gang Zheng
Shiliang Pu
Shuigeng Zhou
56
449
0
07 Sep 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
128
3,685
0
08 Jun 2017
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based
  Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
325
20,086
0
07 Oct 2016
12
Next