CLIPTER: Looking at the Bigger Picture in Scene Text Recognition

18 January 2023
Aviad Aberdam
David Bensaid
Alona Golts
Roy Ganz
Oren Nuriel
Royee Tichauer
Shai Mazor
Ron Litman
    VLM
    CLIP

Papers cited by "CLIPTER: Looking at the Bigger Picture in Scene Text Recognition"

50 / 57 papers shown
Title
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Xiao Wang
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
79
716
0
14 Sep 2022
Out-of-Vocabulary Challenge Report
Sergi Garcia-Bordils
Andrés Mafla
Ali Furkan Biten
Oren Nuriel
Aviad Aberdam
Shai Mazor
Ron Litman
Dimosthenis Karatzas
35
16
0
14 Sep 2022
GLASS: Global to Local Attention for Scene-Text Spotting
Roi Ronen
Shahar Tsiper
Oron Anschel
I. Lavi
Amir Markovitz
R. Manmatha
41
44
0
05 Aug 2022
Scene Text Recognition with Permuted Autoregressive Sequence Models
Darwin Bautista
Rowel Atienza
70
172
0
14 Jul 2022
MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining
Pengyuan Lyu
Chengquan Zhang
Shanshan Liu
Meina Qiao
Yangliu Xu
Liang Wu
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
70
43
0
01 Jun 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
123
546
0
27 May 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Xiao Wang
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
81
312
0
12 May 2022
Multimodal Semi-Supervised Learning for Text Recognition
Aviad Aberdam
Roy Ganz
Shai Mazor
Ron Litman
VLM
63
19
0
08 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
320
3,515
0
29 Apr 2022
Pushing the Performance Limit of Scene Text Recognizer without Human Annotation
Caiyuan Zheng
Hui Li
Seon-Min Rhee
Seungju Han
Jae-Joon Han
Peng Wang
63
12
0
16 Apr 2022
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Shangbang Long
Siyang Qin
Dmitry Panteleev
Alessandro Bissacco
Yasuhisa Fujii
Michalis Raptis
51
95
0
28 Mar 2022
SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization
Canjie Luo
Lianwen Jin
Jingdong Chen
SSL
AI4TS
51
30
0
20 Mar 2022
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Mingxin Huang
Yuliang Liu
Zhenghao Peng
Chongyu Liu
Dahua Lin
Shenggao Zhu
N. Yuan
Kai Ding
Lianwen Jin
ViT
35
102
0
19 Mar 2022
Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
Chuhui Xue
Wenqing Zhang
Yu Hao
Shijian Lu
Philip Torr
Song Bai
VLM
53
32
0
08 Mar 2022
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer
Yair Kittenplon
I. Lavi
Sharon Fogel
Yarin Bar
R. Manmatha
Pietro Perona
ViT
32
53
0
11 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven Hoi
MLLM
BDL
VLM
CLIP
490
4,324
0
28 Jan 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
73
101
0
23 Dec 2021
Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features
Byeonghu Na
Yoonsik Kim
Sungrae Park
57
54
0
30 Nov 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLM
VLM
112
248
0
24 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
427
7,705
0
11 Nov 2021
Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Prajjwal Bhargava
Aleksandr Drozd
Anna Rogers
132
105
0
04 Oct 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
112
792
0
24 Aug 2021
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Yuxin Wang
Hongtao Xie
Shancheng Fang
Jing Wang
Shenggao Zhu
Yongdong Zhang
VLM
75
153
0
22 Aug 2021
Towards the Unseen: Iterative Text Recognition by Distilling from Errors
A. Bhunia
Pinaki Nath Chowdhury
Aneeshan Sain
Yi-Zhe Song
55
16
0
26 Jul 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven Hoi
FaML
167
1,943
0
16 Jul 2021
DocFormer: End-to-End Transformer for Document Understanding
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
70
275
0
22 Jun 2021
Vision Transformer for Fast and Efficient Scene Text Recognition
Rowel Atienza
ViT
65
148
0
18 May 2021
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
Amanpreet Singh
Guan Pang
Mandy Toh
Jing Huang
Wojciech Galuba
Tal Hassner
54
171
0
12 May 2021
ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting
Yuliang Liu
Chunhua Shen
Lianwen Jin
Tong He
Peng Chen
Chongyu Liu
Hao Chen
56
138
0
08 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
611
6,029
0
29 Apr 2021
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Shancheng Fang
Hongtao Xie
Yuxin Wang
Zhendong Mao
Yongdong Zhang
60
305
0
11 Mar 2021
What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels
Jeonghun Baek
Yusuke Matsui
Kiyoharu Aizawa
70
91
0
07 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
810
29,167
0
26 Feb 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD
VLM
304
157
0
02 Jan 2021
On Calibration of Scene-Text Recognition Models
Ron Slossberg
Oron Anschel
Amir Markovitz
Ron Litman
Aviad Aberdam
Shahar Tsiper
Shai Mazor
Jon Wu
R. Manmatha
43
13
0
23 Dec 2020
Sequence-to-Sequence Contrastive Learning for Text Recognition
Aviad Aberdam
Ron Litman
Shahar Tsiper
Oron Anschel
Ron Slossberg
Shai Mazor
R. Manmatha
Pietro Perona
70
107
0
20 Dec 2020
MANGO: A Mask Attention Guided One-Stage Scene Text Spotter
Liang Qiao
Ying-Cong Chen
Zhanzhan Cheng
Yunlu Xu
Yi Niu
Shiliang Pu
Leilei Gan
51
77
0
08 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
530
40,739
0
22 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
191
5,046
0
08 Oct 2020
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
Minghui Liao
Guan Pang
Jing Huang
Tal Hassner
X. Bai
49
183
0
18 Jul 2020
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition
Zhi Qiao
Yu Zhou
Dongbao Yang
Yucan Zhou
Weiping Wang
56
227
0
22 May 2020
On Vocabulary Reliance in Scene Text Recognition
Zhaoyi Wan
Jielei Zhang
Liang Zhang
Jiebo Luo
Cong Yao
52
57
0
08 May 2020
Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
Deli Yu
Xuan Li
Chengquan Zhang
Junyu Han
Jingtuo Liu
Errui Ding
80
286
0
27 Mar 2020
SCATTER: Selective Context Attentional Scene Text Recognizer
Ron Litman
Oron Anschel
Shahar Tsiper
R. Litman
Shai Mazor
R. Manmatha
54
135
0
25 Mar 2020
ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard
Xi Liu
Rui Zhang
Yongsheng Zhou
Qianyi Jiang
Qi Song
...
X. Bai
Baoguang Shi
Dimosthenis Karatzas
Shijian Lu
C. V. Jawahar
3DV
50
157
0
20 Dec 2019
MMTM: Multimodal Transfer Module for CNN Fusion
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
75
277
0
20 Nov 2019
Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning
Yipeng Sun
Jiaming Liu
Wei Liu
Junyu Han
Errui Ding
Jingtuo Liu
66
52
0
17 Sep 2019
ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)
Chee-Kheng Chng
Yuliang Liu
Yipeng Sun
Chun Chet Ng
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
76
214
0
16 Sep 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
130
1,948
0
09 Aug 2019
ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition -- RRC-MLT-2019
Nibal Nayef
Yash J. Patel
M. Busta
Pinaki Nath Chowdhury
Dimosthenis Karatzas
...
Jiří Matas
Umapada Pal
J. Burie
Cheng-Lin Liu
J. Ogier
3DV
61
247
0
01 Jul 2019