arXiv: 2105.05486

TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

12 May 2021
Amanpreet Singh
Guan Pang
Mandy Toh
Jing Huang
Wojciech Galuba
Tal Hassner

Papers citing "TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text"

49 / 49 papers shown
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
152
2
0
04 Mar 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
Tanveer Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM, AI4TS
182
2
0
14 Feb 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM, VLM, LRM
311
59
0
03 Jan 2025
Instruction-Guided Scene Text Recognition
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
188
3
0
03 Jan 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM, VLM
211
2
0
20 Dec 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jiawen Liu
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM, MLLM
149
34
0
10 Oct 2024
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting
Alloy Das
Sanket Biswas
Umapada Pal
Josep Lladós
Saumik Bhattacharya
117
3
0
27 Aug 2024
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu
Zhengyao Fang
Pengyuan Lyu
Chengquan Zhang
Fanglin Chen
Guangming Lu
Wenjie Pei
131
3
0
28 Jul 2024
Out of Length Text Recognition with Sub-String Matching
Yongkun Du
Zhineng Chen
Caiyan Jia
Xieping Gao
Yu-Gang Jiang
215
2
0
17 Jul 2024
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
Mingxin Huang
Dezhi Peng
Hongliang Li
Zhenghao Peng
Chongyu Liu
Dahua Lin
Yuliang Liu
Xiang Bai
Lianwen Jin
132
1
0
15 Jan 2024
A Multiplexed Network for End-to-End, Multilingual OCR
Jing Huang
Guan Pang
Rama Kovvuri
Mandy Toh
Kevin J. Liang
Praveen Krishnan
Xi Yin
Tal Hassner
55
33
0
29 Mar 2021
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
Zanxia Jin
Heran Wu
Chun Yang
Fang Zhou
Jingyan Qin
Lei Xiao
Xu-Cheng Yin
77
31
0
24 Oct 2020
Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Wei Han
Hantao Huang
Tao Han
50
51
0
06 Oct 2020
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant
Dhruv Batra
Peter Anderson
Alex Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
82
86
0
23 Jul 2020
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
Minghui Liao
Guan Pang
Jing Huang
Tal Hassner
X. Bai
84
184
0
18 Jul 2020
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
146
743
0
01 Jul 2020
Structured Multimodal Attentions for TextVQA
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
Anton Van Den Hengel
Qi Wu
76
59
0
01 Jun 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
103
608
0
10 May 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
Oleksii Sidorov
Ronghang Hu
Marcus Rohrbach
Amanpreet Singh
90
418
0
24 Mar 2020
On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
Xinyu Wang
Yuliang Liu
Chunhua Shen
Chun Chet Ng
Canjie Luo
Lianwen Jin
C. Chan
Anton Van Den Hengel
Liangwei Wang
96
97
0
24 Feb 2020
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Yuliang Liu
Hao Chen
Chunhua Shen
Tong He
Lianwen Jin
Liangwei Wang
99
334
0
24 Feb 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
77
183
0
20 Feb 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
547
42,639
0
03 Dec 2019
All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting
Hao Wang
Pu Lu
Hui Zhang
Mingkun Yang
X. Bai
Yongchao Xu
Mengchao He
Yongpan Wang
Wenyu Liu
92
129
0
21 Nov 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
71
197
0
14 Nov 2019
Rosetta: Large scale system for text detection and recognition in images
Fedor Borisyuk
Albert Gordo
V. Sivakumar
78
300
0
11 Oct 2019
ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling -- RRC-LSVT
Yipeng Sun
Zihan Ni
Chee-Kheng Chng
Yuliang Liu
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
100
158
0
17 Sep 2019
ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)
Chee-Kheng Chng
Yuliang Liu
Yipeng Sun
Chun Chet Ng
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
95
215
0
16 Sep 2019
PlotQA: Reasoning over Scientific Plots
Nitesh Methani
Pritha Ganguly
Mitesh M. Khapra
Pratyush Kumar
109
236
0
03 Sep 2019
Towards Unconstrained End-to-End Text Spotting
Siyang Qin
Alessandro Bissacco
Michalis Raptis
Yasuhisa Fujii
Y. Xiao
58
130
0
24 Aug 2019
LEAF-QA: Locate, Encode & Attend for Figure Question Answering
Ritwick Chaudhry
Sumit Shekhar
Utkarsh Gupta
Pranav Maneriker
Prann Bansal
Ajay Joshi
LMTD
47
89
0
30 Jul 2019
ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition -- RRC-MLT-2019
Nibal Nayef
Yash J. Patel
M. Busta
Pinaki Nath Chowdhury
Dimosthenis Karatzas
...
Jirí Matas
Umapada Pal
J. Burie
Cheng-Lin Liu
J. Ogier
3DV
78
251
0
01 Jul 2019
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
111
360
0
31 May 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
111
1,255
0
18 Apr 2019
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Jeonghun Baek
Geewook Kim
Junyeop Lee
Sungrae Park
Dongyoon Han
Sangdoo Yun
Seong Joon Oh
Hwalsuk Lee
450
478
0
03 Apr 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
1.8K
95,229
0
11 Oct 2018
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
Pengyuan Lyu
Minghui Liao
Cong Yao
Wenhao Wu
X. Bai
92
599
0
06 Jul 2018
Detecting Curve Text in the Wild: New Dataset and New Solution
Yuliang Liu
Lianwen Jin
Shuaitao Zhang
Sheng Zhang
80
254
0
06 Dec 2017
ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)
Baoguang Shi
Cong Yao
Minghui Liao
Mingkun Yang
Pei Xu
Linyan Cui
Serge J. Belongie
Shijian Lu
X. Bai
53
215
0
31 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
795
132,454
0
12 Jun 2017
FastText.zip: Compressing text classification models
Armand Joulin
Edouard Grave
Piotr Bojanowski
Matthijs Douze
Hervé Jégou
Tomas Mikolov
MQ
91
1,216
0
12 Dec 2016
Synthetic Data for Text Localisation in Natural Images
Ankush Gupta
Andrea Vedaldi
Andrew Zisserman
153
1,430
0
22 Apr 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
225
5,765
0
23 Feb 2016
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukás Neumann
Jirí Matas
Serge J. Belongie
281
530
0
26 Jan 2016
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Baoguang Shi
X. Bai
Cong Yao
VLM
215
2,490
0
21 Jul 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat, ObjD
531
62,409
0
04 Jun 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.1K
150,364
0
22 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
300
4,511
0
20 Nov 2014
Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
Max Jaderberg
Karen Simonyan
Andrea Vedaldi
Andrew Zisserman
157
935
0
09 Jun 2014