ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1601.07140
  4. Cited By
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in
  Natural Images

COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images

26 January 2016
Andreas Veit
Tomas Matera
Lukás Neumann
Jirí Matas
Serge J. Belongie
ArXivPDFHTML

Papers citing "COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images"

50 / 76 papers shown
Title
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
82
3
0
26 Feb 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
104
2
0
20 Dec 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
62
25
0
10 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
82
25
0
04 Oct 2024
Leveraging Text Localization for Scene Text Removal via Text-aware
  Masked Image Modeling
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
Zixiao Wang
Hongtao Xie
Yuxin Wang
Yadong Qu
Fengjun Guo
Pengwei Liu
DiffM
31
0
0
20 Sep 2024
VL-Reader: Vision and Language Reconstructor is an Effective Scene Text
  Recognizer
VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer
Humen Zhong
Zhibo Yang
Zhaohai Li
Peng Wang
Jun Tang
Wenqing Cheng
Cong Yao
21
1
0
18 Sep 2024
WAS: Dataset and Methods for Artistic Text Segmentation
WAS: Dataset and Methods for Artistic Text Segmentation
Xudong Xie
Yuzhe Li
Yang Liu
Zhifei Zhang
Zhaowen Wang
Wei Xiong
Xiang Bai
DiffM
50
2
0
31 Jul 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large
  Language Models
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
40
7
0
31 Jul 2024
Out of Length Text Recognition with Sub-String Matching
Out of Length Text Recognition with Sub-String Matching
Yongkun Du
Zhineng Chen
Caiyan Jia
Xieping Gao
Yu-Gang Jiang
49
2
0
17 Jul 2024
Focus on the Whole Character: Discriminative Character Modeling for
  Scene Text Recognition
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
Bangbang Zhou
Yadong Qu
Zixiao Wang
Zicheng Li
Boqiang Zhang
Hongtao Xie
37
1
0
08 Jul 2024
Classification of Non-native Handwritten Characters Using Convolutional
  Neural Network
Classification of Non-native Handwritten Characters Using Convolutional Neural Network
F. A. Mamun
S. Chowdhury
J. E. Giti
H. Sarker
37
1
0
06 Jun 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
Zhongyu Wei
LRM
MLLM
62
7
0
27 May 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
61
33
0
29 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Nanyi Fei
Guoxing Yang
Zhiwu Lu
48
3
0
07 Mar 2024
Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual
  Text Processing
Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing
Yan Shu
Weichao Zeng
Zhenhang Li
Fangmin Zhao
Yu Zhou
30
3
0
05 Feb 2024
GloTSFormer: Global Video Text Spotting Transformer
GloTSFormer: Global Video Text Spotting Transformer
Hang Wang
Yanjie Wang
Yang Li
Can Huang
29
0
0
08 Jan 2024
IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
Xiaomeng Yang
Zhi Qiao
Yu Zhou
DiffM
57
1
0
19 Dec 2023
Scene Text Image Super-resolution based on Text-conditional Diffusion
  Models
Scene Text Image Super-resolution based on Text-conditional Diffusion Models
Chihiro Noguchi
Shun Fukuda
Masao Yamanaka
DiffM
25
10
0
16 Nov 2023
ImageBind-LLM: Multi-modality Instruction Tuning
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng-Tao Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
35
116
0
07 Sep 2023
DTrOCR: Decoder-only Transformer for Optical Character Recognition
DTrOCR: Decoder-only Transformer for Optical Character Recognition
Masato Fujitake
41
35
0
30 Aug 2023
Adaptive Segmentation Network for Scene Text Detection
Adaptive Segmentation Network for Scene Text Detection
Gui-yan Zhao
SSeg
22
1
0
27 Jul 2023
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
Yukun Zhai
Xiaoqiang Zhang
Xiameng Qin
Sanyuan Zhao
Xingping Dong
Jianbing Shen
33
4
0
06 Jun 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for
  Scene-Text VQA
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Z. Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
16
6
0
04 Apr 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction
  Tuning
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
21
110
0
21 Dec 2022
SceneGATE: Scene-Graph based co-Attention networks for TExt visual
  question answering
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
18
4
0
16 Dec 2022
CBNet: A Plug-and-Play Network for Segmentation-Based Scene Text
  Detection
CBNet: A Plug-and-Play Network for Segmentation-Based Scene Text Detection
Xi Zhao
Wei Feng
Zheng Zhang
Jing Lv
Xin Zhu
Zhangang Lin
Jin Hu
Jingping Shao
32
5
0
05 Dec 2022
Exploring Stroke-Level Modifications for Scene Text Editing
Exploring Stroke-Level Modifications for Scene Text Editing
Yadong Qu
Qingfeng Tan
Hongtao Xie
Jianjun Xu
Yuxin Wang
Yongdong Zhang
DiffM
26
33
0
05 Dec 2022
Domain Adaptive Scene Text Detection via Subcategorization
Domain Adaptive Scene Text Detection via Subcategorization
Zichen Tian
Chuhui Xue
Jingyi Zhang
Shijian Lu
24
3
0
01 Dec 2022
Impact of Automatic Image Classification and Blind Deconvolution in
  Improving Text Detection Performance of the CRAFT Algorithm
Impact of Automatic Image Classification and Blind Deconvolution in Improving Text Detection Performance of the CRAFT Algorithm
Clarisa V. Albarillo
P. Fernandez
18
1
0
29 Nov 2022
Rooms with Text: A Dataset for Overlaying Text Detection
Rooms with Text: A Dataset for Overlaying Text Detection
Oleg Smirnov
Aditya Tewari
13
0
0
21 Nov 2022
State-of-the-art Models for Object Detection in Various Fields of
  Application
State-of-the-art Models for Object Detection in Various Fields of Application
S. A. G. Naqvi
Syed Shahnawaz Ali
ObjD
OOD
22
0
0
01 Nov 2022
Out-of-Vocabulary Challenge Report
Out-of-Vocabulary Challenge Report
Sergi Garcia-Bordils
Andrés Mafla
Ali Furkan Biten
Oren Nuriel
Aviad Aberdam
Shai Mazor
Ron Litman
Dimosthenis Karatzas
9
16
0
14 Sep 2022
1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene
  Text Understanding: End-to-End Recognition of Out of Vocabulary Words
1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words
Zhangzi Zhu
Chuhui Xue
Yu Hao
Wenqing Zhang
Song Bai
48
0
0
01 Sep 2022
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text
  Recognition
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition
Xudong Xie
Ling Fu
Zhifei Zhang
Zhaowen Wang
X. Bai
ViT
32
45
0
31 Jul 2022
Real-time End-to-End Video Text Spotter with Contrastive Representation Learning
Wejia Wu
Zhuang Li
Jiahong Li
Chunhua Shen
Hong Zhou
Size Li
Zhongyuan Wang
Ping Luo
AI4TS
21
8
0
18 Jul 2022
COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated
  Texts
COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts
Jeonghun Baek
Yusuke Matsui
Kiyoharu Aizawa
34
13
0
11 Jul 2022
Explore Faster Localization Learning For Scene Text Detection
Explore Faster Localization Learning For Scene Text Detection
Yuzhong Zhao
Yuanqiang Cai
Weijia Wu
Weiqiang Wang
ViT
26
14
0
04 Jul 2022
Reading and Writing: Discriminative and Generative Modeling for
  Self-Supervised Text Recognition
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
Mingkun Yang
Minghui Liao
Pu Lu
Jing Wang
Shenggao Zhu
Hualin Luo
Qingzhen Tian
X. Bai
SSL
29
55
0
01 Jul 2022
An Evaluation of OCR on Egocentric Data
An Evaluation of OCR on Egocentric Data
Valentin Popescu
Dima Damen
Toby Perrett
EgoV
25
0
0
11 Jun 2022
Detection Masking for Improved OCR on Noisy Documents
Detection Masking for Improved OCR on Noisy Documents
Daniel Rotman
Ophir Azulai
Inbar Shapira
Yevgeny Burshtein
Udi Barzelay
30
4
0
17 May 2022
Multimodal Semi-Supervised Learning for Text Recognition
Multimodal Semi-Supervised Learning for Text Recognition
Aviad Aberdam
Roy Ganz
Shai Mazor
Ron Litman
VLM
22
19
0
08 May 2022
End-to-End Video Text Spotting with Transformer
End-to-End Video Text Spotting with Transformer
Weijia Wu
Yuanqiang Cai
Chunhua Shen
Debing Zhang
Ying Fu
Hong Zhou
Ping Luo
ViT
43
24
0
20 Mar 2022
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer
Yair Kittenplon
I. Lavi
Sharon Fogel
Yarin Bar
R. Manmatha
Pietro Perona
ViT
11
53
0
11 Feb 2022
Towards Boosting the Accuracy of Non-Latin Scene Text Recognition
Towards Boosting the Accuracy of Non-Latin Scene Text Recognition
Sanjana Gunna
Rohit Saluja
C. V. Jawahar
19
5
0
10 Jan 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
24
100
0
23 Dec 2021
A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text
  Spotter with Transformer
A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer
Weijia Wu
Yuanqiang Cai
Debing Zhang
Sibo Wang
Zhuang Li
Jiahong Li
Yejun Tang
Hong Zhou
25
29
0
09 Dec 2021
ICDAR 2021 Competition on Document VisualQuestion Answering
ICDAR 2021 Competition on Document VisualQuestion Answering
Rubèn Pérez Tito
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
30
23
0
10 Nov 2021
Demystifying the Transferability of Adversarial Attacks in Computer
  Networks
Demystifying the Transferability of Adversarial Attacks in Computer Networks
Ehsan Nowroozi
Yassine Mekdad
Mohammad Hajian Berenjestanaki
Mauro Conti
Abdeslam El Fergougui
AAML
27
32
0
09 Oct 2021
EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident
  Learning and Language Modeling
EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling
Jue Wang
Haofan Wang
Jincan Deng
Weijia Wu
Debing Zhang
VLM
CLIP
59
18
0
10 Sep 2021
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Xiaopeng Lu
Zhenhua Fan
Yansen Wang
Jean Oh
Carolyn Rose
16
27
0
20 Aug 2021
12
Next