arXiv:1601.07140
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
26 January 2016
Andreas Veit
Tomas Matera
Lukáš Neumann
Jiří Matas
Serge J. Belongie
Papers citing "COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images"
50 / 76 papers shown
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
82
3
0
26 Feb 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
104
2
0
20 Dec 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
62
25
0
10 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
82
25
0
04 Oct 2024
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
Zixiao Wang
Hongtao Xie
Yuxin Wang
Yadong Qu
Fengjun Guo
Pengwei Liu
DiffM
31
0
0
20 Sep 2024
VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer
Humen Zhong
Zhibo Yang
Zhaohai Li
Peng Wang
Jun Tang
Wenqing Cheng
Cong Yao
21
1
0
18 Sep 2024
WAS: Dataset and Methods for Artistic Text Segmentation
Xudong Xie
Yuzhe Li
Yang Liu
Zhifei Zhang
Zhaowen Wang
Wei Xiong
Xiang Bai
DiffM
50
2
0
31 Jul 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
40
7
0
31 Jul 2024
Out of Length Text Recognition with Sub-String Matching
Yongkun Du
Zhineng Chen
Caiyan Jia
Xieping Gao
Yu-Gang Jiang
49
2
0
17 Jul 2024
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
Bangbang Zhou
Yadong Qu
Zixiao Wang
Zicheng Li
Boqiang Zhang
Hongtao Xie
37
1
0
08 Jul 2024
Classification of Non-native Handwritten Characters Using Convolutional Neural Network
F. A. Mamun
S. Chowdhury
J. E. Giti
H. Sarker
37
1
0
06 Jun 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
LRM
MLLM
62
7
0
27 May 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
61
33
0
29 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Guoxing Yang
Zhiwu Lu
48
3
0
07 Mar 2024
Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing
Yan Shu
Weichao Zeng
Zhenhang Li
Fangmin Zhao
Yu Zhou
30
3
0
05 Feb 2024
GloTSFormer: Global Video Text Spotting Transformer
Hang Wang
Yanjie Wang
Yang Li
Can Huang
29
0
0
08 Jan 2024
IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
Xiaomeng Yang
Zhi Qiao
Yu Zhou
DiffM
57
1
0
19 Dec 2023
Scene Text Image Super-resolution based on Text-conditional Diffusion Models
Chihiro Noguchi
Shun Fukuda
Masao Yamanaka
DiffM
25
10
0
16 Nov 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng-Tao Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
35
116
0
07 Sep 2023
DTrOCR: Decoder-only Transformer for Optical Character Recognition
Masato Fujitake
41
35
0
30 Aug 2023
Adaptive Segmentation Network for Scene Text Detection
Gui-yan Zhao
SSeg
22
1
0
27 Jul 2023
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision
Yukun Zhai
Xiaoqiang Zhang
Xiameng Qin
Sanyuan Zhao
Xingping Dong
Jianbing Shen
33
4
0
06 Jun 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Z. Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
16
6
0
04 Apr 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
21
110
0
21 Dec 2022
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
18
4
0
16 Dec 2022
CBNet: A Plug-and-Play Network for Segmentation-Based Scene Text Detection
Xi Zhao
Wei Feng
Zheng Zhang
Jing Lv
Xin Zhu
Zhangang Lin
Jin Hu
Jingping Shao
32
5
0
05 Dec 2022
Exploring Stroke-Level Modifications for Scene Text Editing
Yadong Qu
Qingfeng Tan
Hongtao Xie
Jianjun Xu
Yuxin Wang
Yongdong Zhang
DiffM
26
33
0
05 Dec 2022
Domain Adaptive Scene Text Detection via Subcategorization
Zichen Tian
Chuhui Xue
Jingyi Zhang
Shijian Lu
24
3
0
01 Dec 2022
Impact of Automatic Image Classification and Blind Deconvolution in Improving Text Detection Performance of the CRAFT Algorithm
Clarisa V. Albarillo
P. Fernandez
18
1
0
29 Nov 2022
Rooms with Text: A Dataset for Overlaying Text Detection
Oleg Smirnov
Aditya Tewari
13
0
0
21 Nov 2022
State-of-the-art Models for Object Detection in Various Fields of Application
S. A. G. Naqvi
Syed Shahnawaz Ali
ObjD
OOD
22
0
0
01 Nov 2022
Out-of-Vocabulary Challenge Report
Sergi Garcia-Bordils
Andrés Mafla
Ali Furkan Biten
Oren Nuriel
Aviad Aberdam
Shai Mazor
Ron Litman
Dimosthenis Karatzas
9
16
0
14 Sep 2022
1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words
Zhangzi Zhu
Chuhui Xue
Yu Hao
Wenqing Zhang
Song Bai
48
0
0
01 Sep 2022
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition
Xudong Xie
Ling Fu
Zhifei Zhang
Zhaowen Wang
X. Bai
ViT
32
45
0
31 Jul 2022
Real-time End-to-End Video Text Spotter with Contrastive Representation Learning
Wejia Wu
Zhuang Li
Jiahong Li
Chunhua Shen
Hong Zhou
Size Li
Zhongyuan Wang
Ping Luo
AI4TS
21
8
0
18 Jul 2022
COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts
Jeonghun Baek
Yusuke Matsui
Kiyoharu Aizawa
34
13
0
11 Jul 2022
Explore Faster Localization Learning For Scene Text Detection
Yuzhong Zhao
Yuanqiang Cai
Weijia Wu
Weiqiang Wang
ViT
26
14
0
04 Jul 2022
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
Mingkun Yang
Minghui Liao
Pu Lu
Jing Wang
Shenggao Zhu
Hualin Luo
Qingzhen Tian
X. Bai
SSL
29
55
0
01 Jul 2022
An Evaluation of OCR on Egocentric Data
Valentin Popescu
Dima Damen
Toby Perrett
EgoV
25
0
0
11 Jun 2022
Detection Masking for Improved OCR on Noisy Documents
Daniel Rotman
Ophir Azulai
Inbar Shapira
Yevgeny Burshtein
Udi Barzelay
30
4
0
17 May 2022
Multimodal Semi-Supervised Learning for Text Recognition
Aviad Aberdam
Roy Ganz
Shai Mazor
Ron Litman
VLM
22
19
0
08 May 2022
End-to-End Video Text Spotting with Transformer
Weijia Wu
Yuanqiang Cai
Chunhua Shen
Debing Zhang
Ying Fu
Hong Zhou
Ping Luo
ViT
43
24
0
20 Mar 2022
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer
Yair Kittenplon
I. Lavi
Sharon Fogel
Yarin Bar
R. Manmatha
Pietro Perona
ViT
11
53
0
11 Feb 2022
Towards Boosting the Accuracy of Non-Latin Scene Text Recognition
Sanjana Gunna
Rohit Saluja
C. V. Jawahar
19
5
0
10 Jan 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
24
100
0
23 Dec 2021
A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer
Weijia Wu
Yuanqiang Cai
Debing Zhang
Sibo Wang
Zhuang Li
Jiahong Li
Yejun Tang
Hong Zhou
25
29
0
09 Dec 2021
ICDAR 2021 Competition on Document Visual Question Answering
Rubèn Pérez Tito
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
30
23
0
10 Nov 2021
Demystifying the Transferability of Adversarial Attacks in Computer Networks
Ehsan Nowroozi
Yassine Mekdad
Mohammad Hajian Berenjestanaki
Mauro Conti
Abdeslam El Fergougui
AAML
27
32
0
09 Oct 2021
EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling
Jue Wang
Haofan Wang
Jincan Deng
Weijia Wu
Debing Zhang
VLM
CLIP
59
18
0
10 Sep 2021
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Xiaopeng Lu
Zhenhua Fan
Yansen Wang
Jean Oh
Carolyn Rose
16
27
0
20 Aug 2021