2108.06543
MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding
14 August 2021
Zhanghui Kuang
Hongbin Sun
Zhizhong Li
Xiaoyu Yue
T. Lin
Jianyong Chen
Huaqiang Wei
Yiqin Zhu
Tong Gao
Wenwei Zhang
Kai-xiang Chen
Wayne Zhang
Dahua Lin
VLM
Papers citing
"MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding"
36 / 36 papers shown
Title
TrackID3x3: A Dataset and Algorithm for Multi-Player Tracking with Identification and Pose Estimation in 3x3 Basketball Full-court Videos
Kazuhiro Yamada
Li Yin
Qingrui Hu
Ning Ding
Shunsuke Iwashita
Jun Ichikawa
Kiwamu Kotani
Calvin Yeung
Keisuke Fujii
50
0
0
24 Mar 2025
ExpertRAG: Efficient RAG with Mixture of Experts -- Optimizing Context Retrieval for Adaptive LLM Responses
Esmail Gumaan
MoE
32
0
0
23 Mar 2025
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
Hairu Wang
Kai Hu
Liangcai Gao
158
0
0
20 Mar 2025
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
Rohit Saxena
Pasquale Minervini
Frank Keller
VLM
64
0
0
24 Feb 2025
VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data
James E. Gallagher
E. Oughton
27
0
0
24 Dec 2024
Towards Low-Resource Harmful Meme Detection with LMM Agents
Jianzhao Huang
Hongzhan Lin
Ziyan Liu
Ziyang Luo
Guang Chen
Jing Ma
33
2
0
08 Nov 2024
AI-Powered Augmented Reality for Satellite Assembly, Integration and Test
Alvaro Patricio
Joao Valente
Atabak Dehban
Ines Cadilha
Daniel Reis
Rodrigo Ventura
27
1
0
26 Sep 2024
Can AI Assistance Aid in the Grading of Handwritten Answer Sheets?
Pritam Sil
Parag Chaudhuri
Bhaskaran Raman
18
1
0
23 Aug 2024
SegHist: A General Segmentation-based Framework for Chinese Historical Document Text Line Detection
Xingjian Hu
Baole Wei
Liangcai Gao
Jun Wang
38
0
0
17 Jun 2024
The First Swahili Language Scene Text Detection and Recognition Dataset
Fadila Wendigoundi Douamba
Jianjun Song
Ling Fu
Yuliang Liu
Xiang Bai
16
0
0
19 May 2024
Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library
Solène Tarride
Yoann Schneider
Marie Generali-Lince
Mélodie Boillet
Bastien Abadie
Christopher Kermorvant
28
3
0
29 Apr 2024
SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap
Vladimir Somers
Victor Joos
A. Cioppa
Silvio Giancola
Seyed Abolfazl Ghasemzadeh
...
S. Kasaei
Guohao Li
Alexandre Alahi
Marc Van Droogenbroeck
Christophe De Vleeschouwer
34
23
0
17 Apr 2024
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Haotian Zhang
Haoxuan You
Philipp Dufter
Bowen Zhang
Chen Chen
...
Tsu-jui Fu
William Yang Wang
Shih-Fu Chang
Zhe Gan
Yinfei Yang
ObjD
MLLM
104
44
0
11 Apr 2024
TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model
Jiahao Lyu
Jin Wei
Gangyan Zeng
Zeng Li
Enze Xie
Wei Wang
Yu Zhou
VLM
29
3
0
15 Mar 2024
Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models
Hongzhan Lin
Ziyang Luo
Wei Gao
Jing Ma
Bo Wang
Ruichao Yang
34
13
0
24 Jan 2024
Progressive Evolution from Single-Point to Polygon for Scene Text
Linger Deng
Mingxin Huang
Xudong Xie
Yuliang Liu
Lianwen Jin
Xiang Bai
34
1
0
21 Dec 2023
Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models
Hongzhan Lin
Ziyang Luo
Jing Ma
Long Chen
27
9
0
09 Dec 2023
Rethinking Detection Based Table Structure Recognition for Visually Rich Document Images
Bin Xiao
Murat Simsek
B. Kantarci
Ala Abu Alkheir
LMTD
29
1
0
01 Dec 2023
SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus
Haoxu Wang
Fan Yu
Xian Shi
Yuezhang Wang
Shiliang Zhang
Ming Li
29
11
0
11 Sep 2023
Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach
Ziyin Zhang
Ning Lu
Minghui Liao
Yongshuai Huang
Cheng Li
Min Wang
Wei Peng
28
11
0
17 Aug 2023
Adaptive Segmentation Network for Scene Text Detection
Gui-yan Zhao
SSeg
27
1
0
27 Jul 2023
Context Perception Parallel Decoder for Scene Text Recognition
Yongkun Du
Zhineng Chen
Caiyan Jia
Xiaoyue Yin
Chenxia Li
Yuning Du
Yu-Gang Jiang
34
7
0
23 Jul 2023
Looking and Listening: Audio Guided Text Recognition
Wenwen Yu
Mingyu Liu
Biao Yang
Enming Zhang
Deqiang Jiang
Xing Sun
Yuliang Liu
Xiang Bai
DiffM
25
1
0
06 Jun 2023
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
Rui Yang
Lin Song
Yanwei Li
Sijie Zhao
Yixiao Ge
Xiu Li
Ying Shan
SyDa
MLLM
28
209
0
30 May 2023
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
Filip Radenovic
Abhimanyu Dubey
Abhishek Kadian
Todor Mihaylov
Simon Vandenhende
Yash J. Patel
Y. Wen
Vignesh Ramanathan
D. Mahajan
VLM
32
81
0
05 Jan 2023
A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition
Gurkan Soykan
Deniz Yuret
T. M. Sezgin
25
3
0
27 Dec 2022
Text Detection Forgot About Document OCR
Krzysztof Olejniczak
Milan Šulc
34
9
0
14 Oct 2022
Reading Chinese in Natural Scenes with a Bag-of-Radicals Prior
Yongbin Liu
Liu Qingjie
Jiaxin Chen
Wang Yunhong
34
1
0
05 Oct 2022
Vision-Language Adaptive Mutual Decoder for OOV-STR
Jinshui Hu
Chenyu Liu
Qiandong Yan
Xuyang Zhu
Jiajia Wu
Feng Yu
Bing Yin
VLM
24
0
0
02 Sep 2022
Handling big tabular data of ICT supply chains: a multi-task, machine-interpretable approach
Bin Xiao
Murat Simsek
B. Kantarci
Ala Abu Alkheir
LMTD
14
4
0
11 Aug 2022
Explore Faster Localization Learning For Scene Text Detection
Yuzhong Zhao
Yuanqiang Cai
Weijia Wu
Weiqiang Wang
ViT
34
14
0
04 Jul 2022
Unitail: Detecting, Reading, and Matching in Retail Scene
Fangyi Chen
Han Zhang
Zaiwang Li
Jiachen Dou
Shentong Mo
Hao Chen
Yongxin Zhang
Uzair Ahmed
Chenchen Zhu
Marios Savvides
27
9
0
01 Apr 2022
TPSNet: Reverse Thinking of Thin Plate Splines for Arbitrary Shape Scene Text Representation
Wei Wang
Yu Zhou
Jiahao Lv
Dayan Wu
Guoqing Zhao
Ning Jiang
Weiping Wang
46
33
0
25 Oct 2021
On Exploring and Improving Robustness of Scene Text Detection Models
Shilian Wu
Wei Zhai
Yongrui Li
Kewei Wang
Zengfu Wang
26
1
0
12 Oct 2021
Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes
Yuanduo Hong
Huihui Pan
Weichao Sun
Yisong Jia
SSeg
138
260
0
15 Jan 2021
Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
Shi-Xue Zhang
Xiaobin Zhu
Jie-Bo Hou
Chang-rui Liu
Chun Yang
Hongfa Wang
Xu-Cheng Yin
GNN
79
182
0
17 Mar 2020