Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.12803
Cited By
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
19 April 2024
Jingqun Tang
Chunhui Lin
Zhen Zhao
Shubo Wei
Binghong Wu
Qi Liu
Hao Feng
Yang Li
Siqi Wang
Lei Liao
Wei Shi
Yuliang Liu
Hao Liu
Yuan Xie
Xiang Bai
Can Huang
LRM
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"TextSquare: Scaling up Text-Centric Visual Instruction Tuning"
30 / 30 papers shown
Title
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
An-Lan Wang
Jingqun Tang
Liao Lei
H. Feng
Qi Liu
...
Wei Liu
Hao Liu
Yong-Jin Liu
Xiang Bai
Can Huang
9
0
0
16 May 2025
TDRI: Two-Phase Dialogue Refinement and Co-Adaptation for Interactive Image Generation
Yuheng Feng
Jianhui Wang
Kun Li
Sida Li
Tianyu Shi
Haoyue Han
Miao Zhang
Xueqian Wang
DiffM
149
0
0
22 Mar 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song
Xiaoye Qu
Jiawei Zhou
Yu-Xi Cheng
VLM
62
1
0
17 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Yuqing Yang
Afshin Dehghan
Peter Grasch
74
3
0
17 Mar 2025
Task-Oriented 6-DoF Grasp Pose Detection in Clutters
An-Lan Wang
Nuo Chen
Kun-Yu Lin
Li Yuan-Ming
Wei-Shi Zheng
59
2
0
24 Feb 2025
Regression and Forecasting of U.S. Stock Returns Based on LSTM
Shicheng Zhou
Zizhou Zhang
Rong Zhang
Yuchen Yin
Chia Hong Chang
Qinyan Shen
AI4TS
47
0
0
03 Feb 2025
Cross-Modal Synergies: Unveiling the Potential of Motion-Aware Fusion Networks in Handling Dynamic and Static ReID Scenarios
Fuxi Ling
Hongye Liu
Guoqiang Huang
Jing Li
Hong Wu
Zhihao Tang
63
0
0
02 Feb 2025
First-place Solution for Streetscape Shop Sign Recognition Competition
Bin Wang
Li Jing
145
0
0
06 Jan 2025
Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance
Wenhao Sun
Benlei Cui
Xue-mei Dong
Jingqun Tang
Yi Liu
DiffM
119
12
0
17 Dec 2024
MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark
Bin Shan
Xiang Fei
Wei Shi
An-Lan Wang
Guozhi Tang
Lei Liao
Jingqun Tang
Xiang Bai
Can Huang
VLM
25
6
0
15 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Zhihan Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
29
5
0
02 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM
MLLM
40
32
1
30 Sep 2024
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao
Jiapeng Wang
Hongliang Li
Chengyu Wang
Jun Huang
Lianwen Jin
38
0
0
27 Aug 2024
Harmonizing Visual Text Comprehension and Generation
Zhen Zhao
Jingqun Tang
Binghong Wu
Chunhui Lin
Shubo Wei
Hao Liu
Xin Tan
Zhizhong Zhang
Can Huang
Yuan Xie
VLM
28
24
0
23 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
57
5
0
11 Jul 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yunhong Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Hairu Wang
Hao Liu
Can Huang
50
18
0
02 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
37
15
0
27 Jun 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
Weichao Zhao
Hao Feng
Qi Liu
Jingqun Tang
Shubo Wei
...
Lei Liao
Yongjie Ye
Hao Liu
Houqiang Li
Can Huang
LMTD
28
18
0
03 Jun 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Yanjie Wang
Yuliang Liu
Hao Liu
Xiang Bai
Can Huang
43
22
0
20 May 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
43
24
0
10 Apr 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
51
26
0
28 Mar 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
87
244
0
29 Jan 2024
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
142
321
0
14 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
38
7
0
11 Dec 2023
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
Jinrong Yang
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
MLLM
VLM
66
74
0
11 Dec 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Ji Zhang
Qin Jin
Liang He
Xin Lin
Feiyan Huang
VLM
MLLM
123
84
0
08 Oct 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
31
63
0
20 Sep 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
159
579
0
06 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
270
4,244
0
30 Jan 2023
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
163
263
0
07 Oct 2022
1