Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.17726
Cited By
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
23 May 2025
Donghwan Chi
Hyomin Kim
Yoonjin Oh
Yongjin Kim
Donghoon Lee
DaeJin Jo
Jongmin Kim
Junyeob Baek
Sungjin Ahn
Sungwoong Kim
MLLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM"
15 / 65 papers shown
Title
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
50
1,174
0
18 Apr 2019
TallyQA: Answering Complex Counting Questions
Manoj Acharya
Kushal Kafle
Christopher Kanan
40
117
0
29 Oct 2018
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord
Yazhe Li
Oriol Vinyals
DRL
SSL
231
10,152
0
10 Jul 2018
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
66
831
0
22 Feb 2018
DVQA: Understanding Data Visualizations via Question Answering
Kushal Kafle
Brian L. Price
Scott D. Cohen
Christopher Kanan
AIMat
49
379
0
24 Jan 2018
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
297
11,610
0
11 Jan 2018
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
161
4,928
0
02 Nov 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
275
2,346
0
20 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
297
3,187
0
02 Dec 2016
Visual Storytelling
Ting-Hao 'Kenneth' Huang
Huang
Francis Ferraro
N. Mostafazadeh
Ishan Misra
...
C. L. Zitnick
Devi Parikh
Lucy Vanderwende
Michel Galley
Margaret Mitchell
VGen
45
470
0
13 Apr 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
170
5,706
0
23 Feb 2016
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
1.2K
76,547
0
18 May 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
153
2,461
0
01 Apr 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
227
4,451
0
20 Nov 2014
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho
B. V. Merrienboer
Çağlar Gülçehre
Dzmitry Bahdanau
Fethi Bougares
Holger Schwenk
Yoshua Bengio
AIMat
647
23,235
0
03 Jun 2014
Previous
1
2