2505.20728
Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models
27 May 2025
Zesen Lyu
Dandan Zhang
Wei Ye
Fangdi Li
Zhihang Jiang
Yao Yang
ReLM
VLM
LRM
Papers citing
"Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models"
9 / 9 papers shown
Title
Aya Vision: Advancing the Frontier of Multilingual Multimodality
Saurabh Dash
Yiyang Nan
John Dang
Arash Ahmadian
Shivalika Singh
...
Sudip Roy
Matthias Gallé
Beyza Ermis
Ahmet Üstün
Sara Hooker
VLM
62
7
0
13 May 2025
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
Atin Pothiraj
Elias Stengel-Eskin
Jaemin Cho
Mohit Bansal
106
3
0
21 Apr 2025
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
Kexian Tang
Junyao Gao
Yanhong Zeng
Haodong Duan
Yanan Sun
Zhening Xing
Wenran Liu
Kaifeng Lyu
Kai-xiang Chen
ELM
LRM
138
9
0
25 Mar 2025
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
274
960
0
27 Nov 2023
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
170
78
0
25 May 2022
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
243
1,444
0
03 Nov 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.0K
29,926
0
26 Feb 2021
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
121
1,255
0
18 Apr 2019
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
437
43,875
0
01 May 2014