Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.04138
Cited By
v1
v2 (latest)
The 3D-PC: a benchmark for visual perspective taking in humans and machines
6 June 2024
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"The 3D-PC: a benchmark for visual perspective taking in humans and machines"
50 / 79 papers shown
Title
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
Gracjan Góral
Alicja Ziarko
Piotr Miłoś
Michał Nauman
Maciej Wołczyk
Michał Kosiński
LRM
64
0
0
03 May 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee
Jihyeon Je
Chanho Park
Mikaela Angelina Uy
Leonidas Guibas
Minhyuk Sung
LRM
102
3
0
24 Apr 2025
The Philosophical Foundations of Growing AI Like A Child
Dezhi Luo
Yijiang Li
Hokin Deng
ReLM
LRM
95
2
0
15 Feb 2025
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen
Xingyu Chen
Anpei Chen
Gerard Pons-Moll
Yuliang Xiu
3DGS
127
5
0
12 Dec 2024
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
David Junhao Zhang
Roni Paiss
Shiran Zada
Nikhil Karnad
David E. Jacobs
Yael Pritch
Inbar Mosseri
Mike Zheng Shou
Neal Wadhwa
Nataniel Ruiz
DiffM
VGen
152
21
0
07 Nov 2024
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
Gracjan Góral
Alicja Ziarko
Michal Nauman
Maciej Wołczyk
LRM
74
2
0
02 Sep 2024
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani
Amit Raj
Kevis-Kokitsi Maninis
Abhishek Kar
Yuanzhen Li
Michael Rubinstein
Deqing Sun
Leonidas Guibas
Justin Johnson
Varun Jampani
87
86
0
12 Apr 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang
Bingyi Kang
Zilong Huang
Xiaogang Xu
Jiashi Feng
Hengshuang Zhao
VLM
245
819
0
19 Jan 2024
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model
Saurabh Saxena
Junhwa Hur
Charles Herrmann
Deqing Sun
David J. Fleet
DiffM
90
29
0
20 Dec 2023
Neither hype nor gloom do DNNs justice
Gaurav Malhotra
Christian Tsvetkov
B. D. Evans
74
123
0
08 Dec 2023
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Bernhard Kerbl
Georgios Kopanas
Thomas Leimkuehler
G. Drettakis
3DGS
253
3,802
0
08 Aug 2023
Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model
Yida Chen
Fernanda Viégas
Martin Wattenberg
DiffM
58
24
0
09 Jun 2023
Emergent Correspondence from Image Diffusion
Luming Tang
Menglin Jia
Qianqian Wang
Cheng Perng Phoo
Bharath Hariharan
107
268
0
06 Jun 2023
Performance-optimized deep neural networks are evolving into worse models of inferotemporal visual cortex
Drew Linsley
I. F. Rodriguez
Thomas Fel
Michael Arcaro
Saloni Sharma
Margaret Livingstone
Thomas Serre
67
20
0
06 Jun 2023
Adversarial alignment: Breaking the trade-off between the strength of an attack and its relevance to human perception
Drew Linsley
Pinyuan Feng
Thibaut Boissin
A. Ashok
Thomas Fel
Stephanie Olaiya
Thomas Serre
AAML
62
6
0
05 Jun 2023
StyleGAN knows Normal, Depth, Albedo, and More
Anand Bhattad
Daniel McKee
Derek Hoiem
David A. Forsyth
GAN
67
35
0
01 Jun 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
394
3,514
0
14 Apr 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
395
7,405
0
05 Apr 2023
Your Diffusion Model is Secretly a Zero-Shot Classifier
Alexander C. Li
Mihir Prabhudesai
Shivam Duggal
Ellis L Brown
Deepak Pathak
DiffM
VLM
155
239
0
28 Mar 2023
Zero-1-to-3: Zero-shot One Image to 3D Object
Ruoshi Liu
Rundi Wu
Basile Van Hoorick
P. Tokmakov
Sergey Zakharov
Carl Vondrick
DiffM
150
1,109
0
20 Mar 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,761
0
15 Mar 2023
Rethinking Vision Transformers for MobileNet Size and Speed
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
98
169
0
15 Dec 2022
Harmonizing the object recognition strategies of deep neural networks with humans
Thomas Fel
Ivan Felipe
Drew Linsley
Thomas Serre
88
78
0
08 Nov 2022
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
88
168
0
24 Oct 2022
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
Muhammad Maaz
Abdelrahman M. Shaker
Hisham Cholakkal
Salman Khan
Syed Waqas Zamir
Rao Muhammad Anwer
Fahad Shahbaz Khan
ViT
125
200
0
21 Jun 2022
Zero-Shot Category-Level Object Pose Estimation
Walter Goodwin
S. Vaze
Ioannis Havoutis
Ingmar Posner
ViT
71
58
0
07 Apr 2022
MaxViT: Multi-Axis Vision Transformer
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
138
670
0
04 Apr 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
511
15,788
0
20 Dec 2021
Deep ViT Features as Dense Visual Descriptors
Shirzad Amir
Yossi Gandelsman
Shai Bagon
Tali Dekel
MDE
ViT
134
290
0
10 Dec 2021
MetaFormer Is Actually What You Need for Vision
Weihao Yu
Mi Luo
Pan Zhou
Chenyang Si
Yichen Zhou
Xinchao Wang
Jiashi Feng
Shuicheng Yan
173
918
0
22 Nov 2021
iBOT: Image BERT Pre-Training with Online Tokenizer
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong
88
743
0
15 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
480
7,837
0
11 Nov 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
288
1,285
0
05 Oct 2021
PP-LCNet: A Lightweight CPU Convolutional Neural Network
Cheng Cui
Tingquan Gao
Shengyun Wei
Yuning Du
Ruoyu Guo
...
X. Lv
Qiwen Liu
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
ObjD
107
127
0
17 Sep 2021
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
Jeremy Reizenstein
Roman Shapovalov
Philipp Henzler
L. Sbordone
Patrick Labatut
David Novotny
3DPC
3DV
152
470
0
01 Sep 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
122
1,682
0
25 Jun 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
118
327
0
24 Jun 2021
XCiT: Cross-Covariance Image Transformers
Alaaeldin El-Nouby
Hugo Touvron
Mathilde Caron
Piotr Bojanowski
Matthijs Douze
...
Ivan Laptev
Natalia Neverova
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
ViT
151
513
0
17 Jun 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
297
2,845
0
15 Jun 2021
Partial success in closing the gap between human and machine vision
Robert Geirhos
Kantharaju Narayanappa
Benjamin Mitzkus
Tizian Thieringer
Matthias Bethge
Felix Wichmann
Wieland Brendel
VLM
AAML
88
231
0
14 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
141
1,212
0
09 Jun 2021
Tracking Without Re-recognition in Humans and Machines
Drew Linsley
Girik Malik
Junkyung Kim
L. Govindarajan
E. Mingolla
Thomas Serre
52
18
0
27 May 2021
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Xiangxiang Chu
Zhi Tian
Yuqing Wang
Bo Zhang
Haibing Ren
Xiaolin K. Wei
Huaxia Xia
Chunhua Shen
ViT
87
1,028
0
28 Apr 2021
Visformer: The Vision-friendly Transformer
Zhengsu Chen
Lingxi Xie
Jianwei Niu
Xuefeng Liu
Longhui Wei
Qi Tian
ViT
189
223
0
26 Apr 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
324
711
0
22 Apr 2021
Co-Scale Conv-Attentional Image Transformers
Weijian Xu
Yifan Xu
Tyler A. Chang
Zhuowen Tu
ViT
59
377
0
13 Apr 2021
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
Ben Graham
Alaaeldin El-Nouby
Hugo Touvron
Pierre Stock
Armand Joulin
Hervé Jégou
Matthijs Douze
ViT
95
794
0
02 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
168
1,021
0
31 Mar 2021
Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo
Sangdoo Yun
Dongyoon Han
Sanghyuk Chun
Junsuk Choe
Seong Joon Oh
ViT
522
582
0
30 Mar 2021
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
Chun-Fu Chen
Quanfu Fan
Yikang Shen
ViT
71
1,486
0
27 Mar 2021
1
2
Next