2304.02643
Segment Anything
5 April 2023
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
Laura Gustafson
Tete Xiao
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
Papers citing
"Segment Anything"
50 / 456 papers shown
Title
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
92
14
0
09 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
74
34
0
07 Jun 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
71
6
0
06 Jun 2024
Tiny models from tiny data: Textual and null-text inversion for few-shot distillation
Erik Landolsi
Fredrik Kahl
DiffM
60
1
0
05 Jun 2024
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra
Angela Dai
Jean Lahoud
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Fahad Shahbaz Khan
VLM
ISeg
107
6
0
04 Jun 2024
Information Theoretic Text-to-Image Alignment
Chao Wang
Giulio Franzese
A. Finamore
Massimo Gallo
Pietro Michiardi
88
0
0
31 May 2024
Distribution Aligned Semantics Adaption for Lifelong Person Re-Identification
Qizao Wang
Xuelin Qian
Bin Li
Xiangyang Xue
49
1
0
30 May 2024
3D StreetUnveiler with Semantic-aware 2DGS -- a simple baseline
Jingwei Xu
Yikai Wang
Yiqun Zhao
Yanwei Fu
Shenghua Gao
3DGS
64
2
0
28 May 2024
Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
Leon Götz
Marcel Kollovieh
Stephan Günnemann
Leo Schwinn
39
2
0
28 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM
ISeg
71
4
0
28 May 2024
A re-calibration method for object detection with multi-modal alignment bias in autonomous driving
Zhihang Song
Dingyi Yao
RuiBo Ming
Lihui Peng
Jianming Hu
Danya Yao
Yi Zhang
3DPC
59
0
0
27 May 2024
Exposing Image Classifier Shortcuts with Counterfactual Frequency (CoF) Tables
James Hinns
David Martens
54
2
0
24 May 2024
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo
Zhenglin Cheng
Xiaoying Tang
Tao R. Lin
Tao Lin
MoE
72
8
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
108
49
0
23 May 2024
PerSense: Personalized Instance Segmentation in Dense Images
Muhammad Ibraheem Siddiqui
Muhammad Umer Sheikh
Hassan Abid
Muhammad Haris Khan
VLM
64
0
0
22 May 2024
LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting
Jia Gong
Shenyu Ji
Lin Geng Foo
Kang Chen
Hossein Rahmani
Jun Liu
3DGS
41
6
0
21 May 2024
To Ask or Not To Ask: Human-in-the-loop Contextual Bandits with Applications in Robot-Assisted Feeding
Rohan Banerjee
Rajat Kumar Jenamani
Sidharth Vasudev
Amal Nanavati
Katherine Dimitropoulou
Sarah Dean
Tapomayukh Bhattacharjee
83
2
0
11 May 2024
Pose Priors from Language Models
Sanjay Subramanian
Evonne Ng
Lea Müller
Dan Klein
Shiry Ginosar
Trevor Darrell
56
4
0
06 May 2024
CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation
Chenying Liu
C. Albrecht
Yi Wang
Xiao Xiang Zhu
79
2
0
02 May 2024
Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields
Yuhang Huang
Shilong Zou
Xinwang Liu
K. Xu
DiffM
84
0
0
02 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
107
167
0
29 Apr 2024
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
Shangzhan Zhang
Sida Peng
Tao Xu
Yuanbo Yang
Tianrun Chen
Nan Xue
Yujun Shen
Hujun Bao
Ruizhen Hu
Xiaowei Zhou
DiffM
38
10
0
26 Apr 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
Kevin Qinghong Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
43
12
0
25 Apr 2024
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
58
19
0
22 Apr 2024
MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training
Jiayang Li
Junjun Jiang
Pengwei Liang
Jiayi Ma
Liqiang Nie
49
1
0
17 Apr 2024
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection
Jiaqi Zhu
Shaofeng Cai
Fang Deng
Junran Wu
74
15
0
15 Apr 2024
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model
Han Gu
Haoyu Dong
Jichen Yang
Maciej A. Mazurowski
MedIm
VLM
92
14
0
15 Apr 2024
O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
Muer Tie
Julong Wei
Zhengjun Wang
Ke Wu
Shansuai Yuan
Kaizhao Zhang
Jie Jia
Jieru Zhao
Zhongxue Gan
Wenchao Ding
47
7
0
10 Apr 2024
Gen3DSR: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View
Andreea Dogaru
M. Ozer
Bernhard Egger
3DGS
66
5
0
04 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
78
36
0
29 Mar 2024
Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu
Sheng-Yu Huang
Yu-Chiang Frank Wang
64
0
0
25 Mar 2024
Controlled Training Data Generation with Diffusion Models
Teresa Yeo
Andrei Atanov
Harold Benoit
Aleksandr Alekseev
Ruchira Ray
Pooya Esmaeil Akhoondi
Amir Zamir
52
6
0
22 Mar 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky
Karl Pertsch
Suraj Nair
Ashwin Balakrishna
Sudeep Dasari
...
Thomas Kollar
Sergey Levine
Chelsea Finn
101
197
0
19 Mar 2024
FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan
VS Vibashan
Rama Chellappa
Vishal M. Patel
ViT
59
13
0
19 Mar 2024
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
Haoyang Liu
Aditya Singh
Yijiang Li
Haohan Wang
AAML
ViT
64
1
0
15 Mar 2024
ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Images
Fangqiang Ding
Yunzhou Zhu
Xiangyu Wen
Gaowen Liu
Chris Xiaoxuan Lu
47
2
0
14 Mar 2024
Explore In-Context Segmentation via Latent Diffusion Models
Chaoyang Wang
Xiangtai Li
Henghui Ding
Lu Qi
Jiangning Zhang
Yunhai Tong
Chen Change Loy
Shuicheng Yan
DiffM
76
6
0
14 Mar 2024
Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration
Jingyun Xue
Tao Wang
Jun Wang
Kaihao Zhang
ViT
53
2
0
09 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
78
12
0
05 Mar 2024
A Simple-but-effective Baseline for Training-free Class-Agnostic Counting
Yuhao Lin
Hai-Ming Xu
Lingqiao Liu
Javen Qinfeng Shi
50
1
0
03 Mar 2024
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Huan Ma
Yan Zhu
Changqing Zhang
Peilin Zhao
Baoyuan Wu
Long-Kai Huang
Qinghua Hu
Bing Wu
VLM
75
2
0
01 Mar 2024
Large Convolutional Model Tuning via Filter Subspace
Wei Chen
Zichen Miao
Qiang Qiu
74
4
0
01 Mar 2024
SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution
Chengcheng Wang
Zhiwei Hao
Yehui Tang
Jianyuan Guo
Yujie Yang
Kai Han
Yunhe Wang
65
6
0
27 Feb 2024
Diffusion Model-Based Image Editing: A Survey
Yi Huang
Jiancheng Huang
Yifan Liu
Mingfu Yan
Jiaxi Lv
Jianzhuang Liu
Wei Xiong
He Zhang
Liangliang Cao
EGVM
85
90
0
27 Feb 2024
Subobject-level Image Tokenization
Delong Chen
Samuel Cahyawijaya
Jianfeng Liu
Baoyuan Wang
Pascale Fung
VLM
OCL
87
9
0
22 Feb 2024
How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey
Fabio Tosi
Youming Zhang
Ziren Gong
Erik Sandström
S. Mattoccia
Martin R. Oswald
Matteo Poggi
3DGS
108
57
0
20 Feb 2024
Verifiably Following Complex Robot Instructions with Foundation Models
Benedict Quartey
Eric Rosen
Stefanie Tellex
George Konidaris
LM&Ro
59
12
0
18 Feb 2024
Design of 2D Skyrmionic Metamaterial Through Controlled Assembly
Qichen Xu
Zhuanglin Shen
Alexander Edström
I. P. Miranda
Zhiwei Lu
A. Bergman
D. Thonig
Wanjian Yin
Olle Eriksson
Anna Delin
44
0
0
16 Feb 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Yue Fan
Qing Li
Yuntao Du
VLM
77
79
0
03 Feb 2024
Segment Any Change
Zhuo Zheng
Yanfei Zhong
Liangpei Zhang
Stefano Ermon
VLM
36
12
0
02 Feb 2024