2408.00714
SAM 2: Segment Anything in Images and Videos
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
Papers citing "SAM 2: Segment Anything in Images and Videos"
50 / 189 papers shown
Title
Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform
K. Gao
Dening Lu
Liangzhi Li
Nan Chen
Hongjie He
Linlin Xu
Jonathan Li
3DGS
3DPC
AI4CE
63
1
0
09 Feb 2025
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
Jinbo Xing
Long Mai
Cusuh Ham
Jiahui Huang
Aniruddha Mahapatra
Chi-Wing Fu
T. Wong
Feng Liu
DiffM
VGen
130
2
0
06 Feb 2025
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?
Mennatullah Siam
VLM
84
1
0
06 Feb 2025
DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models
Lingshun Kong
Jiawei Zhang
Dongqing Zou
Jimmy S. J. Ren
Xiaohe Wu
Jiangxin Dong
Jinshan Pan
DiffM
87
0
0
06 Feb 2025
Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models
Tongkun Liu
Bing Li
Xiao Jin
Yupeng Shi
Qiuying Li
Xiang Wei
64
0
0
03 Feb 2025
Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors
Zhiyuan Lu
Hao Lu
Hua Huang
153
0
0
28 Jan 2025
MADation: Face Morphing Attack Detection with Foundation Models
Eduarda Caldeira
Guray Ozgur
Tahar Chettaoui
Marija Ivanovska
Peter Peer
Fadi Boutros
Vitomir Štruc
Naser Damer
CVBM
47
1
1
28 Jan 2025
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
Fu Rong
Meng Lan
Qian Zhang
Lefei Zhang
VOS
VGen
73
1
0
23 Jan 2025
Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos
Xianrui Luo
Juewen Peng
Zhongang Cai
Lei Yang
Fan Yang
Zhiguo Cao
Guosheng Lin
VGen
192
0
0
23 Jan 2025
DynamicEarth: How Far are We from Open-Vocabulary Change Detection?
Kaiyu Li
Xiangyong Cao
Yupeng Deng
Chao Pang
Zepeng Xin
Deyu Meng
Zhi Wang
ObjD
69
1
0
22 Jan 2025
Adapting OpenAI's CLIP Model for Few-Shot Image Inspection in Manufacturing Quality Control: An Expository Case Study with Multiple Application Examples
F. Megahed
Ying-Ju Chen
B. Colosimo
M. Grasso
L. Allison Jones-Farmer
S. Knoth
Hongyue Sun
I. Zwetsloot
AAML
VLM
74
0
0
22 Jan 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
91
22
0
21 Jan 2025
Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car Reconstruction
Congcong Li
Jin Wang
Xiaomeng Wang
Xingchen Zhou
Wei Wu
Yuzhi Zhang
Tongyi Cao
3DGS
3DV
108
0
0
19 Jan 2025
Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation
Xingxin He
Yifan Hu
Zhaoye Zhou
Mohamed Jarraya
Fang Liu
VLM
MedIm
47
2
0
17 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
159
2
0
14 Jan 2025
Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
Juntao Ren
Priya Sundaresan
Dorsa Sadigh
Sanjiban Choudhury
Jeannette Bohg
37
15
0
13 Jan 2025
Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation
Zhenyang Feng
Zihe Wang
Saul Ibaven Bueno
Tomasz Frelek
Advikaa Ramesh
...
Hilmar Lapp
Charles V. Stewart
T. Berger-Wolf
Yu-Chuan Su
Wei-Lun Chao
53
0
0
12 Jan 2025
Zero-shot Shark Tracking and Biometrics from Aerial Imagery
Chinmay K Lalgudi
Mark E Leone
Jaden V Clark
Sergio Madrigal-Mora
Mario Espinoza
47
0
0
10 Jan 2025
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
Yuanpeng Tu
Hao Luo
Xi Chen
S. Ji
Xiang Bai
Hengshuang Zhao
VGen
DiffM
42
3
0
08 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
VLM
96
12
0
07 Jan 2025
Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT
Wen-Dong Jiang
Chih-Yung Chang
Diptendu Sinha Roy
40
0
0
07 Jan 2025
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan
Hang Zhang
Wentong Li
Zesen Cheng
Boqiang Zhang
...
Deli Zhao
Wenqiao Zhang
Yueting Zhuang
Jianke Zhu
Lidong Bing
77
5
0
31 Dec 2024
BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization
Jiayi Chen
Yubin Ke
H. Wang
89
5
0
21 Dec 2024
Measurement of Medial Elbow Joint Space using Landmark Detection
Shizuka Akahori
Shotaro Teruya
Pragyan Shrestha
Yuichi Yoshii
Ryuhei Michinobu
S. Iizuka
I. Kitahara
78
0
0
17 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen
AI4CE
105
3
0
16 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
101
5
0
05 Dec 2024
Referring Video Object Segmentation via Language-aligned Track Selection
Seongchan Kim
Woojeong Jin
Sangbeom Lim
Heeji Yoon
Hyunwook Choi
Seungryong Kim
VOS
94
0
0
02 Dec 2024
T-3DGS: Removing Transient Objects for 3D Scene Reconstruction
Vadim Pryadilshchikov
Alexander Markin
Artem Komarichev
Ruslan Rakhimov
Peter Wonka
Evgeny Burnaev
3DGS
81
1
0
29 Nov 2024
VideoDirector: Precise Video Editing via Text-to-Video Models
Yukun Wang
Longguang Wang
Zhiyuan Ma
Qibin Hu
Kai Xu
Yulan Guo
VGen
DiffM
86
0
0
26 Nov 2024
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Bastian Wittmann
Yannick Wattenberg
Tamaz Amiranashvili
Suprosanna Shit
Bjoern H. Menze
86
3
0
26 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
106
2
0
26 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
96
6
0
25 Nov 2024
Generative Omnimatte: Learning to Decompose Video into Layers
Yao-Chih Lee
Erika Lu
Sarah Rumbley
Michal Geyer
Jia-Bin Huang
Tali Dekel
Forrester Cole
DiffM
VGen
105
5
0
25 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
109
1
0
25 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Mian
Joey Tianyi Zhou
Chen Chen
LRM
68
1
0
15 Nov 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
Fahad Shahbaz Khan
Salman Khan
MLLM
VGen
VLM
44
6
0
07 Nov 2024
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
Chenrui Tie
Yue Chen
Ruihai Wu
Boxuan Dong
Zhiyu Li
Chongkai Gao
Hao Dong
51
3
0
06 Nov 2024
TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
Sunjae Yoon
Gwanhyeong Koo
Younghwan Lee
Chang D. Yoo
VGen
74
3
0
31 Oct 2024
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
Sekeun Kim
Pengfei Jin
S. Song
Cheng Chen
Yiwei Li
Hui Ren
Xiang Li
Tianming Liu
Quanzheng Li
39
0
0
30 Oct 2024
MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis
Di Qiu
Zheng Chen
Rui Wang
Mingyuan Fan
Changqian Yu
Junshi Huan
Xiang Wen
VGen
40
6
0
28 Oct 2024
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji
Jingyi Liu
Peng Xu
Nick Barnes
Fahad Shahbaz Khan
Salman Khan
Deng-Ping Fan
49
4
0
22 Oct 2024
Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
Yankai Jiang
Wenhui Lei
Xiaofan Zhang
S. Zhang
MedIm
40
2
0
21 Oct 2024
GRS: Generating Robotic Simulation Tasks from Real-World Images
Alex Zook
Fan-Yun Sun
Josef Spjut
Valts Blukis
Stan Birchfield
Jonathan Tremblay
60
4
0
20 Oct 2024
LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes
Juliette Marrie
Romain Menegaux
Michael Arbel
Diane Larlus
Julien Mairal
3DGS
41
1
0
18 Oct 2024
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu
Shengcao Cao
Yu-xiong Wang
55
1
0
18 Oct 2024
In-Context Learning Enables Robot Action Prediction in LLMs
Yida Yin
Zekai Wang
Yuvan Sharma
Dantong Niu
Trevor Darrell
Roei Herzig
LM&Ro
117
1
0
16 Oct 2024
Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation
Zhijie Yan
Shufei Li
Zihan Wang
Lixiu Wu
Han Wang
Jun Zhu
Lijiang Chen
Jihong Liu
39
1
0
15 Oct 2024
Browsing without Third-Party Cookies: What Do You See?
Maxwell Lin
Shihan Lin
Helen Wu
Karen Wang
Xiaowei Yang
BDL
56
0
0
14 Oct 2024
ROMAN: Open-Set Object Map Alignment for Robust View-Invariant Global Localization
Mason B. Peterson
Yi Xuan Jia
Yulun Tian
Annika Thomas
Jonathan P. How
58
2
0
10 Oct 2024
Towards Generalisable Time Series Understanding Across Domains
Özgün Turgut
Philip Muller
M. Menten
Daniel Rueckert
AI4TS
51
1
0
09 Oct 2024