ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2408.00714
  4. Cited By
SAM 2: Segment Anything in Images and Videos

SAM 2: Segment Anything in Images and Videos

1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
    VLM
    MLLM
ArXivPDFHTML

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 178 papers shown
Title
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
Henghui Ding
Chang Liu
Nikhila Ravi
Shuting He
Y. Wei
...
Haobo Yuan
X. Li
Tao Zhang
Lu Qi
Ming Yang
30
0
0
15 Apr 2025
RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
Guangcong Zheng
Teng Li
Xianpan Zhou
Xi Li
VGen
3DV
64
1
0
11 Apr 2025
ZS-VCOS: Zero-Shot Outperforms Supervised Video Camouflaged Object Segmentation
ZS-VCOS: Zero-Shot Outperforms Supervised Video Camouflaged Object Segmentation
Wenqi Guo
Shan Du
VLM
54
0
0
10 Apr 2025
How Can Objects Help Video-Language Understanding?
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
40
0
0
10 Apr 2025
Are We Done with Object-Centric Learning?
Are We Done with Object-Centric Learning?
Alexander Rubinstein
Ameya Prabhu
Matthias Bethge
Seong Joon Oh
OCL
617
0
0
09 Apr 2025
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Xiao Zhang
Xiangyu Han
Xiwen Lai
Yao Sun
Pei Zhang
Konrad Kording
34
0
0
08 Apr 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang
Jing Bi
Chao Huang
Susan Liang
Daiki Shimada
...
Jinxi He
Liu He
Zeliang Zhang
Jiebo Luo
Chenliang Xu
37
0
0
07 Apr 2025
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision
Yuandong Pu
Le Zhuo
Kaiwen Zhu
Liangbin Xie
Wenlong Zhang
Xiangyu Chen
Peng Gao
Yu Qiao
Chao Dong
Yihao Liu
MLLM
61
1
0
07 Apr 2025
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation
Junjie Jiang
Zelin Wang
Manqi Zhao
Yin Li
Dongsheng Jiang
41
0
0
06 Apr 2025
MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition
MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition
Takahiro Shirakawa
Tomoyuki Suzuki
Daichi Haraguchi
VGen
39
0
0
03 Apr 2025
BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation
BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation
Van Nguyen Nguyen
Stephen Tyree
Andrew Guo
Mederic Fourmy
Anas Gouda
...
Stan Birchfield
Jiri Matas
Yann Labbé
M. Sundermeyer
Tomás Hodan
3DPC
48
1
0
03 Apr 2025
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Ting Liu
Siyuan Li
44
0
0
01 Apr 2025
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025
ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025
Tianming Liang
Haichao Jiang
Wei-Shi Zheng
Jian-Fang Hu
44
0
0
30 Mar 2025
A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery
A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery
Pengyu Chen
Sicheng Wang
Cuizhen Wang
Senrong Wang
Beiao Huang
Lu Huang
Zhe Zang
32
0
0
29 Mar 2025
Segment Any Motion in Videos
Segment Any Motion in Videos
Nan Huang
Wenzhao Zheng
Chenfeng Xu
Kurt Keutzer
Shanghang Zhang
Angjoo Kanazawa
Qianqian Wang
VOS
53
0
0
28 Mar 2025
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou
Hui Ren
Yijia Weng
Shuwang Zhang
Zhen Wang
...
Zhiwen Fan
Suya You
Z. Wang
Leonidas J. Guibas
A. Kadambi
VGen
3DGS
85
0
0
26 Mar 2025
A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees
A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees
Faseeh Ahmad
Hashim Ismail
Jonathan Styrud
Maj Stenmark
Volker Krueger
41
0
0
19 Mar 2025
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Nvidia
Hassan Abu Alhaija
Jose M. Alvarez
Maciej Bala
Tiffany Cai
...
Yuchong Ye
Xiaodong Yang
X. Yang
Xiaohui Zeng
Yu Zeng
VGen
90
1
0
18 Mar 2025
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Dewei Zhou
Mingwei Li
Zongxin Yang
Yi Yang
94
0
0
17 Mar 2025
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
Yuheng Jiang
Zhehao Shen
Chengcheng Guo
Yu Hong
Zhuo Su
Y. Zhang
Marc Habermann
Lan Xu
59
1
0
15 Mar 2025
PSF-4D: A Progressive Sampling Framework for View Consistent 4D Editing
PSF-4D: A Progressive Sampling Framework for View Consistent 4D Editing
H. Iqbal
Nazmul Karim
Umar Khalid
Azib Farooq
Z. Zhong
Jing Hua
Chen Chen
DiffM
3DGS
VGen
47
0
0
14 Mar 2025
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Yancheng Cai
Fei Yin
Dounia Hammou
Rafal Mantiuk
VLM
Presented at ResearchTrend Connect | VLM on 14 Mar 2025
140
1
0
13 Mar 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
56
0
0
13 Mar 2025
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Yufan Deng
Xun Guo
Y. Wang
Jacob Zhiyuan Fang
Angtian Wang
Shenghai Yuan
Yiding Yang
Bo Liu
Haibin Huang
Chongyang Ma
DiffM
VGen
64
0
0
13 Mar 2025
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger
Snehal Jauhri
V. Prasad
Georgia Chalvatzaki
60
0
0
12 Mar 2025
Referring to Any Person
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
145
0
0
11 Mar 2025
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo
Jie Hu
Yansong Qu
Liujuan Cao
3DGS
143
0
0
11 Mar 2025
MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation
MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation
Chenfei Liao
Xu Zheng
Yuanhuiyi Lyu
Haiwei Xue
Yihong Cao
Jiawen Wang
Kailun Yang
Xuming Hu
VLM
53
3
0
09 Mar 2025
Improving SAM for Camouflaged Object Detection via Dual Stream Adapters
Improving SAM for Camouflaged Object Detection via Dual Stream Adapters
Jiaming Liu
Linghe Kong
Guihai Chen
68
0
0
08 Mar 2025
GBT-SAM: Adapting a Foundational Deep Learning Model for Generalizable Brain Tumor Segmentation via Efficient Integration of Multi-Parametric MRI Data
GBT-SAM: Adapting a Foundational Deep Learning Model for Generalizable Brain Tumor Segmentation via Efficient Integration of Multi-Parametric MRI Data
Cecilia Diana-Albelda
Roberto Alcover-Couso
Álvaro García-Martín
Jesús Bescós
Marcos Escudero-Viñolo
42
1
0
06 Mar 2025
AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons
AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons
Hongjie Fang
Chenxi Wang
Yiming Wang
J. Chen
Shangning Xia
...
Xinyu Zhan
Lixin Yang
Weiming Wang
Cewu Lu
Hao-Shu Fang
82
1
0
05 Mar 2025
Boltzmann Attention Sampling for Image Analysis with Small Objects
Boltzmann Attention Sampling for Image Analysis with Small Objects
Theodore Zhao
Sid Kiblawi
Naoto Usuyama
Ho Hin Lee
Sam Preston
Hoifung Poon
Mu-Hsin Wei
MedIm
73
0
0
04 Mar 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang
Shaobin Zhuang
Canmiao Fu
Binxin Yang
Ying Zhang
Chong Sun
Zhizheng Zhang
Yali Wang
Chen Li
Zheng-Jun Zha
DiffM
69
1
0
03 Mar 2025
Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-and-Place Setups
Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-and-Place Setups
Nicholas Pfaff
Evelyn Fu
Jeremy Binagia
Phillip Isola
Russ Tedrake
49
4
0
01 Mar 2025
Less is More? Revisiting the Importance of Frame Rate in Real-Time Zero-Shot Surgical Video Segmentation
Less is More? Revisiting the Importance of Frame Rate in Real-Time Zero-Shot Surgical Video Segmentation
Utku Ozbulak
Seyed Amir Mousavi
Francesca Tozzi
Nikdokht Rashidian
W. Willaert
W. D. Neve
J. Vankerschaver
42
0
0
28 Feb 2025
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
Otto Brookes
Maksim Kukushkin
Majid Mirmehdi
Colleen Stephens
Paula Dieguez
...
Lukas Boesch
Thomas Schmid
M. Arandjelovic
H. Kühl
T. Burghardt
46
0
0
28 Feb 2025
Best Foot Forward: Robust Foot Reconstruction in-the-wild
Best Foot Forward: Robust Foot Reconstruction in-the-wild
Kyle Fogarty
Jing Yang
Chayan Kumar Patodi
Aadi Bhanti
Steven Chacko
Cengiz Öztireli
Ujwal Bonde
56
0
0
27 Feb 2025
Vector-Quantized Vision Foundation Models for Object-Centric Learning
Vector-Quantized Vision Foundation Models for Object-Centric Learning
Rongzhen Zhao
V. Wang
Juho Kannala
J. Pajarinen
OCL
VLM
185
0
0
27 Feb 2025
TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis
TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis
Menghao Li
Zhenghao Zhang
Junchao Liao
Long Qin
Weizhi Wang
DiffM
VGen
64
0
0
26 Feb 2025
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Kaixin Yao
Longwen Zhang
Xinhao Yan
Yan Zeng
Qixuan Zhang
Wei Yang
Lan Xu
Jiayuan Gu
Jingyi Yu
29
3
0
18 Feb 2025
Bilevel Learning for Bilevel Planning
Bilevel Learning for Bilevel Planning
Bowen Li
Tom Silver
Sebastian A. Scherer
Alexander G. Gray
68
1
0
12 Feb 2025
Imit Diff: Semantics Guided Diffusion Transformer with Dual Resolution Fusion for Imitation Learning
Imit Diff: Semantics Guided Diffusion Transformer with Dual Resolution Fusion for Imitation Learning
Yuhang Dong
Haizhou Ge
Yupei Zeng
J. Zhang
Beiwen Tian
...
Yufei Jia
Ruixiang Wang
Ran Yi
Guyue Zhou
Longhua Ma
51
0
0
11 Feb 2025
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
Li Hu
Guangyuan Wang
Zhen Shen
Xin Gao
Dechao Meng
Lian Zhuo
Peng Zhang
Bang Zhang
Liefeng Bo
DiffM
VGen
93
8
0
10 Feb 2025
Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform
Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform
K. Gao
Dening Lu
Liangzhi Li
Nan Chen
Hongjie He
Linlin Xu
Jonathan Li
3DGS
3DPC
AI4CE
59
1
0
09 Feb 2025
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?
Mennatullah Siam
VLM
76
1
0
06 Feb 2025
DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models
DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models
Lingshun Kong
Jiawei Zhang
Dongqing Zou
Jimmy S. J. Ren
Xiaohe Wu
Jiangxin Dong
Jinshan Pan
DiffM
85
0
0
06 Feb 2025
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
Jinbo Xing
Long Mai
Cusuh Ham
Jiahui Huang
Aniruddha Mahapatra
Chi-Wing Fu
T. Wong
Feng Liu
DiffM
VGen
124
2
0
06 Feb 2025
Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models
Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models
Tongkun Liu
Bing Li
Xiao Jin
Yupeng Shi
Qiuying Li
Xiang Wei
57
0
0
03 Feb 2025
Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors
Zhiyuan Lu
Hao Lu
Hua Huang
114
0
0
28 Jan 2025
MADation: Face Morphing Attack Detection with Foundation Models
MADation: Face Morphing Attack Detection with Foundation Models
Eduarda Caldeira
Guray Ozgur
Tahar Chettaoui
Marija Ivanovska
Peter Peer
Fadi Boutros
Vitomir Štruc
Naser Damer
CVBM
39
1
1
28 Jan 2025
Previous
1234
Next