ResearchTrend.AI
SAM 2: Segment Anything in Images and Videos


1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
    VLM
    MLLM

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 175 papers shown
ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
Haofeng Liu
Mingqi Gao
Xuxiao Luo
Ziyue Wang
Guanyi Qin
J. Wu
Yueming Jin
37
0
0
13 May 2025
Symbolically-Guided Visual Plan Inference from Uncurated Video Data
Wenyan Yang
Ahmet Tikna
Yi Zhao
Yuying Zhang
Luigi Palopoli
Marco Roveri
J. Pajarinen
VGen
24
0
0
13 May 2025
When Dance Video Archives Challenge Computer Vision
P. Colantoni
Rafique Ahmed
Prashant Ghimire
Damien Muselet
A. Trémeau
3DH
26
0
0
12 May 2025
Vision Foundation Model Embedding-Based Semantic Anomaly Detection
M. Ronecker
Matthew Foutter
Amine Elhafsi
Daniele Gammelli
Ihor Barakaiev
Marco Pavone
Daniel Watzenig
26
0
0
12 May 2025
ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation
Feng Yuan
Yifan Gao
Wenbin Wu
Keqing Wu
Xiaotong Guo
Jie Jiang
Xin Gao
Mamba
46
0
0
12 May 2025
The First WARA Robotics Mobile Manipulation Challenge -- Lessons Learned
David Cáceres-Domínguez
M. Iannotta
Abhishek Kashyap
Shuo Sun
Yuxuan Yang
...
Zheng Jia
Graziano Carriero
Sofia Lindqvist
Silvio Di Castro
Matteo Iovino
28
0
0
11 May 2025
Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers
Chi Xu
Yili Jin
Sami Ma
Rongsheng Qian
Hao Fang
...
Xue Liu
Edith Ngai
William I. Atlas
Katrina M. Connors
Mark A. Spoljaric
26
0
0
10 May 2025
Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation
Zechu Li
Yufeng Jin
Daniel Felipe Ordoñez Apraez
Claudio Semini
Puze Liu
Georgia Chalvatzaki
136
0
0
08 May 2025
UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model
T. Kaiser
Thomas Norrenbrock
Bodo Rosenhahn
48
0
0
08 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Y. Chen
Zhuotao Tian
VLM
38
0
0
07 May 2025
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
Sameer Malik
Moyuru Yamada
Ayush Singh
Dishank Aggarwal
124
0
0
06 May 2025
Visual Imitation Enables Contextual Humanoid Control
Arthur Allshire
Hongsuk Choi
Junyi Zhang
David McAllister
Anthony Zhang
C. Kim
Trevor Darrell
Pieter Abbeel
Jitendra Malik
Angjoo Kanazawa
LM&Ro
109
0
0
06 May 2025
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
47
0
0
06 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models
Yankai Jiang
Peng Zhang
D. Yang
Yuan Tian
Hai Lin
X. Wang
MedIm
113
0
0
05 May 2025
6D Pose Estimation on Spoons and Hands
Kevin Tan
Fan Yang
Y. Chen
42
0
0
05 May 2025
SignSplat: Rendering Sign Language via Gaussian Splatting
Maksym Ivashechkin
Oscar Mendez
Richard Bowden
3DGS
43
0
0
04 May 2025
Segment Any RGB-Thermal Model with Language-aided Distillation
Dong Xing
Xianxun Zhu
Wei Zhou
Qika Lin
Hang Yang
Yuqing Wang
VLM
56
0
0
04 May 2025
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Volodymyr Havrylov
Haiwen Huang
Dan Zhang
Andreas Geiger
111
0
0
04 May 2025
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan
Xiaosong Jia
Yihang Sun
Yixiao Wang
Jianglan Wei
...
Xiangyu Zhao
M. Tomizuka
Xue Yang
Junchi Yan
Mingyu Ding
LM&Ro
VLM
64
2
0
04 May 2025
Prompt-responsive Object Retrieval with Memory-augmented Student-Teacher Learning
Malte Mosbach
Sven Behnke
31
0
0
04 May 2025
Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2
Yuwen Chen
Zafer Yildiz
Qihang Li
Yaqian Chen
Haoyu Dong
Hanxue Gu
N. Konz
Maciej Mazurowski
MedIm
VLM
38
0
0
03 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
54
0
0
03 May 2025
Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging
Elena Mulero Ayllón
Massimiliano Mantegna
Linlin Shen
Paolo Soda
V. Guarrasi
M. Tortora
33
0
0
02 May 2025
Improving Editability in Image Generation with Layer-wise Memory
Daneul Kim
Jaeah Lee
Jaesik Park
DiffM
KELM
55
0
0
02 May 2025
Implicit Neural-Representation Learning for Elastic Deformable-Object Manipulations
Minseok Song
JeongHo Ha
Bonggyeong Park
Daehyung Park
114
0
0
01 May 2025
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs
Pranav Saxena
Nishant Raghuvanshi
Neena Goveas
69
0
0
30 Apr 2025
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Jiaxu Qian
Chendong Wang
Y. Yang
Chaoyun Zhang
Huiqiang Jiang
...
Saravan Rajmohan
Dongmei Zhang
Y. Yang
Qi Zhang
Lili Qiu
VLM
81
0
0
30 Apr 2025
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
Linshan Wu
Yuxiang Nie
Sunan He
Jiaxin Zhuang
Hao Chen
LM&MA
MedIm
73
0
0
30 Apr 2025
GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation
Jingfeng Guo
J. Chen
Weikai Chen
Zhenyu Sun
Lanjiong Li
Baozhu Zhao
Lingting Zhu
X. Wang
Qi Liu
3DH
80
0
0
29 Apr 2025
Hydra: Marker-Free RGB-D Hand-Eye Calibration
Martin Huber
Huanyu Tian
Christopher E. Mower
Lucas-Raphael Müller
Sebastien Ourselin
Christos Bergeles
Tom Vercauteren
36
0
0
29 Apr 2025
PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking
Xiatao Sun
Yinxing Chen
Daniel Rakita
VGen
53
0
0
29 Apr 2025
Opportunistic Collaborative Planning with Large Vision Model Guided Control and Joint Query-Service Optimization
Jiayi Chen
Shuai Wang
Guoliang Li
Wei Xu
Guangxu Zhu
Derrick Wing Kwan Ng
Chengzhong Xu
53
0
0
25 Apr 2025
Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning
Yuanbing Ouyang
Yizhuo Liang
Qingpeng Li
Xinfei Guo
Yiming Luo
Di Wu
Hao Wang
Yushan Pan
ViT
VLM
71
0
0
25 Apr 2025
PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models
Michel Gokan Khan
Renan Guarese
Fabian Johnson
Xi Vincent Wang
Anders Bergman
Benjamin Edvinsson
Mario Romero
Jérémy Vachier
Jan Kronqvist
3DGS
57
0
0
25 Apr 2025
Step1X-Edit: A Practical Framework for General Image Editing
S. Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
X. Zhang
Gang Yu
Daxin Jiang
DiffM
100
3
0
24 Apr 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
Wenxuan Li
Hang Zhao
Zhiyuan Yu
Yu Du
Qin Zou
Ruizhen Hu
K. Xu
SSL
76
1
0
23 Apr 2025
Model-based Metric 3D Shape and Motion Reconstruction of Wild Bottlenose Dolphins in Drone-Shot Videos
Daniele Baieri
Riccardo Cicciarella
Michael Krützen
Emanuele Rodolà
Silvia Zuffi
38
0
0
22 Apr 2025
AffordanceSAM: Segment Anything Once More in Affordance Grounding
D. Jiang
Mengmeng Wang
Teli Ma
H. Li
Y. Liu
Guang Dai
L. Zhang
32
0
0
22 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
Tyler Ga Wei Lum
Olivia Y. Lee
C. Karen Liu
Jeannette Bohg
33
1
0
17 Apr 2025
NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results
Xin Li
Kun Yuan
B. Li
Fengbin Guan
Yizhen Shao
...
Guohua Zhang
Z. Huang
Y. Deng
Qingmiao Jiang
Lu Chen
53
7
0
17 Apr 2025
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Rongtao Xu
J. Zhang
Minghao Guo
Youpeng Wen
H. Yang
...
Liqiong Wang
Yuxuan Kuang
Meng Cao
Feng Zheng
Xiaodan Liang
42
3
0
17 Apr 2025
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
Lvpan Cai
Haowei Wang
Jiayi Ji
YanShu ZhouMen
Yiwei Ma
Xiaoshuai Sun
Liujuan Cao
Rongrong Ji
ViT
34
0
0
16 Apr 2025
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency
Mengshi Qi
Pengfei Zhu
X. Li
Xiaoyang Bi
Lu Qi
Huadong Ma
Ming Yang
VOS
VLM
42
0
0
16 Apr 2025
How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
Aditya Prakash
Benjamin Lundell
Dmitry Andreychuk
David Forsyth
Saurabh Gupta
H. Sawhney
31
0
0
16 Apr 2025
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
Jingshun Huang
Haitao Lin
Tianyu Wang
Yanwei Fu
Xiangyang Xue
Y. X. Zhu
3DPC
37
0
0
15 Apr 2025
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
Henghui Ding
Chang Liu
Nikhila Ravi
Shuting He
Y. Wei
...
Haobo Yuan
X. Li
Tao Zhang
Lu Qi
Ming Yang
28
0
0
15 Apr 2025
RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
Guangcong Zheng
Teng Li
Xianpan Zhou
Xi Li
VGen
3DV
62
1
0
11 Apr 2025
ZS-VCOS: Zero-Shot Outperforms Supervised Video Camouflaged Object Segmentation
Wenqi Guo
Shan Du
VLM
52
0
0
10 Apr 2025