2304.02643
Cited By
Segment Anything
5 April 2023
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
Laura Gustafson
Tete Xiao
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
Papers citing
"Segment Anything"
50 / 4,194 papers shown
Title
Towards Learning to Complete Anything in Lidar
Ayca Takmaz
Cristiano Saltori
Neehar Peri
Tim Meinhardt
Riccardo de Lutio
Laura Leal-Taixé
Aljosa Osep
3DV
VLM
53
0
0
16 Apr 2025
EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
Jinfeng Xu
Yuanmin Huang
Baoqi Pei
Junlin Hou
Qingqiu Li
Guo Chen
Yuhui Zhang
Rui Feng
Weidi Xie
DiffM
58
1
0
16 Apr 2025
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
Lvpan Cai
Haowei Wang
Jiayi Ji
YanShu ZhouMen
Yiwei Ma
Xiaoshuai Sun
Liujuan Cao
Rongrong Ji
ViT
43
0
0
16 Apr 2025
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency
Mengshi Qi
Pengfei Zhu
Xianrui Li
Xiaoyang Bi
Lu Qi
Huadong Ma
Ming Yang
VOS
VLM
55
0
0
16 Apr 2025
CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting
Wei Sun
Yanzhao Zhou
Jianbin Jiao
Yuan Li
3DGS
50
0
0
16 Apr 2025
Real-World Depth Recovery via Structure Uncertainty Modeling and Inaccurate GT Depth Fitting
Delong Suzhang
Meng Yang
32
0
0
16 Apr 2025
Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections
Alireza Salehi
Mohammadreza Salehi
Reshad Hosseini
Cees G. M. Snoek
Makoto Yamada
Mohammad Sabokrou
VLM
38
0
0
15 Apr 2025
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
Hanning Chen
Yang Ni
Wenjun Huang
Hyunwoo Oh
Yezi Liu
Tamoghno Das
Mohsen Imani
VLM
LRM
39
0
0
15 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xinyu Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
70
6
0
15 Apr 2025
Deep Learning in Concealed Dense Prediction
Pancheng Zhao
Deng-Ping Fan
Shupeng Cheng
Salman Khan
Fahad Shahbaz Khan
David Clifton
Peng Xu
Jufeng Yang
VLM
32
0
0
15 Apr 2025
MediSee: Reasoning-based Pixel-level Perception in Medical Images
Qinyue Tong
Ziqian Lu
Jun Liu
Yangming Zheng
Zheming Lu
LRM
43
0
0
15 Apr 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
Roger Zimmermann
Lijun Sun
Jinhua Zhao
AI4CE
144
1
0
15 Apr 2025
FACT: Foundation Model for Assessing Cancer Tissue Margins with Mass Spectrometry
Mohammad Farahmand
A. Jamzad
Fahimeh Fooladgar
Laura Connolly
Martin Kaufmann
Kevin Yi Mi Ren
John Rudan
Doug McKay
Gabor Fichtinger
P. Mousavi
48
0
0
15 Apr 2025
Explicit and Implicit Representations in AI-based 3D Reconstruction for Radiology: A Systematic Review
Yuezhe Yang
Boyu Yang
Yaqian Wang
Yang He
Xingbo Dong
Zhe Jin
52
0
0
15 Apr 2025
Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation
Andrea Simonelli
Norman Muller
Peter Kontschieder
26
0
0
15 Apr 2025
PT-Mark: Invisible Watermarking for Text-to-image Diffusion Models via Semantic-aware Pivotal Tuning
Yansen Wang
Huiyu Xu
Peng Kuang
Jiacheng Du
Zehan Li
Yiming Li
Qiu Wang
Kui Ren
WIGM
62
0
0
15 Apr 2025
TT3D: Table Tennis 3D Reconstruction
Thomas Gossard
Andreas Ziegler
A. Zell
33
0
0
14 Apr 2025
Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials
J. Yang
Ruoyan Avery Yin
Chi Jiang
Yuepeng Hu
X. Zhu
...
Zongyou Yin
Jing Kong
Neil Zhenqiang Gong
Z. Z. Ren
Haozhe Wang
31
0
0
14 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
Xuelong Li
Zilong Huang
Yuchen Li
Weixian Lei
XueQing Deng
Shihao Chen
S. Ji
Jiashi Feng
MLLM
LRM
70
2
0
14 Apr 2025
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
Kaiyu Li
Zepeng Xin
Li Pang
Chao Pang
Yupeng Deng
Jing Yao
Guisong Xia
Deyu Meng
Zhi Wang
Xiangyong Cao
VLM
LRM
42
0
0
13 Apr 2025
GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation
Haotian Xu
Yue Hu
Chen Gao
Zhengqiu Zhu
Yong Zhao
Yong Li
Quanjun Yin
44
0
0
13 Apr 2025
ToolTipNet: A Segmentation-Driven Deep Learning Baseline for Surgical Instrument Tip Detection
Zijian Wu
Shuojue Yang
Yueming Jin
Septimiu E. Salcudean
MedIm
46
1
0
13 Apr 2025
Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation
Jia Wei
Xiaoqi Zhao
Jonghye Woo
Georges El Fakhri
Xiaofeng Liu
Qingyu Chen
27
0
0
13 Apr 2025
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu
Xucheng Wang
Xiangyang Yang
Mengyuan Liu
Dan Zeng
Hengzhou Ye
Shuiwang Li
34
0
0
12 Apr 2025
Visual moral inference and communication
Warren Zhu
Aida Ramezani
Yang Xu
38
0
0
12 Apr 2025
PathSeqSAM: Sequential Modeling for Pathology Image Segmentation with SAM2
Mingyang Zhu
Yinting Liu
Mingyu Li
Jiacheng Wang
21
0
0
12 Apr 2025
DoorBot: Closed-Loop Task Planning and Manipulation for Door Opening in the Wild with Haptic Feedback
Zhi Wang
Yuchen Mo
Shengmiao Jin
Wenzhen Yuan
42
1
0
12 Apr 2025
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
Jialu Li
Shoubin Yu
Han Lin
Jaemin Cho
Jaehong Yoon
Joey Tianyi Zhou
DiffM
VGen
60
0
0
11 Apr 2025
DreamFuse: Adaptive Image Fusion with Diffusion Transformer
Junjia Huang
Pengxiang Yan
Jiyang Liu
Jie Wu
Zhao Wang
Yitong Wang
Liang Lin
G. Li
40
0
0
11 Apr 2025
FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment
Sebastián Barbas Laina
Simon Boche
Sotiris Papatheodorou
Simon Schaefer
Jaehyung Jung
Stefan Leutenegger
57
0
0
11 Apr 2025
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
ViT
31
0
0
11 Apr 2025
Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation
Bram Vanherle
Brent Zoomers
Jeroen Put
F. Reeth
Nick Michiels
3DGS
39
0
0
11 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
237
0
0
11 Apr 2025
Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models
Jiahuan Long
Tingsong Jiang
Wen Yao
Yizhe Xiong
Zhengqin Xu
Shuai Jia
Chao Ma
29
0
0
11 Apr 2025
SynthFM: Training Modality-agnostic Foundation Models for Medical Image Segmentation without Real Medical Data
Sourya Sengupta
Satrajit Chakrabarty
Keerthi Sravan Ravi
Gopal Avinash
Ravi Soni
MedIm
36
0
0
11 Apr 2025
FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents
Xin Tan
Yuzhou Ji
He Zhu
Yuan Xie
3DGS
39
0
0
11 Apr 2025
CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model
Ruohao Zhan
Yijin Li
Yisheng He
Shuo Chen
Yichen Shen
Xinyu Chen
Zilong Dong
Zhaoyang Huang
Guofeng Zhang
DiffM
54
0
0
11 Apr 2025
Adversarial Examples in Environment Perception for Automated Driving (Review)
Jun Yan
Huilin Yin
AAML
36
0
0
11 Apr 2025
Diffusion Models for Robotic Manipulation: A Survey
Rosa Wolf
Yitian Shi
Sheng Liu
Rania Rayyes
54
2
0
11 Apr 2025
Robust SAM: On the Adversarial Robustness of Vision Foundation Models
Jiahuan Long
Zhengqin Xu
Tingsong Jiang
Wen Yao
Shuai Jia
Chao Ma
Xiaoqian Chen
AAML
VLM
39
1
0
11 Apr 2025
On Background Bias of Post-Hoc Concept Embeddings in Computer Vision DNNs
Gesina Schwalbe
Georgii Mikriukov
Edgar Heinert
Stavros Gerolymatos
Mert Keser
Alois Knoll
Matthias Rottmann
Annika Mütze
36
0
0
11 Apr 2025
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena
Tommaso Apicella
Stefano Rosa
Pietro Morerio
Alessio Del Bue
Lorenzo Natale
39
0
0
11 Apr 2025
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
Jiayang Sun
Han Wang
Jie Cao
Huaibo Huang
Ran He
DiffM
78
0
0
10 Apr 2025
Multi-Modal Data Fusion for Moisture Content Prediction in Apple Drying
Shichen Li
Chenhui Shao
46
2
0
10 Apr 2025
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding
Henghao Zhao
Ge-Peng Ji
Rui Yan
Huan Xiong
Zechao Li
29
0
0
10 Apr 2025
MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation
Nico Catalano
Stefano Samele
Paolo Pertino
Matteo Matteucci
3DPC
53
0
0
10 Apr 2025
HoloPart: Generative 3D Part Amodal Segmentation
Yanting Yang
Yu Guo
Yukun Huang
Zi-Xin Zou
Zhipeng Yu
Yangguang Li
Yan-Pei Cao
Xihui Liu
DiffM
50
1
0
10 Apr 2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Linyan Huang
Haonan Lin
Yanning Zhou
Kaiwen Xiao
47
0
0
10 Apr 2025
Towards Unconstrained 2D Pose Estimation of the Human Spine
Muhammad Gul Zain Ali Khan
Stephan Krauß
Didier Stricker
3DH
61
0
0
10 Apr 2025
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration
Omar Alama
A. Bhattacharya
Haoyang He
Seungchan Kim
Yuheng Qiu
Wenshan Wang
Cherie Ho
Nikhil Varma Keetha
Sebastian A. Scherer
33
0
0
09 Apr 2025