2308.05737
Follow Anything: Open-set detection, tracking, and following in real-time
10 August 2023
Alaa Maalouf
Ninad Jadhav
Krishna Murthy Jatavallabhula
Makram Chahine
Daniel M. Vogt
Robert J. Wood
Antonio Torralba
Daniela Rus
Github (380★)
Papers citing "Follow Anything: Open-set detection, tracking, and following in real-time"
49 / 49 papers shown
Title
MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data
Chika Maduabuchi
Ericmoore Jossou
Matteo Bucci
86
0
0
12 Nov 2024
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models
Tsun-Hsuan Wang
Alaa Maalouf
Wei Xiao
Yutong Ban
Alexander Amini
Guy Rosman
S. Karaman
Daniela Rus
73
45
0
26 Oct 2023
Fast Segment Anything
Xu Zhao
Wenchao Ding
Yongqi An
Yinglong Du
Tao Yu
Min Li
Ming Tang
Jinqiao Wang
MLLM
VLM
88
285
0
21 Jun 2023
Segment and Track Anything
Yangming Cheng
Liulei Li
Yuanyou Xu
Xiaodi Li
Zongxin Yang
Wenguan Wang
Yi Yang
VOS
82
202
0
11 May 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
395
7,405
0
05 Apr 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
191
2,023
0
09 Mar 2023
ConceptFusion: Open-set Multimodal 3D Mapping
Krishna Murthy Jatavallabhula
Ali Kuwajerwala
Qiao Gu
Mohd. Omama
Tao Chen
...
Celso Miguel de Melo
Madhava Krishna
Liam Paull
Florian Shkurti
Antonio Torralba
86
246
0
14 Feb 2023
RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Joseph Dabis
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
138
1,159
0
13 Dec 2022
Decoupling Features in Hierarchical Propagation for Video Object Segmentation
Zongxin Yang
Yi Yang
VOS
101
157
0
18 Oct 2022
Deep Learning on Home Drone: Searching for the Optimal Architecture
Alaa Maalouf
Yotam Gurfinkel
Barak Diker
O. Gal
Daniela Rus
Dan Feldman
53
5
0
21 Sep 2022
Treating Motion as Option to Reduce Motion Dependency in Unsupervised Video Object Segmentation
Suhwan Cho
Minhyeok Lee
Seung-Hyun Lee
Chaewon Park
Donghyeon Kim
Sangyoun Lee
VOS
121
40
0
04 Sep 2022
Vision-based Anti-UAV Detection and Tracking
Jie Zhao
Jingshu Zhang
Dongdong Li
D. Wang
AI4TS
67
112
0
22 May 2022
Vision-based system for a real-time detection and following of UAV
A. Barišić
Marko Car
Stjepan Bogdan
35
22
0
29 Apr 2022
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Katherine Crowson
Stella Biderman
Daniel Kornis
Dashiell Stander
Eric Hallahan
Louis Castricato
Edward Raff
CLIP
138
381
0
18 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
422
6,921
0
13 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
195
1,988
0
04 Apr 2022
Pre-Trained Language Models for Interactive Decision-Making
Shuang Li
Xavier Puig
Chris Paxton
Yilun Du
Clinton Jia Wang
...
Anima Anandkumar
Jacob Andreas
Igor Mordatch
Antonio Torralba
Yuke Zhu
LM&Ro
112
262
0
03 Feb 2022
Language-driven Semantic Segmentation
Boyi Li
Kilian Q. Weinberger
Serge Belongie
V. Koltun
René Ranftl
VLM
139
628
0
10 Jan 2022
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Yin Cui
Tsung-Yi Lin
VLM
126
386
0
22 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
151
580
0
16 Dec 2021
Deep ViT Features as Dense Visual Descriptors
Shirzad Amir
Yossi Gandelsman
Shai Bagon
Tali Dekel
MDE
ViT
134
290
0
10 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
272
2,385
0
02 Dec 2021
A Unified Approach to Coreset Learning
Alaa Maalouf
Gilad Eini
Ben Mussay
Dan Feldman
Margarita Osadchy
DD
62
18
0
04 Nov 2021
Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition
Lucas Liebenwein
Alaa Maalouf
O. Gal
Dan Feldman
Daniela Rus
58
47
0
23 Jul 2021
AudioCLIP: Extending CLIP to Image, Text and Audio
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
CLIP
VLM
127
370
0
24 Jun 2021
Associating Objects with Transformers for Video Object Segmentation
Zongxin Yang
Yunchao Wei
Yi Yang
115
293
0
04 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
735
6,139
0
29 Apr 2021
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik
Zongze Wu
Eli Shechtman
Daniel Cohen-Or
Dani Lischinski
CLIP
VLM
138
1,211
0
31 Mar 2021
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
Ho Kei Cheng
Yu-Wing Tai
Chi-Keung Tang
VOS
83
202
0
14 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.0K
29,926
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
420
5,005
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
469
3,906
0
11 Feb 2021
CholecSeg8k: A Semantic Segmentation Dataset for Laparoscopic Cholecystectomy Based on Cholec80
W.-Y. Hong
Chang-Lung Kao
Y.-H. Kuo
J.-R. Wang
Wanxing Chang
C.-S. Shih
56
103
0
23 Dec 2020
Video Object Segmentation with Episodic Graph Memory Networks
Xiankai Lu
Wenguan Wang
Martin Danelljan
Tianfei Zhou
Jianbing Shen
Luc Van Gool
VOS
104
277
0
14 Jul 2020
YOLOv4: Optimal Speed and Accuracy of Object Detection
Alexey Bochkovskiy
Chien-Yao Wang
H. Liao
VLM
ObjD
178
12,317
0
23 Apr 2020
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
98
360
0
21 Apr 2020
See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks
Xiankai Lu
Wenguan Wang
Chao Ma
Jianbing Shen
Ling Shao
Fatih Porikli
VOS
88
464
0
19 Jan 2020
Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks
Wenguan Wang
Xiankai Lu
Jianbing Shen
David J. Crandall
Ling Shao
VOS
77
274
0
19 Jan 2020
EfficientDet: Scalable and Efficient Object Detection
Mingxing Tan
Ruoming Pang
Quoc V. Le
120
5,076
0
20 Nov 2019
AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates
Ning Liu
Xiaolong Ma
Zhiyuan Xu
Yanzhi Wang
Jian Tang
Jieping Ye
77
186
0
06 Jul 2019
Fast Online Object Tracking and Segmentation: A Unifying Approach
Qiang Wang
Li Zhang
Luca Bertinetto
Weiming Hu
Philip Torr
VOS
95
1,205
0
12 Dec 2018
YOLOv3: An Incremental Improvement
Joseph Redmon
Ali Farhadi
ObjD
136
21,495
0
08 Apr 2018
Open Vocabulary Scene Parsing
Hang Zhao
Xavier Puig
Bolei Zhou
Sanja Fidler
Antonio Torralba
VLM
3DV
96
120
0
26 Mar 2017
R-FCN: Object Detection via Region-based Fully Convolutional Networks
Jifeng Dai
Yi Li
Kaiming He
Jian Sun
ObjD
185
5,650
0
20 May 2016
SSD: Single Shot MultiBox Detector
Wei Liu
Dragomir Anguelov
D. Erhan
Christian Szegedy
Scott E. Reed
Cheng-Yang Fu
Alexander C. Berg
ObjD
BDL
257
29,915
0
08 Dec 2015
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon
S. Divvala
Ross B. Girshick
Ali Farhadi
ObjD
742
37,085
0
08 Jun 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
535
62,477
0
04 Jun 2015
Fast R-CNN
Ross B. Girshick
ObjD
312
25,087
0
30 Apr 2015
Rich feature hierarchies for accurate object detection and semantic segmentation
Ross B. Girshick
Jeff Donahue
Trevor Darrell
Jitendra Malik
ObjD
295
26,223
0
11 Nov 2013