2411.16156
VideoOrion: Tokenizing Object Dynamics in Videos
25 November 2024
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
Papers citing
"VideoOrion: Tokenizing Object Dynamics in Videos"
17 / 67 papers shown
Title
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
403
1,114
0
13 Oct 2021
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Yifu Zhang
Pei Sun
Yi Jiang
Dongdong Yu
Fucheng Weng
Zehuan Yuan
Ping Luo
Wenyu Liu
Xinggang Wang
VOT
176
1,391
0
13 Oct 2021
Associating Objects with Transformers for Video Object Segmentation
Zongxin Yang
Yunchao Wei
Yi Yang
89
292
0
04 Jun 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
981
29,871
0
26 Feb 2021
TrackFormer: Multi-Object Tracking with Transformers
Tim Meinhardt
A. Kirillov
Laura Leal-Taixe
Christoph Feichtenhofer
VOT
276
774
0
07 Jan 2021
Human-centric Spatio-Temporal Video Grounding With Visual Transformers
Zongheng Tang
Yue Liao
Si Liu
Guanbin Li
Xiaojie Jin
Hongxu Jiang
Qian Yu
Dong Xu
66
99
0
10 Nov 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
440
13,130
0
26 May 2020
Designing Network Design Spaces
Ilija Radosavovic
Raj Prateek Kosaraju
Ross B. Girshick
Kaiming He
Piotr Dollár
GNN
102
1,693
0
30 Mar 2020
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Zhenfang Chen
Lin Ma
Wenhan Luo
Kwan-Yee K. Wong
95
103
0
06 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
115
474
0
06 Jun 2019
Fast Online Object Tracking and Segmentation: A Unifying Approach
Qiang Wang
Li Zhang
Luca Bertinetto
Weiming Hu
Philip Torr
VOS
95
1,205
0
12 Dec 2018
LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking
Heng Fan
Liting Lin
Fan Yang
Peng Chu
Ge Deng
Sijia Yu
Hexin Bai
Yong-mei Xu
Chunyuan Liao
Haibin Ling
VOT
172
1,171
0
20 Sep 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
795
132,454
0
12 Jun 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
369
27,253
0
20 Mar 2017
Learning Video Object Segmentation from Static Images
Anna Khoreva
Federico Perazzi
Rodrigo Benenson
Bernt Schiele
A. Sorkine-Hornung
VOS
79
588
0
08 Dec 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
108
1,919
0
29 Jul 2016
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
300
4,511
0
20 Nov 2014