Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.05392
Cited By
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
9 June 2021
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers"
50 / 189 papers shown
Title
Sign Language Production with Latent Motion Transformer
Pan Xie
Taiying Peng
Yao Du
Qipeng Zhang
SLR
27
3
0
20 Dec 2023
ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room
Idris Hamoud
Muhammad Abdullah Jamal
V. Srivastav
Didier Mutter
N. Padoy
Omid Mohareri
21
2
0
19 Dec 2023
Towards Establishing Dense Correspondence on Multiview Coronary Angiography: From Point-to-Point to Curve-to-Curve Query Matching
Yifan Wu
Rohit Jena
M. A. Gülsün
Vivek Singh
Puneet Sharma
James C. Gee
22
0
0
18 Dec 2023
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition
Chengyou Jia
Minnan Luo
Xiaojun Chang
Zhuohang Dang
Mingfei Han
Mengmeng Wang
Guangwen Dai
Sizhe Dang
Jingdong Wang
VLM
31
4
0
04 Dec 2023
Just Add
π
π
π
! Pose Induced Video Transformers for Understanding Activities of Daily Living
Dominick Reilly
Srijan Das
ViT
35
17
0
30 Nov 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
35
12
0
30 Nov 2023
DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding
Kyungho Bae
Geo Ahn
Youngrae Kim
Jinwoo Choi
30
3
0
30 Nov 2023
A Simple Video Segmenter by Tracking Objects Along Axial Trajectories
Ju He
Qihang Yu
Inkyu Shin
XueQing Deng
Alan L. Yuille
Xiaohui Shen
Liang-Chieh Chen
VOS
40
2
0
30 Nov 2023
Object-based (yet Class-agnostic) Video Domain Adaptation
Dantong Niu
Amir Bar
Roei Herzig
Trevor Darrell
Anna Rohrbach
29
1
0
29 Nov 2023
REACT: Recognize Every Action Everywhere All At Once
N. V. R. Chappa
Pha Nguyen
P. Dobbs
Khoa Luu
36
6
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
33
6
0
27 Nov 2023
GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap
Hyogun Lee
Kyungho Bae
Seong Jong Ha
Yumin Ko
Gyeong-Moon Park
Jinwoo Choi
16
2
0
21 Nov 2023
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Muhammad Adi Nugroho
Changick Kim
30
0
0
21 Nov 2023
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao
Bingkun Huang
Sen Xing
Gangshan Wu
Yu Qiao
Limin Wang
42
5
0
06 Nov 2023
Object-centric Video Representation for Long-term Action Anticipation
Ce Zhang
Changcheng Fu
Shijie Wang
Nakul Agarwal
Kwonjoon Lee
Chiho Choi
Chen Sun
30
14
0
31 Oct 2023
How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing
Shutong Jin
Ruiyu Wang
Muhammad Zahid
Florian T. Pokorny
32
1
0
03 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
34
15
0
28 Sep 2023
Selective Volume Mixup for Video Action Recognition
Yi Tan
Zhaofan Qiu
Y. Hao
Ting Yao
Xiangnan He
Tao Mei
ViT
30
2
0
18 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
27
18
0
14 Sep 2023
Dataset Condensation via Generative Model
David Junhao Zhang
Heng Wang
Chuhui Xue
Rui Yan
Wenqing Zhang
Song Bai
Mike Zheng Shou
DD
26
11
0
14 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
25
9
0
05 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
34
20
0
27 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning
D. Fan
Jue Wang
Shuai Liao
Yi Zhu
Vimal Bhat
H. Santos-Villalobos
M. Rohith
Xinyu Li
VGen
37
19
0
24 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding
Mona Ahmadian
Frank Guerin
Andrew Gilbert
38
2
0
23 Aug 2023
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
45
16
0
22 Aug 2023
Video BagNet: short temporal receptive fields increase robustness in long-term action recognition
Ombretta Strafforello
X. Liu
Klamer Schutte
Jan van Gemert
32
2
0
22 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
39
30
0
21 Aug 2023
An Outlook into the Future of Egocentric Vision
Chiara Plizzari
Gabriele Goletto
Antonino Furnari
Siddhant Bansal
Francesco Ragusa
G. Farinella
Dima Damen
Tatiana Tommasi
EgoV
40
38
0
14 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
38
9
0
10 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
29
16
0
08 Aug 2023
Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review
Nicole L. Robinson
Brendan Tidd
Dylan Campbell
Dana Kulić
Peter Corke
41
55
0
28 Jul 2023
MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features
Adrien Bardes
Jean Ponce
Yann LeCun
MDE
39
25
0
24 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
54
19
0
13 Jul 2023
Make A Long Image Short: Adaptive Token Length for Vision Transformers
Yuqin Zhu
Yichen Zhu
ViT
72
17
0
05 Jul 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
42
14
0
20 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Aman Chadha
Srijan Das
ViT
33
4
0
15 Jun 2023
Detect Any Shadow: Segment Anything for Video Shadow Detection
Yonghui Wang
Wen-gang Zhou
Yunyao Mao
Houqiang Li
VLM
21
22
0
26 May 2023
Enhancing Next Active Object-based Egocentric Action Anticipation with Guided Attention
Sanket Thakur
Cigdem Beyan
Pietro Morerio
Vittorio Murino
Alessio Del Bue
38
6
0
22 May 2023
Annotation-free Audio-Visual Segmentation
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS
VLM
39
28
0
18 May 2023
ReasonNet: End-to-End Driving with Temporal and Global Reasoning
Hao Shao
Letian Wang
Ruobing Chen
Steven L. Waslander
Hongsheng Li
Y. Liu
LRM
41
71
0
17 May 2023
Modelling Spatio-Temporal Interactions for Compositional Action Recognition
Ramanathan Rajendiran
Debaditya Roy
Basura Fernando
43
1
0
04 May 2023
SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
36
8
0
27 Apr 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
30
35
0
20 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
39
74
0
06 Apr 2023
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Peiyao Wang
Haibin Ling
15
2
0
04 Apr 2023
Use Your Head: Improving Long-Tail Video Recognition
Toby Perrett
Saptarshi Sinha
T. Burghardt
Majid Mirmehdi
Dima Damen
38
15
0
03 Apr 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
29
0
0
01 Apr 2023
Streaming Video Model
Yucheng Zhao
Chong Luo
Chuanxin Tang
Dongdong Chen
Noel Codella
Zhengjun Zha
36
12
0
30 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
71
329
0
29 Mar 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
41
94
0
25 Mar 2023
Previous
1
2
3
4
Next