ResearchTrend.AI
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers

9 June 2021
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques

Papers citing "Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers"

50 / 189 papers shown
Title
Sign Language Production with Latent Motion Transformer
Pan Xie
Taiying Peng
Yao Du
Qipeng Zhang
SLR
27
3
0
20 Dec 2023
ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room
Idris Hamoud
Muhammad Abdullah Jamal
V. Srivastav
Didier Mutter
N. Padoy
Omid Mohareri
21
2
0
19 Dec 2023
Towards Establishing Dense Correspondence on Multiview Coronary Angiography: From Point-to-Point to Curve-to-Curve Query Matching
Yifan Wu
Rohit Jena
M. A. Gülsün
Vivek Singh
Puneet Sharma
James C. Gee
22
0
0
18 Dec 2023
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition
Chengyou Jia
Minnan Luo
Xiaojun Chang
Zhuohang Dang
Mingfei Han
Mengmeng Wang
Guangwen Dai
Sizhe Dang
Jingdong Wang
VLM
31
4
0
04 Dec 2023
Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living
Dominick Reilly
Srijan Das
ViT
35
17
0
30 Nov 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
35
12
0
30 Nov 2023
DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding
Kyungho Bae
Geo Ahn
Youngrae Kim
Jinwoo Choi
30
3
0
30 Nov 2023
A Simple Video Segmenter by Tracking Objects Along Axial Trajectories
Ju He
Qihang Yu
Inkyu Shin
XueQing Deng
Alan L. Yuille
Xiaohui Shen
Liang-Chieh Chen
VOS
40
2
0
30 Nov 2023
Object-based (yet Class-agnostic) Video Domain Adaptation
Dantong Niu
Amir Bar
Roei Herzig
Trevor Darrell
Anna Rohrbach
29
1
0
29 Nov 2023
REACT: Recognize Every Action Everywhere All At Once
N. V. R. Chappa
Pha Nguyen
P. Dobbs
Khoa Luu
36
6
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
33
6
0
27 Nov 2023
GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap
Hyogun Lee
Kyungho Bae
Seong Jong Ha
Yumin Ko
Gyeong-Moon Park
Jinwoo Choi
16
2
0
21 Nov 2023
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Muhammad Adi Nugroho
Changick Kim
30
0
0
21 Nov 2023
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao
Bingkun Huang
Sen Xing
Gangshan Wu
Yu Qiao
Limin Wang
42
5
0
06 Nov 2023
Object-centric Video Representation for Long-term Action Anticipation
Ce Zhang
Changcheng Fu
Shijie Wang
Nakul Agarwal
Kwonjoon Lee
Chiho Choi
Chen Sun
30
14
0
31 Oct 2023
How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing
Shutong Jin
Ruiyu Wang
Muhammad Zahid
Florian T. Pokorny
32
1
0
03 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
34
15
0
28 Sep 2023
Selective Volume Mixup for Video Action Recognition
Yi Tan
Zhaofan Qiu
Y. Hao
Ting Yao
Xiangnan He
Tao Mei
ViT
30
2
0
18 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
27
18
0
14 Sep 2023
Dataset Condensation via Generative Model
David Junhao Zhang
Heng Wang
Chuhui Xue
Rui Yan
Wenqing Zhang
Song Bai
Mike Zheng Shou
DD
26
11
0
14 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
25
9
0
05 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
34
20
0
27 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning
D. Fan
Jue Wang
Shuai Liao
Yi Zhu
Vimal Bhat
H. Santos-Villalobos
M. Rohith
Xinyu Li
VGen
37
19
0
24 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding
Mona Ahmadian
Frank Guerin
Andrew Gilbert
38
2
0
23 Aug 2023
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
45
16
0
22 Aug 2023
Video BagNet: short temporal receptive fields increase robustness in long-term action recognition
Ombretta Strafforello
X. Liu
Klamer Schutte
Jan van Gemert
32
2
0
22 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
39
30
0
21 Aug 2023
An Outlook into the Future of Egocentric Vision
Chiara Plizzari
Gabriele Goletto
Antonino Furnari
Siddhant Bansal
Francesco Ragusa
G. Farinella
Dima Damen
Tatiana Tommasi
EgoV
40
38
0
14 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
38
9
0
10 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
29
16
0
08 Aug 2023
Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review
Nicole L. Robinson
Brendan Tidd
Dylan Campbell
Dana Kulić
Peter Corke
41
55
0
28 Jul 2023
MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features
Adrien Bardes
Jean Ponce
Yann LeCun
MDE
39
25
0
24 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
54
19
0
13 Jul 2023
Make A Long Image Short: Adaptive Token Length for Vision Transformers
Yuqin Zhu
Yichen Zhu
ViT
72
17
0
05 Jul 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
42
14
0
20 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Aman Chadha
Srijan Das
ViT
33
4
0
15 Jun 2023
Detect Any Shadow: Segment Anything for Video Shadow Detection
Yonghui Wang
Wen-gang Zhou
Yunyao Mao
Houqiang Li
VLM
21
22
0
26 May 2023
Enhancing Next Active Object-based Egocentric Action Anticipation with Guided Attention
Sanket Thakur
Cigdem Beyan
Pietro Morerio
Vittorio Murino
Alessio Del Bue
38
6
0
22 May 2023
Annotation-free Audio-Visual Segmentation
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS
VLM
39
28
0
18 May 2023
ReasonNet: End-to-End Driving with Temporal and Global Reasoning
Hao Shao
Letian Wang
Ruobing Chen
Steven L. Waslander
Hongsheng Li
Y. Liu
LRM
41
71
0
17 May 2023
Modelling Spatio-Temporal Interactions for Compositional Action Recognition
Ramanathan Rajendiran
Debaditya Roy
Basura Fernando
43
1
0
04 May 2023
SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
36
8
0
27 Apr 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
30
35
0
20 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
39
74
0
06 Apr 2023
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Peiyao Wang
Haibin Ling
15
2
0
04 Apr 2023
Use Your Head: Improving Long-Tail Video Recognition
Toby Perrett
Saptarshi Sinha
T. Burghardt
Majid Mirmehdi
Dima Damen
38
15
0
03 Apr 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
29
0
0
01 Apr 2023
Streaming Video Model
Yucheng Zhao
Chong Luo
Chuanxin Tang
Dongdong Chen
Noel Codella
Zhengjun Zha
36
12
0
30 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
71
329
0
29 Mar 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
41
94
0
25 Mar 2023