Long-Term Feature Banks for Detailed Video Understanding

12 December 2018
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick

Papers citing "Long-Term Feature Banks for Detailed Video Understanding"

50 / 306 papers shown
Title
Distillation of Human-Object Interaction Contexts for Action Recognition
Muna Almushyti
Frederick W. Li
34
3
0
17 Dec 2021
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
29
17
0
13 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
75
678
0
02 Dec 2021
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips
Lijin Yang
Yifei Huang
Yusuke Sugano
Yoichi Sato
28
5
0
02 Dec 2021
Exploring Segment-level Semantics for Online Phase Recognition from Surgical Videos
Xinpeng Ding
Xiaomeng Li
22
33
0
22 Nov 2021
Revisiting spatio-temporal layouts for compositional action recognition
Gorjan Radevski
Marie-Francine Moens
Tinne Tuytelaars
30
26
0
02 Nov 2021
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition
Evangelos Kazakos
Jaesung Huh
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
50
45
0
01 Nov 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
21
24
0
27 Oct 2021
Leveraging Local Temporal Information for Multimodal Scene Classification
Saurabh Sahu
Palash Goyal
ViT
19
0
0
26 Oct 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
22
33
0
19 Oct 2021
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context
Yuxi Li
Boshen Zhang
Jian Li
Yabiao Wang
Weiyao Lin
Chengjie Wang
Jilin Li
Feiyue Huang
37
5
0
19 Oct 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
30
82
0
13 Oct 2021
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani
Yingli Tian
49
60
0
30 Sep 2021
Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video
Jiacheng Wang
Yueming Jin
Liansheng Wang
Shuntian Cai
Pheng-Ann Heng
Jing Qin
51
17
0
28 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
152
362
0
17 Sep 2021
Is First Person Vision Challenging for Object Tracking?
Matteo Dunnhofer
Antonino Furnari
G. Farinella
C. Micheloni
27
23
0
31 Aug 2021
Mining Contextual Information Beyond Image for Semantic Segmentation
Zhenchao Jin
Tao Gong
Dongdong Yu
Qi Chu
Jian Wang
Changhu Wang
Jie Shao
27
88
0
26 Aug 2021
Identity-aware Graph Memory Network for Action Detection
Jingcheng Ni
Jie Qin
Di Huang
28
9
0
26 Aug 2021
Temporal Action Segmentation with High-level Complex Activity Labels
Guodong Ding
Angela Yao
33
18
0
15 Aug 2021
Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring
Omiros Pantazis
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
SSL
34
30
0
14 Aug 2021
Video Contrastive Learning with Global Context
Haofei Kuang
Yi Zhu
Zhi-Li Zhang
Xinyu Li
Joseph Tighe
Sören Schwertfeger
C. Stachniss
Mu Li
SSL
AI4TS
21
60
0
05 Aug 2021
Predicting the Future from First Person (Egocentric) Vision: A Survey
Ivan Rodin
Antonino Furnari
Dimitrios Mavroeidis
G. Farinella
EgoV
21
42
0
28 Jul 2021
Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021
Haisheng Su
Peiqin Zhuang
Yukun Li
Dongliang Wang
Weihao Gan
Wei Wu
Yu Qiao
38
1
0
27 Jul 2021
Human-like Relational Models for Activity Recognition in Video
J. Chrol-Cannon
Andrew Gilbert
R. Lazic
Adithya Madhusoodanan
Frank Guerin
BDL
25
1
0
12 Jul 2021
Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction
He Zhao
Richard P. Wildes
22
8
0
11 Jul 2021
Long Short-Term Transformer for Online Action Detection
Mingze Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Xia
Z. Tu
Stefano Soatto
ViT
40
130
0
07 Jul 2021
Spatio-Temporal Context for Action Detection
Manuel Sarmiento Calderó
David Varas
Elisenda Bou
27
2
0
29 Jun 2021
Towards Long-Form Video Understanding
Chao-Yuan Wu
Philipp Krahenbuhl
VLM
ViT
49
166
0
21 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
37
127
0
21 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
40
11
0
18 Jun 2021
JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection
Mahsa Ehsanpour
F. Saleh
Silvio Savarese
Ian Reid
Hamid Rezatofighi
30
42
0
16 Jun 2021
Relation Modeling in Spatio-Temporal Action Localization
Yutong Feng
Jianwen Jiang
Ziyuan Huang
Zhiwu Qing
Xiang Wang
Shiwei Zhang
Mingqian Tang
Yue Gao
33
11
0
15 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
18
274
0
09 Jun 2021
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
Shao-Wei Liu
Hanwen Jiang
Jiarui Xu
Sifei Liu
Xiaolong Wang
3DH
35
161
0
09 Jun 2021
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition
Ziyuan Huang
Zhiwu Qing
Xiang Wang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Zhurong Xia
Mingqian Tang
Nong Sang
M. Ang
ViT
24
11
0
09 Jun 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
27
207
0
03 Jun 2021
Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
14
12
0
03 Jun 2021
ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos
Meng-Jiun Chiou
Chun-Yu Liao
Li-Wei Wang
Roger Zimmermann
Jiashi Feng
41
24
0
25 May 2021
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Yixuan Li
Lei Chen
Runyu He
Zhenzhi Wang
Gangshan Wu
Limin Wang
27
97
0
16 May 2021
Not All Memories are Created Equal: Learning to Forget by Expiring
Sainbayar Sukhbaatar
Da Ju
Spencer Poff
Stephen Roller
Arthur Szlam
Jason Weston
Angela Fan
CLL
13
34
0
13 May 2021
Few-Shot Video Object Detection
Qi Fan
Chi-Keung Tang
Yu-Wing Tai
34
11
0
30 Apr 2021
VidTr: Video Transformer Without Convolutions
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
139
193
0
23 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
H2O: Two Hands Manipulating Objects for First Person Interaction Recognition
Taein Kwon
Bugra Tekin
Jan Stühmer
Federica Bogo
Marc Pollefeys
EgoV
29
168
0
22 Apr 2021
Temporal Query Networks for Fine-grained Video Understanding
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
24
83
0
19 Apr 2021
Spatiotemporal Deformable Scene Graphs for Complex Activity Detection
Salman Khan
Fabio Cuzzolin
3DPC
51
5
0
16 Apr 2021
Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories
Xitong Yang
Haoqi Fan
Lorenzo Torresani
L. Davis
Heng Wang
VLM
24
20
0
02 Apr 2021
Visual Semantic Role Labeling for Video Understanding
Arka Sadhu
Tanmay Gupta
Mark Yatskar
Ram Nevatia
Aniruddha Kembhavi
VLM
20
68
0
02 Apr 2021
TubeR: Tubelet Transformer for Video Action Detection
Jiaojiao Zhao
Yanyi Zhang
Xinyu Li
Hao Chen
Bing Shuai
...
Yuanjun Xiong
Davide Modolo
I. Marsic
Cees G. M. Snoek
Joseph Tighe
ViT
36
70
0
02 Apr 2021
Motion Guided Attention Fusion to Recognize Interactions from Videos
Tae Soo Kim
Jonathan D. Jones
Gregory Hager
22
15
0
01 Apr 2021