ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1812.03982
  4. Cited By
SlowFast Networks for Video Recognition

SlowFast Networks for Video Recognition

10 December 2018
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
ArXivPDFHTML

Papers citing "SlowFast Networks for Video Recognition"

50 / 655 papers shown
Title
Deformable Video Transformer
Deformable Video Transformer
Jue Wang
Lorenzo Torresani
ViT
30
28
0
31 Mar 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
35
94
0
30 Mar 2022
End-to-End Compressed Video Representation Learning for Generic Event
  Boundary Detection
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection
Congcong Li
Xinyao Wang
Longyin Wen
Dexiang Hong
Tiejian Luo
Libo Zhang
28
16
0
29 Mar 2022
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal
  Action Localization
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Bo He
Xitong Yang
Le Kang
Zhiyu Cheng
Xingfa Zhou
Abhinav Shrivastava
33
77
0
29 Mar 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding
  Procedural Activities
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
33
205
0
28 Mar 2022
Class-Incremental Learning for Action Recognition in Videos
Class-Incremental Learning for Action Recognition in Videos
Jaeyoo Park
Minsoo Kang
Bohyung Han
CLL
24
52
0
25 Mar 2022
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval
  and Highlight Detection
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Ye Liu
Siyuan Li
Yang Wu
C. Chen
Ying Shan
Xiaohu Qie
ViT
27
140
0
23 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for
  Self-Supervised Video Pre-Training
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
167
1,134
0
23 Mar 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
40
19
0
23 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions
  from Untrimmed Web Videos
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomávs Souvcek
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
21
32
0
22 Mar 2022
Point3D: tracking actions as moving points with 3D CNNs
Point3D: tracking actions as moving points with 3D CNNs
Shentong Mo
Jingfei Xia
Xiaoqing Ellen Tan
Bhiksha Raj
3DPC
20
5
0
20 Mar 2022
DirecFormer: A Directed Attention in Transformer Approach to Robust
  Action Recognition
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
Thanh-Dat Truong
Quoc-Huy Bui
C. Duong
Han-Seok Seo
Son Lam Phung
Xin Li
Khoa Luu
ViT
42
49
0
19 Mar 2022
Gate-Shift-Fuse for Video Action Recognition
Gate-Shift-Fuse for Video Action Recognition
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
30
22
0
16 Mar 2022
RCL: Recurrent Continuous Localization for Temporal Action Detection
RCL: Recurrent Continuous Localization for Temporal Action Detection
Qiang Wang
Yanhao Zhang
Yun Zheng
Pan Pan
ObjD
32
38
0
14 Mar 2022
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
22
1
0
13 Mar 2022
TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal
  Reasoning
TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning
Shiwen Zhang
AI4TS
29
9
0
11 Mar 2022
GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive
  Recognition of Cereal Grains
GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains
Lei Fan
Yiwen Ding
Dongdong Fan
Donglin Di
Maurice Pagnucco
Yang Song
AI4TS
37
19
0
10 Mar 2022
OpenTAL: Towards Open Set Temporal Action Localization
OpenTAL: Towards Open Set Temporal Action Localization
Wentao Bao
Qi Yu
Yu Kong
EDL
37
26
0
10 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition
  on Modality-Specific Annotated Videos
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
Saghir Alfasly
Jian Lu
C. Xu
Yuru Zou
42
18
0
06 Mar 2022
Colar: Effective and Efficient Online Action Detection by Consulting
  Exemplars
Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
Le Yang
Junwei Han
Dingwen Zhang
27
35
0
02 Mar 2022
Temporal Perceiver: A General Architecture for Arbitrary Boundary
  Detection
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan
Yuhong Wang
Gangshan Wu
Limin Wang
55
14
0
01 Mar 2022
Motion-driven Visual Tempo Learning for Video-based Action Recognition
Motion-driven Visual Tempo Learning for Video-based Action Recognition
Yuanzhong Liu
Junsong Yuan
Zhigang Tu
27
58
0
24 Feb 2022
VLP: A Survey on Vision-Language Pre-training
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
82
213
0
18 Feb 2022
ActionFormer: Localizing Moments of Actions with Transformers
ActionFormer: Localizing Moments of Actions with Transformers
Chen-Da Liu-Zhang
Jianxin Wu
Yin Li
ViT
31
333
0
16 Feb 2022
HAKE: A Knowledge Engine Foundation for Human Activity Understanding
HAKE: A Knowledge Engine Foundation for Human Activity Understanding
Yong-Lu Li
Xinpeng Liu
Xiaoqian Wu
Yizhuo Li
Zuoyu Qiu
Liang Xu
Yue Xu
Haoshu Fang
Cewu Lu
32
38
0
14 Feb 2022
Should I take a walk? Estimating Energy Expenditure from Video Data
Should I take a walk? Estimating Energy Expenditure from Video Data
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
18
4
0
01 Feb 2022
Learning To Recognize Procedural Activities with Distant Supervision
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
35
83
0
26 Jan 2022
UniFormer: Unifying Convolution and Self-attention for Visual
  Recognition
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
162
360
0
24 Jan 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient
  Long-Term Video Recognition
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
48
198
0
20 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
229
226
0
20 Jan 2022
Action Keypoint Network for Efficient Video Recognition
Action Keypoint Network for Efficient Video Recognition
Xu Chen
Yahong Han
Xiaohan Wang
Yifang Sun
Yi Yang
3DPC
29
6
0
17 Jan 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
24
103
0
16 Jan 2022
Real-World Graph Convolution Networks (RW-GCNs) for Action Recognition
  in Smart Video Surveillance
Real-World Graph Convolution Networks (RW-GCNs) for Action Recognition in Smart Video Surveillance
Justin Sanchez
Christopher Neff
Hamed Tabkhi
GNN
30
9
0
15 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal
  Representation Learning
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
52
238
0
12 Jan 2022
OCSampler: Compressing Videos to One Clip with Single-step Sampling
OCSampler: Compressing Videos to One Clip with Single-step Sampling
Jintao Lin
Haodong Duan
Kai-xiang Chen
Dahua Lin
Limin Wang
44
24
0
12 Jan 2022
Multiview Transformers for Video Recognition
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
26
212
0
12 Jan 2022
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for
  Cross-Domain Robustness in Action Recognition
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
Sofia Broomé
Ernest Pokropek
Boyu Li
Hedvig Kjellström
21
7
0
22 Dec 2021
Precondition and Effect Reasoning for Action Recognition
Precondition and Effect Reasoning for Action Recognition
Hongsang Yoo
Haopeng Li
Qiuhong Ke
Liangchen Liu
Rui Zhang
CML
49
4
0
19 Dec 2021
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition
Yinghao Xu
Fangyun Wei
Xiao Sun
Ceyuan Yang
Yujun Shen
Bo Dai
Bolei Zhou
Stephen Lin
VLM
33
52
0
17 Dec 2021
Distillation of Human-Object Interaction Contexts for Action Recognition
Distillation of Human-Object Interaction Contexts for Action Recognition
Muna Almushyti
Frederick W. Li
34
3
0
17 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
100
655
0
16 Dec 2021
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based
  Motion Recognition
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition
Benjia Zhou
Pichao Wang
Jun Wan
Yanyan Liang
Fan Wang
Du Zhang
Zhen Lei
Hao Li
Rong Jin
36
29
0
16 Dec 2021
Temporal Shuffling for Defending Deep Action Recognition Models against
  Adversarial Attacks
Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks
Jaehui Hwang
Huan Zhang
Jun-Ho Choi
Cho-Jui Hsieh
Jong-Seok Lee
AAML
19
5
0
15 Dec 2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive
  Cross-modal Matching and Denoising
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Hongyang Chao
Tao Mei
VLM
18
41
0
14 Dec 2021
SVIP: Sequence VerIfication for Procedures in Videos
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
34
17
0
13 Dec 2021
Video as Conditional Graph Hierarchy for Multi-Granular Question
  Answering
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
35
111
0
12 Dec 2021
Discrete neural representations for explainable anomaly detection
Discrete neural representations for explainable anomaly detection
Stanislaw Szymanowicz
James Charles
R. Cipolla
AAML
AI4TS
FAtt
27
20
0
10 Dec 2021
Cross-Modal Transferable Adversarial Attacks from Images to Videos
Cross-Modal Transferable Adversarial Attacks from Images to Videos
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu-Gang Jiang
AAML
30
38
0
10 Dec 2021
Contextualized Spatio-Temporal Contrastive Learning with
  Self-Supervision
Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision
Liangzhe Yuan
Rui Qian
Huayu Chen
Boqing Gong
Florian Schroff
Ming-Hsuan Yang
Hartwig Adam
Ting Liu
AI4TS
30
15
0
09 Dec 2021
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural
  Architecture Search
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search
Yi Ding
Xinyu Gong
Junru Wu
Humphrey Shi
Zhicheng Yan
Zhangyang Wang
VGen
52
1
0
09 Dec 2021
Previous
123...789...121314
Next