ResearchTrend.AI
A Comprehensive Study of Deep Video Action Recognition

11 December 2020
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
    VLM
    AI4TS

Papers citing "A Comprehensive Study of Deep Video Action Recognition"

50 / 85 papers shown
Impossible Videos
Zechen Bai
Hai Ci
Mike Zheng Shou
EGVM
VGen
72
0
0
18 Mar 2025
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
Otto Brookes
Maksim Kukushkin
Majid Mirmehdi
Colleen Stephens
Paula Dieguez
...
Lukas Boesch
Thomas Schmid
M. Arandjelovic
H. Kühl
T. Burghardt
48
0
0
28 Feb 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
43
0
0
22 Jan 2025
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
164
0
0
20 Jan 2025
SEMU-Net: A Segmentation-based Corrector for Fabrication Process Variations of Nanophotonics with Microscopic Images
Rambod Azimi
Yijian Kong
D. Gostimirovic
James J. Clark
O. Liboiron-Ladouceur
62
0
0
25 Nov 2024
STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-based Video Models
Zerui Wang
Yan Liu
50
0
0
01 Nov 2024
Low-Latency Video Anonymization for Crowd Anomaly Detection: Privacy vs. Performance
M. Asres
Lei Jiao
C. Omlin
31
0
0
24 Oct 2024
Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling
Hao Wu
Donglin Bai
Shiqi Jiang
Qianxi Zhang
Y. Yang
Ting Cao
Fengyuan Xu
Yunxin Liu
Fengyuan Xu
142
0
0
19 Oct 2024
MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations
Liang Xu
Shaoyang Hua
Zili Lin
Yifan Liu
Feipeng Ma
Yichao Yan
Xin Jin
Xiaokang Yang
Wenjun Zeng
VGen
39
3
0
17 Oct 2024
One missing piece in Vision and Language: A Survey on Comics Understanding
Emanuele Vivoli
Andrey Barsky
Mohamed Ali Souibgui
Artemis LLabres
Marco Bertini
Dimosthenis Karatzas
34
3
0
14 Sep 2024
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition
Ahmed Abdelkawy
Asem A. Ali
Aly A. Farag
3DPC
26
0
0
10 Aug 2024
SignCLIP: Connecting Text and Sign Language by Contrastive Learning
Zifan Jiang
Gerard Sant
Amit Moryossef
Mathias Müller
Rico Sennrich
Sarah Ebling
VLM
CLIP
34
2
0
01 Jul 2024
SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks
Yi Pan
Jun-Jie Huang
Zihan Chen
Wentao Zhao
Ziyue Wang
28
0
0
04 Jun 2024
ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos
Sharana Dharshikgan Suresh Dass
H. Barua
Ganesh Krishnasamy
Raveendran Paramesran
Raphael C.-W. Phan
ViT
23
2
0
09 Apr 2024
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
W. G. C. Bandara
Vishal M. Patel
VPVLM
VLM
28
1
0
11 Mar 2024
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming
Pengyuan Zhou
Lin Wang
Zhi Liu
Yanbin Hao
Pan Hui
Sasu Tarkoma
J. Kangasharju
VGen
38
26
0
30 Jan 2024
Multi-model learning by sequential reading of untrimmed videos for action recognition
Kodai Kamiya
Toru Tamaki
23
0
0
26 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
50
82
0
29 Dec 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
21
1
0
28 Nov 2023
Student Classroom Behavior Detection based on Spatio-Temporal Network and Multi-Model Fusion
Fan Yang
Xiaofei Wang
24
1
0
25 Oct 2023
Proving the Potential of Skeleton Based Action Recognition to Automate the Analysis of Manual Processes
Marlin Berger
F. Cloppenburg
Jens Eufinger
Thomas Gries
18
0
0
12 Oct 2023
Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination
Siyuan Jiang
Yan Ding
Yuling Wang
Lei Xu
Wenli Dai
...
Jie Yu
Jianqiao Zhou
Chunquan Zhang
Ping Liang
Dexing Kong
11
0
0
10 Oct 2023
SCB-Dataset3: A Benchmark for Detecting Student Classroom Behavior
Fan Yang
Tao Wang
18
17
0
04 Oct 2023
Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning
Sikiru Adewale
Tosin Ige
Bolanle Hafiz Matti
VLM
9
4
0
02 Oct 2023
TransNet: A Transfer Learning-Based Network for Human Action Recognition
Khaled Alomar
Xiaohao Cai
27
1
0
13 Sep 2023
Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos
Sarthak Bhagat
Simon Stepputtis
Joseph Campbell
Katia P. Sycara
31
4
0
12 Sep 2023
Joint learning of images and videos with a single Vision Transformer
Shuki Shimizu
Toru Tamaki
ViT
11
0
0
21 Aug 2023
The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation
Giacomo Zara
Alessandro Conti
Subhankar Roy
Stéphane Lathuilière
Paolo Rota
Elisa Ricci
25
11
0
17 Aug 2023
E2E-LOAD: End-to-End Long-form Online Action Detection
Shuyuan Cao
Weihua Luo
Bairui Wang
Wei Emma Zhang
Lin Ma
25
5
0
13 Jun 2023
Student Classroom Behavior Detection based on Improved YOLOv7
Fan Yang
11
6
0
06 Jun 2023
Student Classroom Behavior Detection based on YOLOv7-BRA and Multi-Model Fusion
Fan Yang
Tao Wang
Xiaofei Wang
6
13
0
13 May 2023
Learning Human-Human Interactions in Images from Weak Textual Supervision
Morris Alper
Hadar Averbuch-Elor
VLM
37
2
0
27 Apr 2023
Video-based Contrastive Learning on Decision Trees: from Action Recognition to Autism Diagnosis
Mindi Ruan
Xiang Yu
Naifeng Zhang
Chuanbo Hu
Shuo Wang
Xin Li
28
8
0
20 Apr 2023
SCB-dataset: A Dataset for Detecting Student Classroom Behavior
Yang Fan
9
10
0
05 Apr 2023
AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation
Giacomo Zara
Subhankar Roy
Paolo Rota
Elisa Ricci
VLM
19
12
0
03 Apr 2023
Nearest-Neighbor Inter-Intra Contrastive Learning from Unlabeled Videos
D. Fan
De-Yun Yang
Xinyu Li
Vimal Bhat
M. Rohith
SSL
15
1
0
13 Mar 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
C. L. P. Chen
Mu Li
ViT
44
144
0
06 Feb 2023
Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks
Leonard Hacker
Finn Bartels
Pierre-Etienne Martin
16
6
0
06 Feb 2023
Gated-ViGAT: Efficient Bottom-Up Event Recognition and Explanation Using a New Frame Selection Policy and Gating Mechanism
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
13
4
0
18 Jan 2023
CNN-Based Action Recognition and Pose Estimation for Classifying Animal Behavior from Videos: A Survey
Michael Perez
Corey Toler-Franklin
MedIm
28
14
0
15 Jan 2023
Efficient Robustness Assessment via Adversarial Spatial-Temporal Focus on Videos
Xingxing Wei
Songping Wang
Huanqian Yan
AAML
21
15
0
03 Jan 2023
Transformers in Action Recognition: A Review on Temporal Modeling
Elham Shabaninia
Hossein Nezamabadi-pour
Fatemeh Shafizadegan
ViT
21
8
0
29 Dec 2022
Deep set conditioned latent representations for action recognition
Akash Singh
Tom De Schepper
Kevin Mets
P. Hellinckx
José Oramas
Steven Latré
BDL
6
2
0
21 Dec 2022
Egocentric Video Task Translation
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
21
13
0
13 Dec 2022
Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks
M. Kowal
Mennatullah Siam
Md. Amirul Islam
Neil D. B. Bruce
Richard P. Wildes
Konstantinos G. Derpanis
FAtt
17
3
0
03 Nov 2022
End-to-end Transformer for Compressed Video Quality Enhancement
Li Yu
Wenshuai Chang
Shiyu Wu
M. Gabbouj
ViT
19
8
0
25 Oct 2022
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
Xueliang Zhao
Yuxuan Wang
Chongyang Tao
Chenshuo Wang
Dongyan Zhao
41
6
0
22 Oct 2022
Physical Adversarial Attack meets Computer Vision: A Decade Survey
Hui Wei
Hao Tang
Xuemei Jia
Zhixiang Wang
Han-Bing Yu
Zhubo Li
Shin'ichi Satoh
Luc Van Gool
Zheng Wang
AAML
27
43
0
30 Sep 2022
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
30
148
0
15 Sep 2022
Active Learning with Effective Scoring Functions for Semi-Supervised Temporal Action Localization
Ding Li
Xuebing Yang
Yongqiang Tang
Chenyang Zhang
Wensheng Zhang
29
4
0
31 Aug 2022