Video Action Transformer Network
arXiv: 1812.02707

6 December 2018
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
    ViT

Papers citing "Video Action Transformer Network"

43 / 43 papers shown
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
94
0
0
18 Dec 2024
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
114
3
0
20 Jul 2024
Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization
Chanyeon Kim
Jongwoon Park
Hyun-sool Bae
Woo Chang Kim
51
3
0
03 Apr 2024
Exploring Self-Attention for Crop-type Classification Explainability
Ivica Obadic
R. Roscher
Dario Augusto Borges Oliveira
Xiao Xiang Zhu
54
7
0
24 Oct 2022
Spatio-Temporal Action Graph Networks
Roei Herzig
Elad Levi
Huijuan Xu
Hang Gao
Eli Brosh
Xiaolong Wang
Amir Globerson
Trevor Darrell
GNN
36
20
0
04 Dec 2018
Actor-Centric Relation Network
Chen Sun
Abhinav Shrivastava
Carl Vondrick
Kevin Patrick Murphy
Rahul Sukthankar
Cordelia Schmid
65
220
0
28 Jul 2018
YH Technologies at ActivityNet Challenge 2018
Ting Yao
Xue Li
31
11
0
29 Jun 2018
Object Level Visual Reasoning in Videos
Fabien Baradel
Natalia Neverova
Christian Wolf
J. Mille
Greg Mori
59
163
0
16 Jun 2018
Videos as Space-Time Region Graphs
Xiaolong Wang
Abhinav Gupta
52
753
0
05 Jun 2018
VideoCapsuleNet: A Simplified Network for Action Detection
Kevin Duarte
Yogesh S Rawat
M. Shah
MedIm
40
166
0
21 May 2018
Image Transformer
Niki Parmar
Ashish Vaswani
Jakob Uszkoreit
Lukasz Kaiser
Noam M. Shazeer
Alexander Ku
Dustin Tran
ViT
74
1,671
0
15 Feb 2018
Detect-and-Track: Efficient Pose Estimation in Videos
Rohit Girdhar
Georgia Gkioxari
Lorenzo Torresani
Manohar Paluri
Du Tran
3DH
88
229
0
26 Dec 2017
Human Action Recognition: Pose-based Attention draws focus to Hands
Fabien Baradel
Christian Wolf
J. Mille
72
107
0
20 Dec 2017
A Closer Look at Spatiotemporal Convolutions for Action Recognition
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
172
3,007
0
30 Nov 2017
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
Zhaofan Qiu
Ting Yao
Tao Mei
52
1,655
0
28 Nov 2017
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
Xiang Long
Chuang Gan
Gerard de Melo
Jiajun Wu
Xiao Liu
Shilei Wen
45
209
0
27 Nov 2017
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
170
8,867
0
21 Nov 2017
Attentional Pooling for Action Recognition
Rohit Girdhar
Deva Ramanan
65
319
0
04 Nov 2017
Learnable pooling with Context Gating for video classification
Antoine Miech
Ivan Laptev
Josef Sivic
38
327
0
21 Jun 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
278
129,831
0
12 Jun 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
50
3,666
0
08 Jun 2017
A simple neural network module for relational reasoning
Adam Santoro
David Raposo
David Barrett
Mateusz Malinowski
Razvan Pascanu
Peter W. Battaglia
Timothy Lillicrap
GNN
NAI
90
1,610
0
05 Jun 2017
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu
Chen Sun
David A. Ross
Carl Vondrick
C. Pantofaru
...
G. Toderici
Susanna Ricco
Rahul Sukthankar
Cordelia Schmid
Jitendra Malik
VGen
80
1,021
0
23 May 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
178
7,961
0
22 May 2017
The Kinetics Human Action Video Dataset
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
...
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
182
3,771
0
19 May 2017
Action Tubelet Detector for Spatio-Temporal Action Localization
Vicky Kalogeiton
Philippe Weinzaepfel
V. Ferrari
Cordelia Schmid
50
324
0
04 May 2017
ActionVLAD: Learning spatio-temporal aggregation for action classification
Rohit Girdhar
Deva Ramanan
Abhinav Gupta
Josef Sivic
Bryan C. Russell
AI4TS
47
451
0
10 Apr 2017
Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos
Rui Hou
Chen Chen
M. Shah
MedIm
48
333
0
30 Mar 2017
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
Huijuan Xu
Abir Das
Kate Saenko
3DPC
100
714
0
22 Mar 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
227
27,018
0
20 Mar 2017
Asynchronous Temporal Fields for Action Recognition
Gunnar Sigurdsson
S. Divvala
Ali Farhadi
Abhinav Gupta
BDL
47
170
0
19 Dec 2016
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
384
21,951
0
09 Dec 2016
Speed/accuracy trade-offs for modern convolutional object detectors
Jonathan Huang
V. Rathod
Chen Sun
Menglong Zhu
Anoop Korattikara Balan
...
Ian S. Fischer
Z. Wojna
Yang Song
S. Guadarrama
Kevin Patrick Murphy
3DH
3DV
58
2,567
0
30 Nov 2016
Online Real-time Multiple Spatiotemporal Action Localisation and Prediction
Gurkirt Singh
Suman Saha
Michael Sapienza
Philip Torr
Fabio Cuzzolin
49
286
0
25 Nov 2016
YouTube-8M: A Large-Scale Video Classification Benchmark
Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
G. Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
VLM
69
1,264
0
27 Sep 2016
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
78
3,814
0
02 Aug 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
187
10,412
0
21 Jul 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xiaolong Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
64
1,232
0
06 Apr 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
998
192,638
0
10 Dec 2015
Action Recognition using Visual Attention
Shikhar Sharma
Ryan Kiros
Ruslan Salakhutdinov
43
666
0
12 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
343
61,900
0
04 Jun 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
243
10,034
0
10 Feb 2015
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIP
VGen
65
6,100
0
03 Dec 2012