ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.01673
  4. Cited By
Relational Self-Attention: What's Missing in Attention for Video
  Understanding

Relational Self-Attention: What's Missing in Attention for Video Understanding

2 November 2021
Manjin Kim
Heeseung Kwon
Chunyu Wang
Suha Kwak
Minsu Cho
    ViT
ArXiv (abs)PDFHTML

Papers citing "Relational Self-Attention: What's Missing in Attention for Video Understanding"

50 / 54 papers shown
Title
CT-Net: Channel Tensorization Network for Video Classification
CT-Net: Channel Tensorization Network for Video Classification
Kunchang Li
Xianhang Li
Yali Wang
Jun Wang
Yu Qiao
ViT
62
55
0
03 Jun 2021
Segmenter: Transformer for Semantic Segmentation
Segmenter: Transformer for Semantic Segmentation
Robin Strudel
Ricardo Garcia Pinel
Ivan Laptev
Cordelia Schmid
ViT
203
1,467
0
12 May 2021
ViViT: A Video Vision Transformer
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
222
2,150
0
29 Mar 2021
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Ashish Vaswani
Prajit Ramachandran
A. Srinivas
Niki Parmar
Blake A. Hechtman
Jonathon Shlens
92
400
0
23 Mar 2021
Involution: Inverting the Inherence of Convolution for Visual
  Recognition
Involution: Inverting the Inherence of Convolution for Visual Recognition
Duo Li
Jie Hu
Changhu Wang
Xiangtai Li
Qi She
Lei Zhu
Tong Zhang
Qifeng Chen
BDL
72
304
0
10 Mar 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
332
180
0
17 Feb 2021
Learning Self-Similarity in Space and Time as Generalized Motion for
  Video Action Recognition
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition
Heeseung Kwon
Manjin Kim
Suha Kwak
Minsu Cho
TTA
51
41
0
14 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
387
2,053
0
09 Feb 2021
Video Transformer Network
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
264
432
0
01 Feb 2021
Tokens-to-Token ViT: Training Vision Transformers from Scratch on
  ImageNet
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Li-xin Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
ViT
133
1,939
0
28 Jan 2021
WeightNet: Revisiting the Design Space of Weight Networks
WeightNet: Revisiting the Design Space of Weight Networks
Ningning Ma
Xinming Zhang
Jiawei Huang
Jian Sun
56
108
0
23 Jul 2020
MotionSqueeze: Neural Motion Feature Learning for Video Understanding
MotionSqueeze: Neural Motion Feature Learning for Video Understanding
Heeseung Kwon
Manjin Kim
Suha Kwak
Minsu Cho
FAtt
77
128
0
20 Jul 2020
End-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT3DVPINN
421
13,048
0
26 May 2020
Exploring Self-attention for Image Recognition
Exploring Self-attention for Image Recognition
Hengshuang Zhao
Jiaya Jia
V. Koltun
SSL
95
786
0
28 Apr 2020
FineGym: A Hierarchical Video Dataset for Fine-grained Action
  Understanding
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
Dian Shao
Yue Zhao
Bo Dai
Dahua Lin
65
329
0
14 Apr 2020
X3D: Expanding Architectures for Efficient Video Recognition
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
134
1,020
0
09 Apr 2020
TEA: Temporal Excitation and Aggregation for Action Recognition
TEA: Temporal Excitation and Aggregation for Action Recognition
Yan-Ran Li
Bin Ji
Xintian Shi
Jianguo Zhang
Bin Kang
Limin Wang
ViT
88
447
0
03 Apr 2020
Dynamic Convolution: Attention over Convolution Kernels
Dynamic Convolution: Attention over Convolution Kernels
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Dongdong Chen
Lu Yuan
Zicheng Liu
101
893
0
07 Dec 2019
STM: SpatioTemporal and Motion Encoding for Action Recognition
STM: SpatioTemporal and Motion Encoding for Action Recognition
Boyuan Jiang
Mengmeng Wang
Weihao Gan
Wei Wu
Junjie Yan
81
382
0
07 Aug 2019
Stand-Alone Self-Attention in Vision Models
Stand-Alone Self-Attention in Vision Models
Prajit Ramachandran
Niki Parmar
Ashish Vaswani
Irwan Bello
Anselm Levskaya
Jonathon Shlens
VLMSLRViT
95
1,214
0
13 Jun 2019
Video Modeling with Correlation Networks
Video Modeling with Correlation Networks
Heng Wang
Du Tran
Lorenzo Torresani
Matt Feiszli
72
129
0
07 Jun 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DVMedIm
142
18,134
0
28 May 2019
Learning Video Representations from Correspondence Proposals
Learning Video Representations from Correspondence Proposals
Xingyu Liu
Joon-Young Lee
Hailin Jin
74
63
0
20 May 2019
Local Relation Networks for Image Recognition
Local Relation Networks for Image Recognition
Han Hu
Zheng Zhang
Zhenda Xie
Stephen Lin
FAtt
85
501
0
25 Apr 2019
CondConv: Conditionally Parameterized Convolutions for Efficient
  Inference
CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Brandon Yang
Gabriel Bender
Quoc V. Le
Jiquan Ngiam
MedIm3DV
72
635
0
10 Apr 2019
Video Classification with Channel-Separated Convolutional Networks
Video Classification with Channel-Separated Convolutional Networks
Du Tran
Heng Wang
Lorenzo Torresani
Matt Feiszli
3DV
72
587
0
04 Apr 2019
Group-wise Correlation Stereo Network
Group-wise Correlation Stereo Network
Xiaoyang Guo
Kai Yang
Wukui Yang
Xiaogang Wang
Hongsheng Li
3DV
123
565
0
10 Mar 2019
SlowFast Networks for Video Recognition
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
166
3,274
0
10 Dec 2018
TSM: Temporal Shift Module for Efficient Video Understanding
TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin
Chuang Gan
Song Han
98
1,691
0
20 Nov 2018
Representation Flow for Action Recognition
Representation Flow for Action Recognition
A. Piergiovanni
Michael S. Ryoo
75
147
0
02 Oct 2018
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture
  Design
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
Ningning Ma
Xiangyu Zhang
Haitao Zheng
Jian Sun
179
4,990
0
30 Jul 2018
Motion Feature Network: Fixed Motion Filter for Action Recognition
Motion Feature Network: Fixed Motion Filter for Action Recognition
Myunggi Lee
Seungeui Lee
S. Son
Gyutae Park
Nojun Kwak
75
122
0
26 Jul 2018
Videos as Space-Time Region Graphs
Videos as Space-Time Region Graphs
Xinyu Wang
Abhinav Gupta
106
756
0
05 Jun 2018
End-to-End Learning of Motion Representation for Video Understanding
End-to-End Learning of Motion Representation for Video Understanding
Lijie Fan
Wen-bing Huang
Chuang Gan
Stefano Ermon
Boqing Gong
Junzhou Huang
68
214
0
02 Apr 2018
MobileNetV2: Inverted Residuals and Linear Bottlenecks
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Mark Sandler
Andrew G. Howard
Menglong Zhu
A. Zhmoginov
Liang-Chieh Chen
184
19,284
0
13 Jan 2018
A Closer Look at Spatiotemporal Convolutions for Action Recognition
A Closer Look at Spatiotemporal Convolutions for Action Recognition
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
218
3,030
0
30 Nov 2017
Optical Flow Guided Feature: A Fast and Robust Motion Representation for
  Video Action Recognition
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
Shuyang Sun
Zhanghui Kuang
Wanli Ouyang
Lu Sheng
Wayne Zhang
76
296
0
29 Nov 2017
Temporal Relational Reasoning in Videos
Temporal Relational Reasoning in Videos
Bolei Zhou
A. Andonian
Aude Oliva
Antonio Torralba
NAI
98
1,039
0
22 Nov 2017
Non-local Neural Networks
Non-local Neural Networks
Xinyu Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
289
8,906
0
21 Nov 2017
The "something something" video database for learning and evaluating
  visual common sense
The "something something" video database for learning and evaluating visual common sense
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
87
1,535
0
13 Jun 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
713
131,652
0
12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
235
8,019
0
22 May 2017
SGDR: Stochastic Gradient Descent with Warm Restarts
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
333
8,130
0
13 Aug 2016
Temporal Segment Networks: Towards Good Practices for Deep Action
  Recognition
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
105
3,835
0
02 Aug 2016
A Systematic Evaluation and Benchmark for Person Re-Identification:
  Features, Metrics, and Datasets
A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets
Srikrishna Karanam
Mengran Gou
Ziyan Wu
Angels Rates-Borras
Mario Sznaier
Richard J. Radke
92
58
0
31 May 2016
Convolutional Two-Stream Network Fusion for Video Action Recognition
Convolutional Two-Stream Network Fusion for Video Action Recognition
Christoph Feichtenhofer
A. Pinz
Andrew Zisserman
163
2,611
0
22 Apr 2016
Deep Networks with Stochastic Depth
Deep Networks with Stochastic Depth
Gao Huang
Yu Sun
Zhuang Liu
Daniel Sedra
Kilian Q. Weinberger
215
2,357
0
30 Mar 2016
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,020
0
10 Dec 2015
Rethinking the Inception Architecture for Computer Vision
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DVBDL
883
27,373
0
02 Dec 2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on
  ImageNet Classification
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
VLM
326
18,625
0
06 Feb 2015
12
Next