1809.05848
Towards Good Practices for Multi-modal Fusion in Large-scale Video Classification
16 September 2018
Jinlai Liu
Zehuan Yuan
Changhu Wang
Papers citing "Towards Good Practices for Multi-modal Fusion in Large-scale Video Classification"
18 / 18 papers shown
Title
A Closer Look at Spatiotemporal Convolutions for Action Recognition
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
238
3,032
0
30 Nov 2017
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
Zhaofan Qiu
Ting Yao
Tao Mei
102
1,663
0
28 Nov 2017
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
Zhou Yu
Jun Yu
Jianping Fan
Dacheng Tao
82
667
0
04 Aug 2017
Learnable pooling with Context Gating for video classification
Antoine Miech
Ivan Laptev
Josef Sivic
79
327
0
21 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
240
8,041
0
22 May 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
171
583
0
18 May 2017
ActionVLAD: Learning spatio-temporal aggregation for action classification
Rohit Girdhar
Deva Ramanan
Abhinav Gupta
Josef Sivic
Bryan C. Russell
AI4TS
82
451
0
10 Apr 2017
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
130
2,510
0
29 Sep 2016
YouTube-8M: A Large-Scale Video Classification Benchmark
Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
G. Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
VLM
155
1,272
0
27 Sep 2016
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
120
3,841
0
02 Aug 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.3K
194,510
0
10 Dec 2015
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV
BDL
886
27,427
0
02 Dec 2015
NetVLAD: CNN architecture for weakly supervised place recognition
Relja Arandjelović
Petr Gronát
Akihiko Torii
Tomas Pajdla
Josef Sivic
3DV
SSL
130
2,652
0
23 Nov 2015
Compact Bilinear Pooling
Yang Gao
Oscar Beijbom
Ning Zhang
Trevor Darrell
83
791
0
19 Nov 2015
Beyond Short Snippets: Deep Networks for Video Classification
Joe Yue-Hei Ng
Matthew J. Hausknecht
Sudheendra Vijayanarasimhan
Oriol Vinyals
R. Monga
G. Toderici
151
2,337
0
31 Mar 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.1K
150,364
0
22 Dec 2014
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Kyunghyun Cho
Bart van Merriënboer
Dzmitry Bahdanau
Yoshua Bengio
AI4CE
AIMat
265
6,791
0
03 Sep 2014
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan
Andrew Zisserman
261
7,545
0
09 Jun 2014