Cross-modal Representation Learning for Zero-shot Action Recognition

3 May 2022

Zicheng Liu

Papers citing "Cross-modal Representation Learning for Zero-shot Action Recognition"

Title | Authors | Tags | Counts | Date
Pix2seq: A Language Modeling Framework for Object Detection | Ting-Li Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey E. Hinton | MLLM, ViT, VLM | 269 348 0 | 22 Sep 2021
Elaborative Rehearsal for Zero-shot Action Recognition | Shizhe Chen, Dong Huang | VLM | 65 96 0 | 05 Aug 2021
Zero-Shot Text-to-Image Generation | Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever | VLM | 418 4,987 0 | 24 Feb 2021
Deformable DETR: Deformable Transformers for End-to-End Object Detection | Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai | ViT | 232 5,091 0 | 08 Oct 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | Xiaowei Hu, Xi Yin, Kevin Qinghong Lin, Lijuan Wang, Lefei Zhang, Jianfeng Gao, Zicheng Liu | VLM | 79 56 0 | 28 Sep 2020
All About Knowledge Graphs for Actions | P. Ghosh, Nirat Saini, L. Davis, Abhinav Shrivastava | - | 59 31 0 | 28 Aug 2020
Multi-modal Transformer for Video Retrieval | Valentin Gabeur, Chen Sun, Alahari Karteek, Cordelia Schmid | ViT | 537 610 0 | 21 Jul 2020
End-to-End Object Detection with Transformers | Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko | ViT, 3DV, PINN | 421 13,048 0 | 26 May 2020
X3D: Expanding Architectures for Efficient Video Recognition | Christoph Feichtenhofer | - | 134 1,020 0 | 09 Apr 2020
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka | VLM | 176 130 0 | 03 Mar 2020
Big Transfer (BiT): General Visual Representation Learning | Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, J. Puigcerver, Jessica Yung, Sylvain Gelly, N. Houlsby | MQ | 286 1,211 0 | 24 Dec 2019
Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks | Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell | CoGe | 53 176 0 | 20 Dec 2019
Locality and compositionality in zero-shot learning | Tristan Sylvain, Linda Petrini, R. Devon Hjelm | - | 55 56 0 | 20 Dec 2019
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning | Rohit Girdhar, Deva Ramanan | - | 69 178 0 | 10 Oct 2019
Zero-Shot Action Recognition in Videos: A Survey | Valter Estevam, Hélio Pedrini, David Menotti | - | 75 58 0 | 13 Sep 2019
Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition | Devraj Mandal, Sanath Narayan, Sai Kumar Dwivedi, Vikram Gupta, Shuaib Ahmed, Fahad Shahbaz Khan, Ling Shao | OODD | 54 141 0 | 18 Apr 2019
Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet | Wieland Brendel, Matthias Bethge | SSL, FAtt | 96 561 0 | 20 Mar 2019
Action2Vec: A Crossmodal Embedding Approach to Action Learning | Meera Hahn, Andrew Silva, James M. Rehg | - | 63 58 0 | 02 Jan 2019
Video Action Transformer Network | Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman | ViT | 131 709 0 | 06 Dec 2018
TSM: Temporal Shift Module for Efficient Video Understanding | Ji Lin, Chuang Gan, Song Han | - | 98 1,692 0 | 20 Nov 2018
Visual Data Synthesis via GAN for Zero-Shot Video Classification | Chenrui Zhang, Yuxin Peng | - | 63 47 0 | 26 Apr 2018
Towards Universal Representation for Unseen Action Recognition | Yi Zhu, Yang Long, Yu Guan, Shawn D. Newsam, Ling Shao | AI4TS | 86 103 0 | 22 Mar 2018
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang | AIMat | 121 4,220 0 | 25 Jul 2017
Alternative Semantic Representations for Zero-Shot Human Action Recognition | Qian Wang, Ke Chen | VLM | 65 73 0 | 28 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | João Carreira, Andrew Zisserman | - | 235 8,037 0 | 22 May 2017
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool | ViT | 105 3,838 0 | 02 Aug 2016
Efficient Estimation of Word Representations in Vector Space | Tomas Mikolov, Kai Chen, G. Corrado, J. Dean | 3DV | 680 31,538 0 | 16 Jan 2013
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild | K. Soomro, Amir Zamir, M. Shah | CLIP, VGen | 157 6,162 0 | 03 Dec 2012