ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.01657
  4. Cited By
Cross-modal Representation Learning for Zero-shot Action Recognition

Cross-modal Representation Learning for Zero-shot Action Recognition

3 May 2022
Chung-Ching Lin
Kevin Qinghong Lin
Linjie Li
Lijuan Wang
Zicheng Liu
    ViT
ArXiv (abs)PDFHTML

Papers citing "Cross-modal Representation Learning for Zero-shot Action Recognition"

28 / 28 papers shown
Title
Pix2seq: A Language Modeling Framework for Object Detection
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLMViTVLM
269
348
0
22 Sep 2021
Elaborative Rehearsal for Zero-shot Action Recognition
Elaborative Rehearsal for Zero-shot Action Recognition
Shizhe Chen
Dong Huang
VLM
65
96
0
05 Aug 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
418
4,987
0
24 Feb 2021
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
232
5,091
0
08 Oct 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu
Xi Yin
Kevin Qinghong Lin
Lijuan Wang
Lefei Zhang
Jianfeng Gao
Zicheng Liu
VLM
79
56
0
28 Sep 2020
All About Knowledge Graphs for Actions
All About Knowledge Graphs for Actions
P. Ghosh
Nirat Saini
L. Davis
Abhinav Shrivastava
59
31
0
28 Aug 2020
Multi-modal Transformer for Video Retrieval
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
537
610
0
21 Jul 2020
End-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT3DVPINN
421
13,048
0
26 May 2020
X3D: Expanding Architectures for Efficient Video Recognition
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
134
1,020
0
09 Apr 2020
Rethinking Zero-shot Video Classification: End-to-end Training for
  Realistic Applications
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Biagio Brattoli
Joseph Tighe
Fedor Zhdanov
Pietro Perona
Krzysztof Chalupka
VLM
176
130
0
03 Mar 2020
Big Transfer (BiT): General Visual Representation Learning
Big Transfer (BiT): General Visual Representation Learning
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
J. Puigcerver
Jessica Yung
Sylvain Gelly
N. Houlsby
MQ
286
1,211
0
24 Dec 2019
Something-Else: Compositional Action Recognition with Spatial-Temporal
  Interaction Networks
Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks
Joanna Materzynska
Tete Xiao
Roei Herzig
Huijuan Xu
Xiaolong Wang
Trevor Darrell
CoGe
53
176
0
20 Dec 2019
Locality and compositionality in zero-shot learning
Locality and compositionality in zero-shot learning
Tristan Sylvain
Linda Petrini
R. Devon Hjelm
55
56
0
20 Dec 2019
CATER: A diagnostic dataset for Compositional Actions and TEmporal
  Reasoning
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
Rohit Girdhar
Deva Ramanan
69
178
0
10 Oct 2019
Zero-Shot Action Recognition in Videos: A Survey
Zero-Shot Action Recognition in Videos: A Survey
Valter Estevam
Hélio Pedrini
David Menotti
75
58
0
13 Sep 2019
Out-of-Distribution Detection for Generalized Zero-Shot Action
  Recognition
Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition
Devraj Mandal
Sanath Narayan
Sai Kumar Dwivedi
Vikram Gupta
Shuaib Ahmed
Fahad Shahbaz Khan
Ling Shao
OODD
54
141
0
18 Apr 2019
Approximating CNNs with Bag-of-local-Features models works surprisingly
  well on ImageNet
Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
Wieland Brendel
Matthias Bethge
SSLFAtt
96
561
0
20 Mar 2019
Action2Vec: A Crossmodal Embedding Approach to Action Learning
Action2Vec: A Crossmodal Embedding Approach to Action Learning
Meera Hahn
Andrew Silva
James M. Rehg
63
58
0
02 Jan 2019
Video Action Transformer Network
Video Action Transformer Network
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
ViT
131
709
0
06 Dec 2018
TSM: Temporal Shift Module for Efficient Video Understanding
TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin
Chuang Gan
Song Han
98
1,692
0
20 Nov 2018
Visual Data Synthesis via GAN for Zero-Shot Video Classification
Visual Data Synthesis via GAN for Zero-Shot Video Classification
Chenrui Zhang
Yuxin Peng
63
47
0
26 Apr 2018
Towards Universal Representation for Unseen Action Recognition
Towards Universal Representation for Unseen Action Recognition
Yi Zhu
Yang Long
Yu Guan
Shawn D. Newsam
Ling Shao
AI4TS
86
103
0
22 Mar 2018
Bottom-Up and Top-Down Attention for Image Captioning and Visual
  Question Answering
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
121
4,220
0
25 Jul 2017
Alternative Semantic Representations for Zero-Shot Human Action
  Recognition
Alternative Semantic Representations for Zero-Shot Human Action Recognition
Qian Wang
Ke Chen
VLM
65
73
0
28 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
235
8,037
0
22 May 2017
Temporal Segment Networks: Towards Good Practices for Deep Action
  Recognition
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
105
3,838
0
02 Aug 2016
Efficient Estimation of Word Representations in Vector Space
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
680
31,538
0
16 Jan 2013
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIPVGen
157
6,162
0
03 Dec 2012
1