
Are Visual-Language Models Effective in Action Recognition? A Comparative Study

22 October 2024
Mahmoud Ali
Di Yang
François Brémond
    VLM

Papers citing "Are Visual-Language Models Effective in Action Recognition? A Comparative Study"

17 / 17 papers shown
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
238
270
0
05 Jan 2024
Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions
Lina Mezghani
Piotr Bojanowski
Alahari Karteek
Sainbayar Sukhbaatar
LM&Ro
OffRL
LRM
69
8
0
18 Apr 2023
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
123
327
0
06 Dec 2022
ViA: View-invariant Skeleton Action Representation Learning via Motion Retargeting
Di Yang
Yaohui Wang
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
Francois Bremond
49
9
0
31 Aug 2022
All in One: Exploring Unified Video-Language Pre-training
Alex Jinpeng Wang
Yixiao Ge
Rui Yan
Yuying Ge
Xudong Lin
Guanyu Cai
Jianping Wu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
82
202
0
14 Mar 2022
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai
Srijan Das
Kumara Kahatapitiya
Michael S. Ryoo
Francois Bremond
ViT
81
73
0
07 Dec 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
51
132
0
20 May 2021
VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living
Srijan Das
Rui Dai
Di Yang
Francois Bremond
ViT
80
70
0
17 May 2021
UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
Tianjiao Li
Jun Liu
Wei Emma Zhang
Yun Ni
Wenqian Wang
Zhiheng Li
AI4TS
65
191
0
02 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
84
128
0
30 Mar 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
222
2,150
0
29 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
927
29,436
0
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
654
41,103
0
22 Oct 2020
A Short Note on the Kinetics-700 Human Action Dataset
João Carreira
Eric Noland
Chloe Hillier
Andrew Zisserman
76
453
0
15 Jul 2019
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Jun Liu
Amir Shahroudy
Mauricio Perez
G. Wang
Ling-yu Duan
Alex C. Kot
80
1,289
0
12 May 2019
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
232
8,019
0
22 May 2017
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xinyu Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
104
1,245
0
06 Apr 2016