ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1905.00561
  4. Cited By
Large-scale weakly-supervised pre-training for video action recognition

Large-scale weakly-supervised pre-training for video action recognition

2 May 2019
Deepti Ghadiyaram
Matt Feiszli
Du Tran
Xueting Yan
Heng Wang
D. Mahajan
ArXivPDFHTML

Papers citing "Large-scale weakly-supervised pre-training for video action recognition"

50 / 82 papers shown
Title
CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation
CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation
Chenying Liu
C. Albrecht
Yi Wang
Xiao Xiang Zhu
70
2
0
02 May 2024
A Strong Baseline for Temporal Video-Text Alignment
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
50
5
0
21 Dec 2023
Learning Human Action Recognition Representations Without Real Humans
Learning Human Action Recognition Representations Without Real Humans
Howard Zhong
Samarth Mishra
Donghyun Kim
SouYoung Jin
Yikang Shen
Hildegard Kuehne
Leonid Karlinsky
Venkatesh Saligrama
Aude Oliva
Rogerio Feris
29
3
0
10 Nov 2023
Training a Large Video Model on a Single Machine in a Day
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
41
15
0
28 Sep 2023
Co-attention Propagation Network for Zero-Shot Video Object Segmentation
Co-attention Propagation Network for Zero-Shot Video Object Segmentation
Gensheng Pei
Yazhou Yao
Fumin Shen
Daniel Huang
Xing-Rui Huang
Hengtao Shen
VOS
40
12
0
08 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
45
39
0
31 Mar 2023
SimCon Loss with Multiple Views for Text Supervised Semantic
  Segmentation
SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation
Yash J. Patel
Yusheng Xie
Yi Zhu
Srikar Appalaraju
R. Manmatha
40
4
0
07 Feb 2023
What You Say Is What You Show: Visual Narration Detection in
  Instructional Videos
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
29
4
0
05 Jan 2023
Multi-Instance Partial-Label Learning: Towards Exploiting Dual Inexact
  Supervision
Multi-Instance Partial-Label Learning: Towards Exploiting Dual Inexact Supervision
Wei Tang
Weijia Zhang
Min-Ling Zhang
19
12
0
18 Dec 2022
Multimodal Vision Transformers with Forced Attention for Behavior
  Analysis
Multimodal Vision Transformers with Forced Attention for Behavior Analysis
Tanay Agrawal
Michal Balazia
Philippe Muller
Franccois Brémond
ViT
30
9
0
07 Dec 2022
An Action Is Worth Multiple Words: Handling Ambiguity in Action
  Recognition
An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition
Kiyoon Kim
Davide Moltisanti
Oisin Mac Aodha
Laura Sevilla-Lara
21
0
0
10 Oct 2022
Self-supervised Video Representation Learning with Motion-Aware Masked
  Autoencoders
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
Huanjin Yao
Yi Jiang
Xiatian Zhu
Zehuan Yuan
43
19
0
09 Oct 2022
A Feature-space Multimodal Data Augmentation Technique for Text-video
  Retrieval
A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval
Alex Falcon
G. Serra
Oswald Lanz
VGen
46
25
0
03 Aug 2022
Beyond Transfer Learning: Co-finetuning for Action Localisation
Beyond Transfer Learning: Co-finetuning for Action Localisation
Anurag Arnab
Xuehan Xiong
A. Gritsenko
Rob Romijnders
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
38
8
0
08 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video
  Recognition
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
105
93
0
04 Jul 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and
  Applications
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
42
53
0
02 Jun 2022
Weakly-supervised segmentation of referring expressions
Weakly-supervised segmentation of referring expressions
Robin Strudel
Ivan Laptev
Cordelia Schmid
22
21
0
10 May 2022
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
Mahdi M. Kalayeh
Shervin Ardeshir
Lingyi Liu
Nagendra Kamath
Ashok Chandrashekar
SSL
35
3
0
29 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for
  Video-text Retrieval
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
18
44
0
26 Apr 2022
3D Convolutional Networks for Action Recognition: Application to Sport
  Gesture Recognition
3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition
Pierre-Etienne Martin
J. Benois-Pineau
Renaud Péteri
A. Zemmari
J. Morlier
29
5
0
13 Apr 2022
Detection of Distracted Driver using Convolution Neural Network
Detection of Distracted Driver using Convolution Neural Network
Narayana Darapaneni
Jai Arora
MoniShankar Hazra
Naman Vig
Simrandeep Singh Gandhi
Saurabh Gupta
A. Paduri
13
8
0
07 Apr 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for
  Self-Supervised Video Pre-Training
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
170
1,134
0
23 Mar 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One
  More Step Towards Generalization
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
30
16
0
14 Mar 2022
Temporal Perceiver: A General Architecture for Arbitrary Boundary
  Detection
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan
Yuhong Wang
Gangshan Wu
Limin Wang
61
14
0
01 Mar 2022
Learning To Recognize Procedural Activities with Distant Supervision
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
35
83
0
26 Jan 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
26
103
0
16 Jan 2022
Argus++: Robust Real-time Activity Detection for Unconstrained Video
  Streams with Overlapping Cube Proposals
Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals
Lijun Yu
Yijun Qian
Wenhe Liu
Alexander G. Hauptmann
27
13
0
14 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
29
108
0
13 Jan 2022
Multiview Transformers for Video Recognition
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
26
212
0
12 Jan 2022
Sign Language Video Retrieval with Free-Form Textual Queries
Sign Language Video Retrieval with Free-Form Textual Queries
A. Duarte
Samuel Albanie
Xavier Giró-i-Nieto
Gül Varol
SLR
53
29
0
07 Jan 2022
Cross Modal Retrieval with Querybank Normalisation
Cross Modal Retrieval with Querybank Normalisation
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
29
84
0
23 Dec 2021
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for
  Cross-Domain Robustness in Action Recognition
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
Sofia Broomé
Ernest Pokropek
Boyu Li
Hedvig Kjellström
21
7
0
22 Dec 2021
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from
  a Single Image
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano
Aaqib Saeed
50
7
0
01 Dec 2021
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy
  Labels
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels
Mohit Sharma
Rajkumar Patra
Harshali Desai
Shruti Vyas
Yogesh S Rawat
R. Shah
VGen
NoLa
24
3
0
13 Oct 2021
A Baseline Framework for Part-level Action Parsing and Action
  Recognition
A Baseline Framework for Part-level Action Parsing and Action Recognition
Xiaodong Chen
Xinchen Liu
Kun Liu
Wu Liu
Tao Mei
29
3
0
07 Oct 2021
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani
Yingli Tian
54
60
0
30 Sep 2021
Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic
  Interactions
Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
D. Curto
Albert Clapés
Javier Selva
Sorina Smeureanu
Julio C. S. Jacques Junior
...
G. Guilera
D. Leiva
T. Moeslund
Sergio Escalera
Cristina Palmero
51
29
0
20 Sep 2021
Feature Combination Meets Attention: Baidu Soccer Embeddings and
  Transformer based Temporal Detection
Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection
Xin Zhou
Le Kang
Zhiyu Cheng
Bo He
Jingyu Xin
51
34
0
28 Jun 2021
Audio Retrieval with Natural Language Queries
Audio Retrieval with Natural Language Queries
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
21
77
0
05 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation
  Learning
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
39
257
0
29 Apr 2021
Adaptive Configuration of In Situ Lossy Compression for Cosmology
  Simulations via Fine-Grained Rate-Quality Modeling
Adaptive Configuration of In Situ Lossy Compression for Cosmology Simulations via Fine-Grained Rate-Quality Modeling
Sian Jin
Jesus Pulido
Pascal Grosset
Jiannan Tian
Dingwen Tao
J. Ahrens
33
22
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
33
127
0
30 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
24
128
0
19 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation
  Learning
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
31
33
0
18 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual
  Transfer of Vision-Language Models
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
26
56
0
16 Mar 2021
On the Post-hoc Explainability of Deep Echo State Networks for Time
  Series Forecasting, Image and Video Classification
On the Post-hoc Explainability of Deep Echo State Networks for Time Series Forecasting, Image and Video Classification
Alejandro Barredo Arrieta
S. Gil-Lopez
I. Laña
Miren Nekane Bilbao
Javier Del Ser
AI4TS
41
13
0
17 Feb 2021
Learning to Recognize Actions on Objects in Egocentric Video with
  Attention Dictionaries
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
EgoV
29
15
0
16 Feb 2021
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts
Kunpeng Li
Zizhao Zhang
Guanhang Wu
Xuehan Xiong
Chen-Yu Lee
Zhichao Lu
Y. Fu
Tomas Pfister
34
5
0
11 Jan 2021
Self-Supervised Pretraining of 3D Features on any Point-Cloud
Self-Supervised Pretraining of 3D Features on any Point-Cloud
Zaiwei Zhang
Rohit Girdhar
Armand Joulin
Ishan Misra
3DPC
137
267
0
07 Jan 2021
Refining activation downsampling with SoftPool
Refining activation downsampling with SoftPool
Alexandros Stergiou
R. Poppe
Grigorios Kalliatakis
36
159
0
02 Jan 2021
12
Next