ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2101.10803
  4. Cited By
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual
  Video Representation Learning
v1v2v3 (latest)

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning

26 January 2021
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
ArXiv (abs)PDFHTML

Papers citing "ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning"

39 / 39 papers shown
Title
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Qiuheng Wang
Yukai Shi
Jiarong Ou
Ruoxin Chen
Ke Lin
...
Mingwu Zheng
Xin Tao
Fei Yang
Pengfei Wan
Di Zhang
VGen
141
34
0
10 Oct 2024
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
Qijun Gan
Song Wang
Shengtao Wu
Jianke Zhu
244
1
0
13 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Yu Guo
VGen
259
17
0
06 Jun 2024
Deep Visual Domain Adaptation
Deep Visual Domain Adaptation
G. Csurka
OOD
201
185
0
28 Dec 2020
Learning Video Representations from Textual Web Supervision
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
86
48
0
29 Jul 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and
  Sound
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
85
76
0
11 Jun 2020
VGGSound: A Large-scale Audio-Visual Dataset
VGGSound: A Large-scale Audio-Visual Dataset
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
89
583
0
29 Apr 2020
A Simple Framework for Contrastive Learning of Visual Representations
A Simple Framework for Contrastive Learning of Visual Representations
Ting-Li Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
387
18,866
0
13 Feb 2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
111
432
0
28 Nov 2019
Momentum Contrast for Unsupervised Visual Representation Learning
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
213
12,124
0
13 Nov 2019
A Short Note on the Kinetics-700 Human Action Dataset
A Short Note on the Kinetics-700 Human Action Dataset
João Carreira
Eric Noland
Chloe Hillier
Andrew Zisserman
82
457
0
15 Jul 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million
  Narrated Video Clips
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
122
1,207
0
07 Jun 2019
On the Convergence of Adam and Beyond
On the Convergence of Adam and Beyond
Sashank J. Reddi
Satyen Kale
Surinder Kumar
106
2,506
0
19 Apr 2019
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth
Sourish Chaudhuri
Ondˇrej Klejch
Radhika Marvin
Andrew C. Gallagher
...
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
59
145
0
05 Jan 2019
2.5D Visual Sound
2.5D Visual Sound
Ruohan Gao
Kristen Grauman
VGen
118
131
0
11 Dec 2018
SlowFast Networks for Video Recognition
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
169
3,286
0
10 Dec 2018
Learning deep representations by mutual information estimation and
  maximization
Learning deep representations by mutual information estimation and maximization
R. Devon Hjelm
A. Fedorov
Samuel Lavoie-Marchildon
Karan Grewal
Phil Bachman
Adam Trischler
Yoshua Bengio
SSLDRL
348
2,672
0
20 Aug 2018
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri
Joseph Roth
D. Ellis
Andrew C. Gallagher
Liat Kaver
...
C. Pantofaru
Nathan Reale
Loretta Guarino Reid
K. Wilson
Zhonghua Xi
37
46
0
02 Aug 2018
Representation Learning with Contrastive Predictive Coding
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord
Yazhe Li
Oriol Vinyals
DRLSSL
351
10,364
0
10 Jul 2018
Cooperative Learning of Audio and Video Models from Self-Supervised
  Synchronization
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
Bruno Korbar
Du Tran
Lorenzo Torresani
99
476
0
30 Jun 2018
Audio-Visual Event Localization in Unconstrained Videos
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
101
439
0
23 Mar 2018
Deep Visual Domain Adaptation: A Survey
Deep Visual Domain Adaptation: A Survey
Mei Wang
Weihong Deng
OOD
82
2,017
0
10 Feb 2018
Moments in Time Dataset: one million videos for event understanding
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
96
551
0
09 Jan 2018
Interpreting Deep Visual Representations via Network Dissection
Interpreting Deep Visual Representations via Network Dissection
Bolei Zhou
David Bau
A. Oliva
Antonio Torralba
FAttMILM
60
324
0
15 Nov 2017
Look, Listen and Learn
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
127
906
0
23 May 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
240
8,041
0
22 May 2017
The Kinetics Human Action Video Dataset
The Kinetics Human Action Video Dataset
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
...
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
256
3,816
0
19 May 2017
FastText.zip: Compressing text classification models
FastText.zip: Compressing text classification models
Armand Joulin
Edouard Grave
Piotr Bojanowski
Matthijs Douze
Hervé Jégou
Tomas Mikolov
MQ
91
1,216
0
12 Dec 2016
Spatiotemporal Residual Networks for Video Action Recognition
Spatiotemporal Residual Networks for Video Action Recognition
Christoph Feichtenhofer
A. Pinz
Richard P. Wildes
113
719
0
07 Nov 2016
SoundNet: Learning Sound Representations from Unlabeled Video
SoundNet: Learning Sound Representations from Unlabeled Video
Y. Aytar
Carl Vondrick
Antonio Torralba
SSL
128
1,044
0
27 Oct 2016
CNN Architectures for Large-Scale Audio Classification
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
130
2,509
0
29 Sep 2016
YouTube-8M: A Large-Scale Video Classification Benchmark
YouTube-8M: A Large-Scale Video Classification Benchmark
Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
G. Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
VLM
155
1,272
0
27 Sep 2016
Bag of Tricks for Efficient Text Classification
Bag of Tricks for Efficient Text Classification
Armand Joulin
Edouard Grave
Piotr Bojanowski
Tomas Mikolov
VLM
183
4,632
0
06 Jul 2016
Visually Indicated Sounds
Visually Indicated Sounds
Andrew Owens
Phillip Isola
Josh H. McDermott
Antonio Torralba
Edward H. Adelson
William T. Freeman
99
386
0
28 Dec 2015
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,510
0
10 Dec 2015
Adam: A Method for Stochastic Optimization
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.1K
150,364
0
22 Dec 2014
How transferable are features in deep neural networks?
How transferable are features in deep neural networks?
J. Yosinski
Jeff Clune
Yoshua Bengio
Hod Lipson
OOD
236
8,353
0
06 Nov 2014
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIPVGen
160
6,164
0
03 Dec 2012
Improving neural networks by preventing co-adaptation of feature
  detectors
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton
Nitish Srivastava
A. Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov
VLM
463
7,667
0
03 Jul 2012
1