1609.09430
CNN Architectures for Large-Scale Audio Classification
29 September 2016
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
R. C. Moore
Manoj Plakal
D. Platt
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
Papers citing
"CNN Architectures for Large-Scale Audio Classification"
50 / 336 papers shown
Title
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
33
3
0
10 Oct 2023
Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio
Forsad Al Hossain
Tanjid Hasan Tonmoy
A. Lover
George A. Corey
Mohammad Arif Ul Alam
Tauhidur Rahman
19
1
0
19 Sep 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
33
0
0
30 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
32
6
0
23 Aug 2023
An Ambient Intelligence-based Approach For Longitudinal Monitoring of Verbal and Vocal Depression Symptoms
Alice Othmani
M. Muzammel
14
1
0
16 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
42
224
0
10 Aug 2023
Transferable Models for Bioacoustics with Human Language Supervision
David Robinson
Adelaide Robinson
Lily Akrapongpisak
22
8
0
09 Aug 2023
Contrastive Conditional Latent Diffusion for Audio-visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yun-Qiu Lv
Yiran Zhong
Yuchao Dai
DiffM
43
28
0
31 Jul 2023
Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization
K. Ramakrishnan
15
0
0
12 Jul 2023
Adapting Language-Audio Models as Few-Shot Audio Learners
Jinhua Liang
Xubo Liu
Haohe Liu
Huy P Phan
Emmanouil Benetos
Mark D. Plumbley
Wenwu Wang
VLM
37
19
0
28 May 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Hongyang Li
Yu Qiao
Hao Dong
Zhongjiang He
Peng Gao
VOS
22
30
0
25 May 2023
Audio-Visual Dataset and Method for Anomaly Detection in Traffic Videos
Błażej Leporowski
Arian Bakhtiarnia
Nicole Bonnici
A. Muscat
Luca Zanella
Yiming Wang
Alexandros Iosifidis
24
1
0
24 May 2023
TransAVS: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
37
1
0
12 May 2023
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
K. Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
18
17
0
11 May 2023
Technical Understanding from IML Hands-on Experience: A Study through a Public Event for Science Museum Visitors
Wataru Kawabe
Yuri Nakao
Akihisa Shitara
Yusuke Sugano
28
1
0
10 May 2023
XAI-based Comparison of Input Representations for Audio Event Classification
A. Frommholz
Fabian Seipel
Sebastian Lapuschkin
Wojciech Samek
Johanna Vielhaben
AAML
AI4TS
30
6
0
27 Apr 2023
Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity
Pablo Alonso-Jiménez
Xavier Favory
Hadrien Foroughmand
Grigoris Bourdalas
Xavier Serra
T. Lidy
Dmitry Bogdanov
37
6
0
24 Apr 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
28
6
0
22 Apr 2023
MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning
Zheng Lian
Haiyang Sun
Guoying Zhao
Kang Chen
Mingyu Xu
...
Meng Wang
Min Zhang
Guoying Zhao
Björn W. Schuller
Jianhua Tao
40
48
0
18 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
25
2
0
12 Apr 2023
Leveraging Neural Representations for Audio Manipulation
Scott H. Hawley
C. Steinmetz
38
2
0
10 Apr 2023
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
21
42
0
30 Mar 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Oğuz
Yashar Mehdad
Joey Tianyi Zhou
3DV
20
51
0
29 Mar 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
73
30
0
26 Mar 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
37
22
0
19 Mar 2023
Multi-modal Expression Recognition with Ensemble Method
Chuanhe Liu
Xinjie Zhang
Xiaolong Liu
Tenggan Zhang
Liyu Meng
Yuchen Liu
Yuanyuan Deng
Wenqiang Jiang
CVBM
20
7
0
17 Mar 2023
Blind Estimation of Audio Processing Graph
Sungho Lee
Jaehyung Park
Seungryeol Paik
Kyogu Lee
25
9
0
15 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
24
0
0
15 Mar 2023
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
24
4
0
03 Mar 2023
Unsupervised classification to improve the quality of a bird song recording dataset
Félix Michaud
J. Sueur
Maxime LE Cesne
S. Haupert
27
28
0
15 Feb 2023
Detection and classification of vocal productions in large scale audio recordings
Guillem Bonafos
Pierre Pudlo
Jean-Marc Freyermuth
T. Legou
J. Fagot
Samuel Tronçon
Arnaud Rey
AI4TS
19
1
0
14 Feb 2023
SingSong: Generating musical accompaniments from singing
Chris Donahue
Antoine Caillon
Adam Roberts
Ethan Manilow
P. Esling
...
Mauro Verzetti
Ian Simon
Olivier Pietquin
Neil Zeghidour
Jesse Engel
37
52
0
30 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
26
15
0
19 Jan 2023
Training one model to detect heart and lung sound events from single point auscultations
Leander Melms
Robert R. Ilesan
Ulrich Köhler
O. Hildebrandt
R. Conradt
...
Jürgen R. Schaefer
Tobias Müller
J. Obergassel
Nadine Schlicker
M. Hirsch
26
2
0
15 Jan 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
Hongpeng Lin
Ludan Ruan
Wenke Xia
Peiyu Liu
Jing Wen
...
Di Hu
Ruihua Song
Wayne Xin Zhao
Qin Jin
Zhiwu Lu
VGen
33
9
0
14 Jan 2023
Action Dynamics Task Graphs for Learning Plannable Representations of Procedural Tasks
Weichao Mao
Ruta Desai
Michael L. Iuzzolino
Nitin Kamra
26
5
0
11 Jan 2023
Data Distillation: A Survey
Noveen Sachdeva
Julian McAuley
DD
45
73
0
11 Jan 2023
Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction
Boyi Li
Rodolfo Corona
K. Mangalam
Catherine Chen
Daniel Flaherty
Serge Belongie
Kilian Q. Weinberger
Jitendra Malik
Trevor Darrell
Dan Klein
21
1
0
20 Dec 2022
Visual Transformers for Primates Classification and Covid Detection
Steffen Illium
Robert Muller
Andreas Sedlmeier
Claudia Linnhoff-Popien
38
11
0
20 Dec 2022
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks
Darius Petermann
G. Wichern
Aswin Shanmugam Subramanian
Zhong-Qiu Wang
Jonathan Le Roux
27
10
0
14 Dec 2022
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
27
2
0
09 Dec 2022
NEAL: An open-source tool for audio annotation
A. Gibbons
I. Donohue
Courtney E. Gorman
Emma King
Andrew C. Parnell
13
3
0
02 Dec 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
24
9
0
21 Nov 2022
Music Instrument Classification Reprogrammed
Hsin-Hung Chen
Alexander Lerch
24
4
0
15 Nov 2022
The Birds Need Attention Too: Analysing usage of Self Attention in identifying bird calls in soundscapes
Chandra Kanth Nagesh
Abhishek Purushothama
24
2
0
14 Nov 2022
Unsupervised vocal dereverberation with diffusion-based generative models
Koichi Saito
Naoki Murata
Toshimitsu Uesaka
Chieh-Hsin Lai
Yuhta Takida
Takao Fukui
Yuki Mitsufuji
DiffM
47
23
0
08 Nov 2022
"Seeing Sound": Audio Classification with the Wigner-Ville Distribution and Convolutional Neural Networks
Antonios Marios Christonasis
S.J.L. van Eijndhoven
P. Duin
11
0
0
06 Nov 2022
I Hear Your True Colors: Image Guided Audio Generation
Roy Sheffer
Yossi Adi
VLM
18
74
0
06 Nov 2022
When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity
Khalid Alnajjar
Mika Hämäläinen
Jörg Tiedemann
Jorma T. Laaksonen
M. Kurimo
24
2
0
03 Nov 2022
Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming
Yun-Ning Hung
Chao-Han Huck Yang
Pin-Yu Chen
Alexander Lerch
27
17
0
02 Nov 2022