Computer Audition: From Task-Specific Machine Learning to Foundation Models

22 July 2024
Andreas Triantafyllopoulos
Iosif Tsangko
Alexander Gebhard
A. Mesaros
Tuomas Virtanen
Björn Schuller
arXiv: 2407.15672

Papers citing "Computer Audition: From Task-Specific Machine Learning to Foundation Models"

40 / 40 papers shown
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
114
2
0
10 Jan 2025
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp
Andreas Triantafyllopoulos
M. Milling
Björn Schuller
243
0
0
16 Dec 2024
Listenable Maps for Zero-Shot Audio Classifiers
Francesco Paissan
Luca Della Libera
Mirco Ravanelli
Cem Subakan
78
4
0
27 May 2024
EMR-Merging: Tuning-Free High-Performance Model Merging
Chenyu Huang
Peng Ye
Tao Chen
Tong He
Xiangyu Yue
Wanli Ouyang
MoMe
75
42
0
23 May 2024
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Charles Goddard
Shamane Siriwardhana
Malikeh Ehghaghi
Luke Meyers
Vladimir Karpukhin
Brian Benedict
Mark McQuade
Jacob Solawetz
MoMe
KELM
122
97
0
20 Mar 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
74
16
0
02 Feb 2024
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM
AuLLM
74
127
0
01 Oct 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
78
39
0
24 Aug 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Yuan Gong
Sameer Khurana
Leonid Karlinsky
James R. Glass
46
71
0
06 Jul 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
87
210
0
30 Mar 2023
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Yusong Wu
Kai Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
CLIP
112
525
0
12 Nov 2022
CochlScene: Acquisition of acoustic scene data using crowdsourcing
Il-Young Jeong
Jeongsoon Park
62
26
0
04 Nov 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
167
3,110
0
20 Oct 2022
Language-based Audio Retrieval Task in DCASE 2022 Challenge
Huang Xie
Samuel Lipping
Tuomas Virtanen
90
18
0
20 Sep 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
119
65
0
30 Aug 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
69
84
0
16 Jun 2022
Zero-Shot Audio Classification using Image Embeddings
Duygu Dogan
Huang Xie
Toni Heittola
Tuomas Virtanen
VLM
49
6
0
10 Jun 2022
Dawn of the transformer era in speech emotion recognition: closing the valence gap
Johannes Wagner
Andreas Triantafyllopoulos
H. Wierstorf
Maximilian Schmitt
Felix Burkhardt
F. Eyben
Björn W. Schuller
61
301
0
14 Mar 2022
HEAR: Holistic Evaluation of Audio Representations
Joseph P. Turian
Jordie Shier
H. Khan
Bhiksha Raj
Björn W. Schuller
...
P. Esling
Pranay Manocha
Shinji Watanabe
Zeyu Jin
Yonatan Bisk
60
106
0
06 Mar 2022
Wearable SELD dataset: Dataset for sound event localization and detection using wearable devices around head
Kento Nagatomo
Masahiro Yasuda
Kohei Yatabe
Shoichiro Saito
Yasuhiro Oikawa
113
9
0
17 Feb 2022
Audio Retrieval with Natural Language Queries: A Benchmark Study
A. Sophia Koepke
Andreea-Maria Oncescu
João F. Henriques
Zeynep Akata
Samuel Albanie
60
99
0
17 Dec 2021
Computational bioacoustics with deep learning: a review and roadmap
D. Stowell
53
247
0
13 Dec 2021
The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
R. Entezari
Hanie Sedghi
O. Saukh
Behnam Neyshabur
MoMe
74
229
0
12 Oct 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
373
10,273
0
17 Jun 2021
Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions
Yohei Kawaguchi
Keisuke Imoto
Yuma Koizumi
Noboru Harada
Daisuke Niizumi
Kota Dohi
Ryo Tanabe
Harsh Purohit
Takashi Endo
51
96
0
08 Jun 2021
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Jure Zbontar
Li Jing
Ishan Misra
Yann LeCun
Stéphane Deny
SSL
296
2,343
0
04 Mar 2021
LEAF: A Learnable Frontend for Audio Classification
Neil Zeghidour
O. Teboul
Félix de Chaumont Quitry
Marco Tagliasacchi
VLM
AAML
98
147
0
21 Jan 2021
Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning
Zhaowei Cai
Avinash Ravichandran
Subhransu Maji
Charless C. Fowlkes
Zhuowen Tu
Stefano Soatto
79
120
0
21 Jan 2021
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
66
122
0
03 Nov 2020
FSD50K: An Open Dataset of Human-Labeled Sound Events
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
67
453
0
01 Oct 2020
Optimizing Mode Connectivity via Neuron Alignment
N. Joseph Tatro
Pin-Yu Chen
Payel Das
Igor Melnyk
P. Sattigeri
Rongjie Lai
MoMe
263
82
0
05 Sep 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
245
5,774
0
20 Jun 2020
Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking
Eduardo Fonseca
Shawn Hershey
Manoj Plakal
D. Ellis
A. Jansen
R. C. Moore
Xavier Serra
NoLa
71
23
0
02 May 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
180
1,075
0
21 Dec 2019
Clotho: An Audio Captioning Dataset
Konstantinos Drossos
Samuel Lipping
Tuomas Virtanen
87
388
0
21 Oct 2019
The LOCATA Challenge: Acoustic Source Localization and Tracking
C. Evers
Heinrich W. Löllmann
H. Mellmann
Alexander Schmidt
Hendrik Barfuss
Patrick A. Naylor
Walter Kellermann
47
130
0
03 Sep 2019
Learning Sound Event Classifiers from Web Audio with Noisy Labels
Eduardo Fonseca
Manoj Plakal
D. Ellis
F. Font
Xavier Favory
Xavier Serra
NoLa
60
111
0
04 Jan 2019
A multi-device dataset for urban acoustic scene classification
A. Mesaros
Toni Heittola
Tuomas Virtanen
35
380
0
25 Jul 2018
Acoustic Scene Classification
D. Barchiesi
D. Giannoulis
D. Stowell
Mark D. Plumbley
129
405
0
13 Nov 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.5K
100,330
0
04 Sep 2014