Title
The CORSMAL benchmark for the prediction of the properties of containers Alessio Xompero Santiago Donaher Vladimir E. Iashin Francesca Palermo Gokhan Solak ... G. Neeharika Chinnakotla Krishna Teja Reddy Dinesh Jain B. Rehman Andrea Cavallaro 30 10 0 27 Jul 2021
PERSA+: A Deep Learning Front-End for Context-Agnostic Audio Classification Lazaros Vrysis Iordanis Thoidis Charalampos A. Dimoulas G. Papanikolaou VLM 33 0 0 20 Jul 2021
Project Achoo: A Practical Model and Application for COVID-19 Detection from Recordings of Breath, Voice, and Cough Alexander Ponomarchuk I. Burenko Elian Malkin Ivan Nazarov V. Kokh Manvel Avetisian L. Zhukov 39 40 0 12 Jul 2021
Neural Waveshaping Synthesis B. Hayes C. Saitis Gyorgy Fazekas 36 28 0 11 Jul 2021
Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases Subhashini Venugopalan Joel Shor Manoj Plakal Jimmy Tobin Katrin Tomanek Jordan R. Green Michael P. Brenner 27 12 0 08 Jul 2021
Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model Zhiqi Huang Fenglin Liu Xian Wu Shen Ge Helin Wang Wei Fan Yuexian Zou AuLLM 29 2 0 04 Jul 2021
Continuous Emotion Recognition with Audio-visual Leader-follower Attentive Fusion Su Zhang Yi Ding Ziquan Wei Cuntai Guan 40 25 0 02 Jul 2021
Attention Bottlenecks for Multimodal Fusion Arsha Nagrani Shan Yang Anurag Arnab A. Jansen Cordelia Schmid Chen Sun 25 543 0 30 Jun 2021
Towards sound based testing of COVID-19 -- Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge N. Sharma Ananya Muguli Prashant Krishnan Rohit Kumar Srikanth Raj Chetupalli Sriram Ganapathy 33 13 0 21 Jun 2021
Zero-Shot Federated Learning with New Classes for Audio Classification Gautham Krishna Gudur S. K. Perepu FedML 13 10 0 18 Jun 2021
Voice2Series: Reprogramming Acoustic Models for Time Series Classification Chao-Han Huck Yang Yun-Yun Tsai Pin-Yu Chen AI4TS 29 122 0 17 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition Mathilde Brousmiche Jean Rouat Stéphane Dupont 27 11 0 12 Jun 2021
Impact of data-splits on generalization: Identifying COVID-19 from cough and context Makkunda Sharma Nikhil Shenoy Jigar Doshi Piyush Bagad Aman Dalmia Parag Bhamare A. Mahale S. Rane Neeraj Agrawal R. Panicker OOD 50 4 0 05 Jun 2021
Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks Khaled Koutini Hamid Eghbalzadeh Gerhard Widmer 30 46 0 26 May 2021
Social Behaviour Understanding using Deep Neural Networks: Development of Social Intelligence Systems Ethan Lim Ding Feng Zhi-Wei Neo Aaron William De Silva Kellie Sim Hong-Ray Tan T. Nguyen K. Koh Wenru Wang Hoang D. Nguyen 17 2 0 20 May 2021
Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead Arian Bakhtiarnia Qi Zhang Alexandros Iosifidis 27 35 0 19 May 2021
Audio Retrieval with Natural Language Queries Andreea-Maria Oncescu A. Sophia Koepke João F. Henriques Zeynep Akata Samuel Albanie 21 77 0 05 May 2021
Shot Contrastive Self-Supervised Learning for Scene Boundary Detection Shixing Chen Xiaohan Nie David D. Fan Dongqing Zhang Vimal Bhat Raffay Hamid SSL 27 62 0 28 Apr 2021
The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System Lorin Sweeney Graham Healy Alan F. Smeaton 19 11 0 23 Apr 2021
Room adaptive conditioning method for sound event classification in reverberant environments Jaejun Lee Donmoon Lee Hyeong-Seok Choi Kyogu Lee 23 2 0 21 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval Xiaohan Wang Linchao Zhu Yi Yang 170 170 0 20 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks Hung Le Nancy F. Chen Guosheng Lin MLLM 26 19 0 16 Apr 2021
Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition E. Koh Shlomo Dubnov 29 38 0 13 Apr 2021
Uncertainty-Aware COVID-19 Detection from Imbalanced Sound Data Tong Xia Jing Han Lorena Qendro T. Dang Cecilia Mascolo 29 25 0 05 Apr 2021
SubSpectral Normalization for Neural Audio Data Processing Simyung Chang Hyoungwoo Park Janghoon Cho Hyunsin Park Sungrack Yun Kyuwoong Hwang 23 30 0 25 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval Maksim Dzabraev M. Kalashnikov Stepan Alekseevich Komkov Aleksandr Petiushko 24 128 0 19 Mar 2021
Slow-Fast Auditory Streams For Audio Recognition Evangelos Kazakos Arsha Nagrani Andrew Zisserman Dima Damen 16 66 0 05 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge Francisco Rivera Valverde Juana Valeria Hurtado Abhinav Valada 26 72 0 01 Mar 2021
Multi-modal Ensemble Models for Predicting Video Memorability Tony Zhao Irving Fang Jeffrey Kim Gerald Friedland 19 5 0 01 Feb 2021
A Case Study of Deep Learning Based Multi-Modal Methods for Predicting the Age-Suitability Rating of Movie Trailers Mahsa Shafaei C. Smailis I. Kakadiaris Thamar Solorio 141 1 0 26 Jan 2021
LEAF: A Learnable Frontend for Audio Classification Neil Zeghidour O. Teboul Félix de Chaumont Quitry Marco Tagliasacchi VLM AAML 85 144 0 21 Jan 2021
The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements Lukas Stappen Alice Baird Lea Schumann Björn Schuller 42 59 0 15 Jan 2021
Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices G. Cerutti Renzo Andri Lukas Cavigelli Michele Magno Elisabetta Farella Luca Benini MQ 21 37 0 12 Jan 2021
Environment Transfer for Distributed Systems Chunheng Jiang Jae-wook Ahn N. Desai 28 1 0 06 Jan 2021
Context-Aware Personality Inference in Dyadic Scenarios: Introducing the UDIVA Dataset Cristina Palmero Javier Selva Sorina Smeureanu Julio C. S. Jacques Junior Albert Clapés ... Zejian Zhang D. Gallardo-Pujol G. Guilera D. Leiva Sergio Escalera 28 53 0 28 Dec 2020
Skeleton-DML: Deep Metric Learning for Skeleton-Based One-Shot Action Recognition Raphael Memmesheimer Simon Häring Nick Theisen Dietrich Paulus 35 36 0 26 Dec 2020
Analysis of Feature Representations for Anomalous Sound Detection Robert Muller Steffen Illium Fabian Ritz Kyrill Schmid 16 18 0 11 Dec 2020
Multi-Modal Detection of Alzheimer's Disease from Speech and Text Amish Mittal Sourav Sahoo Arnhav Datar Juned Kadiwala H. Shalu Jimson Mathew 12 20 0 30 Nov 2020
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos A. Deliège A. Cioppa Silvio Giancola M. J. Seikavandi J. Dueholm Kamal Nasrollahi Guohao Li T. Moeslund Marc Van Droogenbroeck 18 152 0 26 Nov 2020
Virufy: Global Applicability of Crowdsourced and Clinical Datasets for AI Detection of COVID-19 from Cough Gunvant R. Chaudhari Xinyi Jiang Ahmed E. Fakhry Asriel Han Jaclyn Xiao Sabrina Shen Amil Khanzada 21 92 0 26 Nov 2020
Learning to dance: A graph convolutional adversarial network to generate realistic dance motions from audio João P. Ferreira Thiago M. Coutinho Thiago L. Gomes J. F. Neto Rafael Azevedo Renato Martins Erickson R. Nascimento GAN 36 68 0 25 Nov 2020
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog Wubo Li Dongwei Jiang Wei Zou Xiangang Li 23 6 0 21 Oct 2020
Real-time Speech Frequency Bandwidth Extension Yunpeng Li Marco Tagliasacchi Oleg Rybakov Victor Ungureanu Dominik Roblek 17 47 0 21 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues Hung Le Doyen Sahoo Nancy F. Chen Guosheng Lin 44 30 0 20 Oct 2020
CLAR: Contrastive Learning of Auditory Representations Haider Al-Tahan Y. Mohsenzadeh SSL 118 56 0 19 Oct 2020
Joint Analysis of Sound Events and Acoustic Scenes Using Multitask Learning Noriyuki Tonami Keisuke Imoto Ryosuke Yamanishi Y. Yamashita 23 13 0 16 Oct 2020
TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval G. Awad A. Butt Keith Curtis Yooyoung Lee Jonathan G. Fiscus ... Lukas L. Diduch Alan F. Smeaton Yvette Graham Wessel Kraaij Georges Quénot 20 70 0 21 Sep 2020
Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds Piyush Bagad Aman Dalmia Jigar Doshi Arsha Nagrani Parag Bhamare A. Mahale S. Rane N. Agarwal R. Panicker 34 112 0 17 Sep 2020
Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition Junghyun Koo Jie Hwan Lee Jaewoo Pyo Yujin Jo Kyogu Lee 27 58 0 09 Sep 2020
CRNNs for Urban Sound Tagging with spatiotemporal context Augustin Arnault Nicolas Riche 25 7 0 24 Aug 2020