2104.01778
Cited By
v1
v2
v3 (latest)
AST: Audio Spectrogram Transformer
5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
ViT
Papers citing
"AST: Audio Spectrogram Transformer"
50 / 486 papers shown
Title
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Boseung Jeong
Jicheol Park
Sungyeon Kim
Suha Kwak
84
0
0
03 Apr 2025
Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance
Taehan Lee
Hyukjun Lee
ViT
VLM
88
0
0
02 Apr 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
257
0
0
30 Mar 2025
Comparative Analysis of Image, Video, and Audio Classifiers for Automated News Video Segmentation
Jonathan Attard
Dylan Seychell
108
0
0
27 Mar 2025
Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
Suho Yoo
Hyunjong Ok
Jaeho Lee
AuLLM
RALM
105
0
0
21 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
108
0
0
20 Mar 2025
A Bird Song Detector for improving bird identification through Deep Learning: a case study from Doñana
Alba Márquez-Rodríguez
Miguel Ángel Mohedano-Munoz
Manuel J. Marín-Jiménez
Eduardo Santamaría-García
Giulia Bastianelli
Pedro Jordano
Irene Mendoza
83
0
0
19 Mar 2025
Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition
Atharva Agashe
Davelle Carreiro
A. V. Dine
Joshua Peeples
69
0
0
17 Mar 2025
^RFLAV: Rolling Flow matching for infinite Audio Video generation
Alex Ergasti
Giuseppe Tarollo
Filippo Botti
Tomaso Fontanini
Claudio Ferrari
Massimo Bertozzi
Andrea Prati
VGen
84
0
0
13 Mar 2025
Targeted Data Poisoning for Black-Box Audio Datasets Ownership Verification
Wassim Bouaziz
El-Mahdi El-Mhamdi
Nicolas Usunier
86
0
0
13 Mar 2025
Learning Gentle Grasping Using Vision, Sound, and Touch
Ken Nakahara
Roberto Calandra
106
0
0
11 Mar 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
Sreyan Ghosh
Zhifeng Kong
Sonal Kumar
S. Sakshi
Jaehyeon Kim
Ming-Yu Liu
Rafael Valle
Dinesh Manocha
Bryan Catanzaro
MLLM
AuLLM
LRM
136
21
0
06 Mar 2025
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim
Inyoung Jung
Dayoon Suh
Youjia Zhang
Sangmin Lee
Sungeun Hong
132
0
0
06 Mar 2025
JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
82
1
0
28 Feb 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
105
0
0
26 Feb 2025
Hedge Fund Portfolio Construction Using PolyModel Theory and iTransformer
Siqiao Zhao
Zhikang Dong
Zeyu Cao
Raphael Douady
137
6
0
17 Feb 2025
Akan Cinematic Emotions (ACE): A Multimodal Multi-party Dataset for Emotion Recognition in Movie Dialogues
David Sasu
Zehui Wu
Ziwei Gong
Run Chen
Pengyuan Shi
Lin Ai
Julia Hirschberg
Natalie Schluter
173
3
0
16 Feb 2025
Harnessing Vision Models for Time Series Analysis: A Survey
Jingchao Ni
Ziming Zhao
ChengAo Shen
Hanghang Tong
Dongjin Song
Wei Cheng
Dongsheng Luo
Haifeng Chen
AI4TS
185
6
0
13 Feb 2025
Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach
Timo Fudala
Vasileios Tsouvalas
N. Meratnia
MoE
118
0
0
10 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
151
0
0
05 Feb 2025
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
J. P. Muñoz
Jinjie Yuan
Nilesh Jain
Mamba
146
2
0
28 Jan 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
172
4
0
28 Jan 2025
Safe Gradient Flow for Bilevel Optimization
Sina Sharifi
Nazanin Abolfazli
Erfan Yazdandoost Hamedani
Mahyar Fazlyab
93
3
0
27 Jan 2025
Hybrid Losses for Hierarchical Embedding Learning
Haokun Tian
Stefan Lattner
Brian McFee
Charalampos Saitis
89
0
0
22 Jan 2025
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
Myeonghoon Ryu
June-Woo Kim
Minseok Oh
Suji Lee
Han Park
125
0
0
20 Jan 2025
AudioBERT: Audio Knowledge Augmented Language Model
Hyunjong Ok
Suho Yoo
Jaeho Lee
AuLLM
RALM
VLM
91
0
0
17 Jan 2025
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm
Yilang Zhang
Bingcong Li
G. Giannakis
AAML
67
0
0
11 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
193
3
0
10 Jan 2025
Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation
N. Dennler
Stefanos Nikolaidis
Maja J. Matarić
466
0
0
03 Jan 2025
FAST: Fast Audio Spectrogram Transformer
Anugunj Naman
Gaibo Zhang
65
0
0
03 Jan 2025
Trainingless Adaptation of Pretrained Models for Environmental Sound Classification
Noriyuki Tonami
Wataru Kohno
Keisuke Imoto
Yoshiyuki Yajima
Sakiko Mishima
Reishi Kondo
Tomoyuki Hino
VLM
164
0
0
23 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
138
0
0
18 Dec 2024
When Vision Models Meet Parameter Efficient Look-Aside Adapters Without Large-Scale Audio Pretraining
Juan Yeo
Jinkwan Jang
Kyubyung Chae
Seongkyu Mun
Taesup Kim
VLM
142
0
0
08 Dec 2024
STEVE-Audio: Expanding the Goal Conditioning Modalities of Embodied Agents in Minecraft
Nicholas Lenzen
Amogh Raut
Andrew Melnik
VGen
118
0
0
01 Dec 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Viana
200
0
0
24 Nov 2024
State-Space Large Audio Language Models
Saurabhchand Bhati
Yuan Gong
Leonid Karlinsky
Hilde Kuehne
Rogerio Feris
James Glass
153
1
0
24 Nov 2024
How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception
Sahibzada Adil Shahzad
Ammarah Hashmi
Yan-Tsung Peng
Yu Tsao
H. Wang
71
1
0
14 Nov 2024
PSELDNets: Pre-trained Neural Networks on a Large-scale Synthetic Dataset for Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Fang Kang
Feiran Yang
Wenwu Wang
Mark D. Plumbley
J. Yang
72
1
0
10 Nov 2024
Model and Deep learning based Dynamic Range Compression Inversion
Haoran Sun
Dominique Fourer
Hichem Maaref
21
0
0
07 Nov 2024
Stepping Forward on the Last Mile
Chen Feng
Shaojie Zhuo
Xiaopeng Zhang
R. Ramakrishnan
Zhaocong Yuan
Andrew Zou Li
139
0
0
06 Nov 2024
Angular Distance Distribution Loss for Audio Classification
Antonio Almudévar
Romain Serizel
Alfonso Ortega
58
0
0
31 Oct 2024
EEG-based Multimodal Representation Learning for Emotion Recognition
Kang Yin
Hye-Bin Shin
Dan Li
Seong-Whan Lee
39
4
0
29 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
69
3
0
24 Oct 2024
Learning to rumble: Automated elephant call classification, detection and endpointing using deep architectures
Christiaan M. Geldenhuys
Thomas R. Niesler
34
0
0
15 Oct 2024
GraFPrint: A GNN-Based Approach for Audio Identification
Aditya Bhattacharjee
Shubhr Singh
Emmanouil Benetos
95
1
0
14 Oct 2024
Skipping Computations in Multimodal LLMs
Mustafa Shukor
Matthieu Cord
68
3
0
12 Oct 2024
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning
Eileen Wang
Caren Han
Josiah Poon
74
0
0
12 Oct 2024
Movie Trailer Genre Classification Using Multimodal Pretrained Features
Serkan Sulun
Paula Viana
M. Davies
CLIP
74
3
0
11 Oct 2024
Music Genre Classification using Large Language Models
Mohamed El Amine Meguenani
Alceu de Souza Britto Jr.
A. L. Koerich
75
0
0
10 Oct 2024
Self-Attention Mechanism in Multimodal Context for Banking Transaction Flow
Cyrile Delestre
Yoann Sola
34
0
0
10 Oct 2024