ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.05069
  4. Cited By
Efficient Training of Audio Transformers with Patchout

Efficient Training of Audio Transformers with Patchout

11 October 2021
Khaled Koutini
Jan Schluter
Hamid Eghbalzadeh
Gerhard Widmer
    ViT
ArXivPDFHTML

Papers citing "Efficient Training of Audio Transformers with Patchout"

50 / 58 papers shown
Title
Fast Text-to-Audio Generation with Adversarial Post-Training
Fast Text-to-Audio Generation with Adversarial Post-Training
Zachary Novack
Zach Evans
Zack Zukowski
Josiah Taylor
CJ Carr
...
Adnan Al-Sinan
Gian Marco Iodice
Julian McAuley
Taylor Berg-Kirkpatrick
Jordi Pons
30
0
0
13 May 2025
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Paul Primus
Florian Schmid
Gerhard Widmer
CLIP
AI4TS
VLM
36
0
0
12 May 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
58
0
0
21 Apr 2025
Myna: Masking-Based Contrastive Learning of Musical Representations
Myna: Masking-Based Contrastive Learning of Musical Representations
Ori Yonay
Tracy Hammond
Tianbao Yang
AAML
61
0
0
20 Feb 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
A. Schwing
Yuki Mitsufuji
VGen
126
12
0
19 Dec 2024
Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks
Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks
Felipe Marra
Lucas N. Ferreira
31
0
0
06 Nov 2024
Generalization in birdsong classification: impact of transfer learning
  methods and dataset characteristics
Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics
Burooj Ghani
Vincent J. Kalkman
Bob Planqué
Willem-Pier Vellinga
L. Gill
Dan Stowell
VLM
32
5
0
21 Sep 2024
Data Efficient Acoustic Scene Classification using Teacher-Informed
  Confusing Class Instruction
Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction
Jin Jie Sean Yeo
Ee-Leng Tan
Jisheng Bai
Santi Peksi
Woon-Seng Gan
30
1
0
18 Sep 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection
Effective Pre-Training of Audio Transformers for Sound Event Detection
Florian Schmid
T. Morocutti
Francesco Foscarin
Jan Schluter
Paul Primus
Gerhard Widmer
ViT
33
2
0
14 Sep 2024
Audio xLSTMs: Learning Self-Supervised Audio Representations with xLSTMs
Audio xLSTMs: Learning Self-Supervised Audio Representations with xLSTMs
Sarthak Yadav
Sergios Theodoridis
Zheng-Hua Tan
48
2
0
29 Aug 2024
Sampling Foundational Transformer: A Theoretical Perspective
Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen
Minh Lenhat
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong-Son Hy
44
0
0
11 Aug 2024
Improving Audio Spectrogram Transformers for Sound Event Detection
  Through Multi-Stage Training
Improving Audio Spectrogram Transformers for Sound Event Detection Through Multi-Stage Training
Florian Schmid
Paul Primus
T. Morocutti
Jonathan Greif
Gerhard Widmer
32
5
0
17 Jul 2024
Fusing Audio and Metadata Embeddings Improves Language-based Audio
  Retrieval
Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval
Paul Primus
Gerhard Widmer
52
3
0
22 Jun 2024
FastAST: Accelerating Audio Spectrogram Transformer via Token Merging
  and Cross-Model Knowledge Distillation
FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation
Swarup Ranjan Behera
Abhishek Dhiman
Karthik Gowda
Aalekhya Satya Narayani
26
1
0
11 Jun 2024
Audio-based Step-count Estimation for Running -- Windowing and Neural
  Network Baselines
Audio-based Step-count Estimation for Running -- Windowing and Neural Network Baselines
Philipp Wagner
Andreas Triantafyllopoulos
Alexander Gebhard
Björn Schuller
37
0
0
10 Jun 2024
Embedding Compression for Teacher-to-Student Knowledge Transfer
Embedding Compression for Teacher-to-Student Knowledge Transfer
Yiwei Ding
Alexander Lerch
26
1
0
09 Feb 2024
On the choice of the optimal temporal support for audio classification
  with Pre-trained embeddings
On the choice of the optimal temporal support for audio classification with Pre-trained embeddings
Aurian Quélennec
Michel Olvera
Geoffroy Peeters
S. Essid
33
2
0
21 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
25
17
0
27 Nov 2023
CLARA: Multilingual Contrastive Learning for Audio Representation
  Acquisition
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition
K. A. Noriy
Xiaosong Yang
Marcin Budka
Jian Jun Zhang
VLM
26
3
0
18 Oct 2023
Efficient Supervised Training of Audio Transformers for Music
  Representation Learning
Efficient Supervised Training of Audio Transformers for Music Representation Learning
Pablo Alonso-Jiménez
Xavier Serra
Dmitry Bogdanov
ViT
35
3
0
28 Sep 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio
  Tagging by Aligning with Label Text Description
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
Youbin Jeon
Yanzhen Ren
VLM
34
0
0
28 Sep 2023
Audio Contrastive based Fine-tuning
Audio Contrastive based Fine-tuning
Yang Wang
Qibin Liang
Chenghao Xiao
Yizhi Li
Noura Al Moubayed
Chenghua Lin
32
0
0
21 Sep 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
40
1
0
14 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised
  Pretraining
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
30
223
0
10 Aug 2023
Advancing Natural-Language Based Audio Retrieval with PaSST and Large
  Audio-Caption Data Sets
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets
Paul Primus
Khaled Koutini
Gerhard Widmer
24
13
0
08 Aug 2023
Improving Domain Generalization for Sound Classification with Sparse
  Frequency-Regularized Transformer
Improving Domain Generalization for Sound Classification with Sparse Frequency-Regularized Transformer
Honglin Mu
Wentian Xia
Wanxiang Che
22
1
0
19 Jul 2023
Streaming Audio Transformers for Online Audio Tagging
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
34
4
0
29 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited
  Modalities
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
48
115
0
18 May 2023
Transformer-based Sequence Labeling for Audio Classification based on MFCCs
C. Sonali
S. ChinmayiB
A. Balasubramanian
32
0
0
30 Apr 2023
MMViT: Multiscale Multiview Vision Transformers
MMViT: Multiscale Multiview Vision Transformers
Yuchen Liu
Natasha Ong
Kaiyan Peng
Bo Xiong
Qifan Wang
...
Madian Khabsa
Kaiyue Yang
David C. Liu
Donald Williamson
Hanchao Yu
ViT
33
4
0
28 Apr 2023
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Johannes Lehner
Benedikt Alkin
Andreas Fürst
Elisabeth Rumetshofer
Lukas Miklautz
Sepp Hochreiter
26
18
0
20 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual
  Cross-Modal Pairs for Audiovisual Representation Learning
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
25
2
0
12 Apr 2023
Efficient Audio Captioning Transformer with Patchout and Text Guidance
Efficient Audio Captioning Transformer with Patchout and Text Guidance
Thodoris Kouzelis
Grigoris Bastas
Athanasios Katsamanis
Alexandros Potamianos
ViT
18
6
0
06 Apr 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet
  Tag-guided Synthetic Data
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
71
14
0
14 Mar 2023
CAT: Causal Audio Transformer for Audio Classification
CAT: Causal Audio Transformer for Audio Classification
Xiaoyu Liu
Hanlin Lu
Jianbo Yuan
Xinyu Li
ViT
26
22
0
14 Mar 2023
Approach to Learning Generalized Audio Representation Through Batch
  Embedding Covariance Regularization and Constant-Q Transforms
Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms
Ankit Parag Shah
Shuyi Chen
Kejun Zhou
Yue Chen
Bhiksha Raj
18
1
0
07 Mar 2023
Low-Complexity Audio Embedding Extractors
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
24
4
0
03 Mar 2023
An Attention-based Approach to Hierarchical Multi-label Music Instrument
  Classification
An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification
Zhi-Wei Zhong
M. Hirano
Kazuki Shimada
Kazuya Tateishi
Shusuke Takahashi
Yuki Mitsufuji
18
12
0
16 Feb 2023
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion
  and Keyword-to-Caption Augmentation
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Yusong Wu
K. Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
CLIP
39
483
0
12 Nov 2022
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge
  Distillation
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
Florian Schmid
Khaled Koutini
Gerhard Widmer
ViT
25
58
0
09 Nov 2022
I Hear Your True Colors: Image Guided Audio Generation
I Hear Your True Colors: Image Guided Audio Generation
Roy Sheffer
Yossi Adi
VLM
18
73
0
06 Nov 2022
Play It Back: Iterative Attention for Audio Recognition
Play It Back: Iterative Attention for Audio Recognition
Alexandros Stergiou
Dima Damen
37
4
0
20 Oct 2022
Learning Temporal Resolution in Spectrogram for Audio Classification
Learning Temporal Resolution in Spectrogram for Audio Classification
Haohe Liu
Xubo Liu
Qiuqiang Kong
Wenwu Wang
Mark D. Plumbley
34
7
0
04 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
35
120
0
02 Oct 2022
AudioGen: Textually Guided Audio Generation
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
27
289
0
30 Sep 2022
The Efficacy of Self-Supervised Speech Models for Audio Representations
The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu
Chen An Li
Tzu-Han Lin
Tsung-Yuan Hsu
Hung-yi Lee
29
5
0
26 Sep 2022
UniKW-AT: Unified Keyword Spotting and Audio Tagging
UniKW-AT: Unified Keyword Spotting and Audio Tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
45
3
0
23 Sep 2022
Language-based Audio Retrieval Task in DCASE 2022 Challenge
Huang Xie
Samuel Lipping
Tuomas Virtanen
65
18
0
20 Sep 2022
Improved Zero-Shot Audio Tagging & Classification with Patchout
  Spectrogram Transformers
Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers
Paul Primus
Gerhard Widmer
VLM
19
5
0
24 Aug 2022
Masked Autoencoders that Listen
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
21
268
0
13 Jul 2022
12
Next