2010.13154
Cited By
Attention is All You Need in Speech Separation
25 October 2020
Cem Subakan
Mirco Ravanelli
Samuele Cornell
Mirko Bronzi
Jianyuan Zhong
Papers citing
"Attention is All You Need in Speech Separation"
50 / 219 papers shown
Listen to Extract: Onset-Prompted Target Speaker Extraction
Pengjie Shen
Kangrui Chen
Shulin He
Pengru Chen
Shuqi Yuan
He Kong
Xueliang Zhang
Z. Wang
48
0
0
08 May 2025
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLM
KELM
VLM
57
0
0
06 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
118
0
0
06 May 2025
MaskClip: Detachable Clip-on Piezoelectric Sensing of Mask Surface Vibrations for Real-time Noise-Robust Speech Input
Hirotaka Hiraki
Jun Rekimoto
19
0
0
04 May 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
52
1
0
28 Apr 2025
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
31
0
0
25 Apr 2025
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
Beilong Tang
Bang Zeng
Ming Li
AI4TS
39
0
0
10 Apr 2025
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Haizhou Li
AI4TS
36
0
0
03 Apr 2025
GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization
Zhouhong Gu
Xingzhou Chen
Xiaoran Shi
Tao Wang
Suhang Zheng
Tianyu Li
Hongwei Feng
Yanghua Xiao
78
0
0
26 Mar 2025
Wireless Hearables With Programmable Speech AI Accelerators
Malek Itani
Tuochao Chen
Arun Raghavan
Gavriel Kohlberg
Shyamnath Gollakota
AuLLM
59
0
0
24 Mar 2025
HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks
Ekaterina Dmitrieva
Maksim Kaledin
42
0
0
21 Mar 2025
Shushing! Let's Imagine an Authentic Speech from the Silent Video
Jiaxin Ye
Hongming Shan
DiffM
VGen
71
1
0
19 Mar 2025
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation
Wupeng Wang
Zexu Pan
Jingru Lin
Shuai Wang
Haizhou Li
53
0
0
16 Mar 2025
Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction
Minsu Kim
Rodrigo Mira
Honglie Chen
Stavros Petridis
M. Pantic
64
0
0
13 Mar 2025
UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
Weiguang Chen
Junjie Zhang
Jielong Yang
Eng Siong Chng
Xionghu Zhong
66
0
0
07 Mar 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
Boyi Kang
Xinfa Zhu
Zihan Zhang
Zhen Ye
Mingshuai Liu
...
Jun Chen
Longshuai Xiao
Chao Weng
Wei Xue
Lei Xie
AuLLM
55
3
0
01 Mar 2025
Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
Haoyang Li
J. Yip
Tianyu Fan
Eng Siong Chng
54
0
0
22 Feb 2025
AACessTalk: Fostering Communication between Minimally Verbal Autistic Children and Parents with Contextual Guidance and Card Recommendation
Dasom Choi
SoHyun Park
Kyungah Lee
Hwajung Hong
Y. Kim
51
0
0
17 Feb 2025
EDSep: An Effective Diffusion-Based Method for Speech Source Separation
Jinwei Dong
Xinsheng Wang
Qirong Mao
63
0
0
28 Jan 2025
30+ Years of Source Separation Research: Achievements and Future Challenges
S. Araki
N. Ito
Reinhold Haeb-Umbach
G. Wichern
Zhong-Qiu Wang
Yuki Mitsufuji
AI4TS
39
0
0
21 Jan 2025
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
David Perera
Victor Letzelter
Théo Mariotte
Adrien Cortés
Mickaël Chen
S. Essid
Gaël Richard
74
2
0
20 Jan 2025
Beyond Speaker Identity: Text Guided Target Speech Extraction
Mingyue Huo
Abhinav Jain
Cong Phuoc Huynh
Fanjie Kong
Pichao Wang
Zhu Liu
Vimal Bhat
51
0
0
17 Jan 2025
UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation
Xinyao Liao
Wei Wei
Dangyang Chen
Yuanyuan Fu
58
0
0
10 Jan 2025
Multiple Choice Learning for Efficient Speech Separation with Many Speakers
David Perera
François Derrida
Théo Mariotte
Gaël Richard
S. Essid
62
0
0
27 Nov 2024
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang
Yu-Kuan Fu
Chen An Li
Yi-Cheng Lin
Yu-Xiang Lin
...
Ulin Sanga
Xuanjun Chen
Po-Chun Hsu
Shu-Wen Yang
Hung-yi Lee
AuLLM
46
0
0
11 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Hao Li
34
4
0
05 Nov 2024
Speaker Emotion Recognition: Leveraging Self-Supervised Models for Feature Extraction Using Wav2Vec2 and HuBERT
Pourya Jafarzadeh
Amir Mohammad Rostami
Padideh Choobdar
24
1
0
05 Nov 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Yunhong Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
35
10
0
02 Nov 2024
Task-Aware Unified Source Separation
Kohei Saijo
Janek Ebbers
François Germain
G. Wichern
Jonathan Le Roux
42
2
0
31 Oct 2024
USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis
Luca Jiang-Tao Yu
Running Zhao
Sijie Ji
Edith C. H. Ngai
Chenshu Wu
30
0
0
29 Oct 2024
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Xize Cheng
Siqi Zheng
Zehan Wang
Minghui Fang
Ziang Zhang
...
Z. Ma
Shengpeng Ji
Jialong Zuo
Tao Jin
Zhou Zhao
30
1
0
28 Oct 2024
SepMamba: State-space models for speaker separation using Mamba
Thor Højhus Avenstrup
Boldizsár Elek
István László Mádi
András Bence Schin
Morten Mørup
Bjørn Sand Jensen
Kenny Falkær Olsen
Mamba
31
0
0
28 Oct 2024
The importance of spatial and spectral information in multiple speaker tracking
H. Beit-On
V. Tourbabin
B. Rafaely
24
0
0
15 Oct 2024
Predictive Coding for Decision Transformer
Tung M. Luu
Donghoon Lee
Chang D. Yoo
OffRL
58
2
0
04 Oct 2024
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Kai Li
Wendi Sang
Chang Zeng
Runxuan Yang
Guo Chen
Xiaolin Hu
28
2
0
02 Oct 2024
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Mohan Xu
Kai Li
Guo Chen
Xiaolin Hu
43
0
0
02 Oct 2024
Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing
Wenze Ren
Kuo-Hsuan Hung
Rong-Yu Chao
YouJin Li
Hsin-Min Wang
Yu Tsao
28
0
0
22 Sep 2024
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
Kohei Saijo
Janek Ebbers
François Germain
Sameer Khurana
G. Wichern
Jonathan Le Roux
44
1
0
20 Sep 2024
A quest through interconnected datasets: lessons from highly-cited ICASSP papers
Cynthia C. S. Liem
Doğa Taşcılar
Andrew M. Demetriou
23
0
0
19 Sep 2024
Relax DARTS: Relaxing the Constraints of Differentiable Architecture Search for Eye Movement Recognition
Hongyu Zhu
Xin Jin
Hongchao Liao
Yan Xiang
M. El-Yacoubi
Huafeng Qin
42
1
0
18 Sep 2024
Language-Queried Target Sound Extraction Without Parallel Training Data
Hao Ma
Zhiyuan Peng
Xu Li
Yukai Li
Mingjie Shao
Qiuqiang Kong
Ju Liu
VLM
74
1
0
14 Sep 2024
LMAC-TD: Producing Time Domain Explanations for Audio Classifiers
Eleonora Mancini
Francesco Paissan
Mirco Ravanelli
Cem Subakan
31
1
0
13 Sep 2024
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
Beilong Tang
Bang Zeng
Ming Li
35
2
0
12 Sep 2024
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
Bang Zeng
Ming Li
34
2
0
04 Sep 2024
Phantom: Untargeted Poisoning Attacks on Semi-Supervised Learning (Full Version)
Jonathan Knauer
Phillip Rieger
Hossein Fereidooni
A. Sadeghi
AAML
34
0
0
02 Sep 2024
Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement
Tathagata Bandyopadhyay
ViT
15
0
0
02 Sep 2024
Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation
K. Chen
Jiaqi Su
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Zeyu Jin
37
1
0
28 Aug 2024
PropSAM: A Propagation-Based Model for Segmenting Any 3D Objects in Multi-Modal Medical Images
Zifan Chen
Xinyu Nan
Jiazheng Li
Jie Zhao
Haifeng Li
...
Heyun Chen
Yiting Liu
Bin Dong
Li Lyna Zhang
L. Tang
MedIm
45
1
0
25 Aug 2024
WhisperMask: A Noise Suppressive Mask-Type Microphone for Whisper Speech
Hirotaka Hiraki
Shusuke Kanazawa
Takahiro Miura
Manabu Yoshida
Masaaki Mochimaru
Jun Rekimoto
29
4
0
22 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
35
0
0
08 Aug 2024