ResearchTrend.AI
Transformers in Speech Processing: A Survey

arXiv: 2303.11607

21 March 2023
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir

Papers citing "Transformers in Speech Processing: A Survey"

50 / 235 papers shown
Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper's Encoder for Efficient Parameter Reduction in Automated Assessment
Huma Ameer
Seemab Latif
Iram Tariq Bhatti
62
1
0
09 Jun 2024
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
88
4
0
18 Aug 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
90
183
0
07 Mar 2023
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
48
40
0
27 Feb 2023
Response-act Guided Reinforced Dialogue Generation for Mental Health Counseling
Aseem Srivastava
Ishan Pandey
Md. Shad Akhtar
Tanmoy Chakraborty
OffRL
48
13
0
30 Jan 2023
A Survey on Transformers in Reinforcement Learning
Wenzhe Li
Hao Luo
Zichuan Lin
Chongjie Zhang
Zongqing Lu
Deheng Ye
OffRL
MU
AI4CE
75
56
0
08 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
168
703
0
05 Jan 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition
Maxime Burchi
Radu Timofte
VLM
40
35
0
04 Jan 2023
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM
SSL
67
96
0
14 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
130
3,623
0
06 Dec 2022
A Transformer-Based User Satisfaction Prediction for Proactive Interaction Mechanism in DuerOS
Wei Shen
Xiaonan He
Wei Shen
Xuyun Zhang
Jian Xie
38
3
0
05 Dec 2022
HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression
Jiaqi Gu
Ben Keller
Jean Kossaifi
Anima Anandkumar
Brucek Khailany
David Z. Pan
ViT
45
8
0
30 Nov 2022
Device Directedness with Contextual Cues for Spoken Dialog Systems
Dhanush Bekal
S. Srinivasan
S. Bodapati
S. Ronanki
Katrin Kirchhoff
50
1
0
23 Nov 2022
Compressing Transformer-based self-supervised models for speech processing
Tzu-Quan Lin
Tsung-Huan Yang
Chun-Yao Chang
Kuang-Ming Chen
Tzu-hsun Feng
Hung-yi Lee
Hao Tang
50
6
0
17 Nov 2022
Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement
Shucong Zhang
Malcolm Chadwick
Alberto Gil C. P. Ramos
S. Bhattacharya
36
5
0
08 Nov 2022
End-to-End Evaluation of a Spoken Dialogue System for Learning Basic Mathematics
Eda Okur
Saurav Sahay
Roddy Fuentes Alba
L. Nachman
43
6
0
07 Nov 2022
Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition
Chendong Zhao
Jianzong Wang
Wentao Wei
Xiaoyang Qu
Haoqian Wang
Jing Xiao
52
2
0
30 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Mian
ViT
32
45
0
13 Sep 2022
Transformers in Remote Sensing: A Survey
Abdulaziz Amer Aleissaee
Amandeep Kumar
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
Guisong Xia
Fahad Shahbaz Khan
ViT
68
185
0
02 Sep 2022
3D Vision with Transformers: A Survey
Jean Lahoud
Jiale Cao
Fahad Shahbaz Khan
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Ming-Hsuan Yang
ViT
MedIm
82
33
0
08 Aug 2022
Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation
Jian Luo
Jianzong Wang
Ning Cheng
Edward Xiao
Xulong Zhang
Jing Xiao
ViT
48
12
0
28 Jun 2022
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection
Tianzi Wang
Jiajun Deng
Mengzhe Geng
Zi Ye
Shoukang Hu
Yi Wang
Mingyu Cui
Zengrui Jin
Xunying Liu
Helen M. Meng
55
21
0
23 Jun 2022
GODEL: Large-Scale Pre-Training for Goal-Directed Dialog
Baolin Peng
Michel Galley
Pengcheng He
Chris Brockett
Lars Liden
E. Nouri
Zhou Yu
Bill Dolan
Jianfeng Gao
VLM
65
74
0
22 Jun 2022
Resource-Efficient Separation Transformer
Luca Della Libera
Cem Subakan
Mirco Ravanelli
Samuele Cornell
Frédéric Lepoutre
François Grondin
VLM
65
17
0
19 Jun 2022
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Zhifu Gao
Shiliang Zhang
Ian Mcloughlin
Zhijie Yan
36
105
0
16 Jun 2022
Multimodal Learning with Transformers: A Survey
Peng Xu
Xiatian Zhu
David Clifton
ViT
102
555
0
13 Jun 2022
Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems
Ting-En Lin
Yuchuan Wu
Feiling Huang
Luo Si
Jian Sun
Yongbin Li
74
25
0
30 May 2022
Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
Gerard Sant
Gerard I. Gállego
Belen Alastruey
Marta R. Costa-jussá
47
4
0
14 May 2022
Ultra Fast Speech Separation Model with Teacher Student Learning
Sanyuan Chen
Yu-Huan Wu
Zhuo Chen
Jian Wu
Takuya Yoshioka
Shujie Liu
Jinyu Li
Xiangzhan Yu
47
14
0
27 Apr 2022
Gated Multimodal Fusion with Contrastive Learning for Turn-taking Prediction in Human-robot Dialogue
Jiudong Yang
Pei-Hsin Wang
Yi Zhu
Mingchao Feng
Meng Chen
Xiaodong He
27
16
0
18 Apr 2022
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding
Vishal Sunder
Samuel Thomas
H. Kuo
Jatin Ganhotra
Brian Kingsbury
Eric Fosler-Lussier
VLM
67
10
0
11 Apr 2022
Robust Speaker Recognition with Transformers Using wav2vec 2.0
Sergey Novoselov
G. Lavrentyeva
Anastasia Avdeeva
V. Volokhov
Aleksei Gusev
ViT
26
18
0
28 Mar 2022
Dawn of the transformer era in speech emotion recognition: closing the valence gap
Johannes Wagner
Andreas Triantafyllopoulos
H. Wierstorf
Maximilian Schmitt
Felix Burkhardt
F. Eyben
Björn W. Schuller
61
300
0
14 Mar 2022
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
54
34
0
08 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
745
12,835
0
04 Mar 2022
TRILLsson: Distilled Universal Paralinguistic Speech Representations
Joel Shor
Subhashini Venugopalan
52
40
0
01 Mar 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
89
852
0
07 Feb 2022
Transformers in Medical Imaging: A Survey
Fahad Shamshad
Salman Khan
Syed Waqas Zamir
Muhammad Haris Khan
Munawar Hayat
Fahad Shahbaz Khan
Huazhu Fu
ViT
LM&MA
MedIm
159
689
0
24 Jan 2022
A Comparative Study on Language Models for Task-Oriented Dialogue Systems
Vinsen Marselino Andreas
Genta Indra Winata
Ayu Purwarianti
31
8
0
21 Jan 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP
VLM
43
39
0
15 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
60
217
0
12 Jan 2022
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
Guosheng Lin
71
193
0
17 Dec 2021
Mixed Precision of Quantization of Transformer Language Models for Speech Recognition
Junhao Xu
Shoukang Hu
Jianwei Yu
Xunying Liu
Helen M. Meng
MQ
65
16
0
29 Nov 2021
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation
Tom O'Malley
A. Narayanan
Quan Wang
Alex Park
James Walker
N. Howard
45
28
0
18 Nov 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
SSL
84
699
0
17 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
119
344
0
11 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
206
1,846
0
26 Oct 2021
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang
Hongwei Xue
Bei Liu
Yutong Lu
52
58
0
19 Oct 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
136
201
0
14 Oct 2021
Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization
Yechan Yu
Dongkeon Park
Hyeongju Kim
31
20
0
14 Oct 2021