ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.11607
  4. Cited By
Transformers in Speech Processing: A Survey

Transformers in Speech Processing: A Survey

21 March 2023
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
ArXivPDFHTML

Papers citing "Transformers in Speech Processing: A Survey"

50 / 235 papers shown
Title
XGPT: Cross-modal Generative Pre-Training for Image Captioning
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Xin Liu
Ming Zhou
MLLM
VLM
67
75
0
03 Mar 2020
Controllable Time-Delay Transformer for Real-Time Punctuation Prediction
  and Disfluency Detection
Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection
Qian Chen
Mengzhe Chen
Bo Li
Wen Wang
80
35
0
03 Mar 2020
Towards Learning a Universal Non-Semantic Representation of Speech
Towards Learning a Universal Non-Semantic Representation of Speech
Joel Shor
A. Jansen
Ronnie Maor
Oran Lang
Omry Tuval
Félix de Chaumont Quitry
Marco Tagliasacchi
Ira Shavitt
Dotan Emanuel
Yinnon A. Haviv
SSL
120
156
0
25 Feb 2020
Transformer Transducer: A Streamable Speech Recognition Model with
  Transformer Encoders and RNN-T Loss
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
Qian Zhang
Han Lu
Hasim Sak
Anshuman Tripathi
Erik McDermott
Stephen Koo
Shankar Kumar
69
480
0
07 Feb 2020
Learning Robust and Multilingual Speech Representations
Learning Robust and Multilingual Speech Representations
Kazuya Kawakami
Luyu Wang
Chris Dyer
Phil Blunsom
Aaron van den Oord
SSL
68
100
0
29 Jan 2020
Towards a Human-like Open-Domain Chatbot
Towards a Human-like Open-Domain Chatbot
Daniel De Freitas
Minh-Thang Luong
David R. So
Jamie Hall
Noah Fiedel
...
Zi Yang
Apoorv Kulshreshtha
Gaurav Nemade
Yifeng Lu
Quoc V. Le
91
935
0
27 Jan 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised
  Image-Text Data
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
76
261
0
22 Jan 2020
Reformer: The Efficient Transformer
Reformer: The Efficient Transformer
Nikita Kitaev
Lukasz Kaiser
Anselm Levskaya
VLM
181
2,307
0
13 Jan 2020
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using
  Transformer with Text-to-Speech Pretraining
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining
Wen-Chin Huang
Tomoki Hayashi
Yi-Chiao Wu
Hirokazu Kameoka
Tomoki Toda
54
98
0
14 Dec 2019
Synchronous Transformers for End-to-End Speech Recognition
Synchronous Transformers for End-to-End Speech Recognition
Zhengkun Tian
Jiangyan Yi
Ye Bai
J. Tao
Shuai Zhang
Zhengqi Wen
57
73
0
06 Dec 2019
Factorized Multimodal Transformer for Multimodal Sequential Learning
Factorized Multimodal Transformer for Multimodal Sequential Learning
Amir Zadeh
Chengfeng Mao
Kelly Shi
Yiwei Zhang
Paul Pu Liang
Soujanya Poria
Louis-Philippe Morency
52
44
0
22 Nov 2019
Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual
  Task-oriented Dialogue Systems
Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems
Zihan Liu
Genta Indra Winata
Zhaojiang Lin
Peng Xu
Pascale Fung
77
100
0
21 Nov 2019
ConveRT: Efficient and Accurate Conversational Representations from
  Transformers
ConveRT: Efficient and Accurate Conversational Representations from Transformers
Matthew Henderson
I. Casanueva
Nikola Mrkvsić
Pei-hao Su
Tsung-Hsien
Ivan Vulić
73
197
0
09 Nov 2019
A Simplified Fully Quantized Transformer for End-to-end Speech
  Recognition
A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
Alex Bie
Bharat Venkitesh
João Monteiro
Md. Akmal Haidar
Mehdi Rezagholizadeh
MQ
50
27
0
09 Nov 2019
DialoGPT: Large-Scale Generative Pre-training for Conversational
  Response Generation
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Yizhe Zhang
Siqi Sun
Michel Galley
Yen-Chun Chen
Chris Brockett
Xiang Gao
Jianfeng Gao
Jingjing Liu
W. Dolan
VLM
153
1,519
0
01 Nov 2019
Improving Generalization of Transformer for Speech Recognition with
  Parallel Schedule Sampling and Relative Positional Embedding
Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding
Pan Zhou
Ruchao Fan
Wei Chen
Jia Jia
41
26
0
01 Nov 2019
Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank
  Transformer
Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer
Genta Indra Winata
Samuel Cahyawijaya
Zhaojiang Lin
Zihan Liu
Pascale Fung
39
75
0
30 Oct 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language
  Generation, Translation, and Comprehension
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
211
10,792
0
29 Oct 2019
Transformer-Transducer: End-to-End Speech Recognition with
  Self-Attention
Transformer-Transducer: End-to-End Speech Recognition with Self-Attention
Ching-Feng Yeh
Jay Mahadeokar
Kaustubh Kalgaonkar
Yongqiang Wang
Duc Le
Mahaveer Jain
Kjell Schubert
Christian Fuegen
M. Seltzer
69
150
0
28 Oct 2019
Mockingjay: Unsupervised Speech Representation Learning with Deep
  Bidirectional Transformer Encoders
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
Andy T. Liu
Shu-Wen Yang
Po-Han Chi
Po-Chun Hsu
Hung-yi Lee
SSL
132
373
0
25 Oct 2019
Correction of Automatic Speech Recognition with Transformer
  Sequence-to-sequence Model
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii Hrinchuk
Mariya Popova
Boris Ginsburg
VLM
46
89
0
23 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
369
20,053
0
23 Oct 2019
Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention
  Networks
Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks
Xingcheng Song
Guangsen Wang
Zhiyong Wu
Yiheng Huang
Dan Su
Dong Yu
Helen Meng
SSL
62
49
0
23 Oct 2019
Transformer ASR with Contextual Block Processing
Transformer ASR with Contextual Block Processing
E. Tsunoo
Yosuke Kashiwagi
Toshiyuki Kumakura
Shinji Watanabe
78
65
0
16 Oct 2019
T-GSA: Transformer with Gaussian-weighted self-attention for speech
  enhancement
T-GSA: Transformer with Gaussian-weighted self-attention for speech enhancement
Jaeyoung Kim
Mostafa El-Khamy
Jungwon Lee
46
188
0
13 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski
Steffen Schneider
Michael Auli
SSL
130
666
0
12 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and
  lighter
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
196
7,481
0
02 Oct 2019
A Comparative Study on Transformer vs RNN in Speech Applications
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
65
720
0
13 Sep 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
Chengzhu Yu
Heng Lu
Na Hu
Meng Yu
Chao Weng
...
Deyi Tuo
Shiyin Kang
Guangzhi Lei
Dan Su
Dong Yu
CVBM
48
118
0
04 Sep 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.0K
12,129
0
27 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
227
2,474
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal
  Pre-training
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
200
900
0
16 Aug 2019
Survey on Deep Neural Networks in Speech and Vision Systems
Survey on Deep Neural Networks in Speech and Vision Systems
M. Alam
Manar D. Samad
Lasitha Vidyaratne
Alexander M. Glandon
Khan M. Iftekharuddin
3DV
VLM
AI4TS
59
210
0
16 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
130
1,948
0
09 Aug 2019
Hello, It's GPT-2 -- How Can I Help You? Towards the Use of Pretrained
  Language Models for Task-Oriented Dialogue Systems
Hello, It's GPT-2 -- How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems
Paweł Budzianowski
Ivan Vulić
62
310
0
12 Jul 2019
Sharing Attention Weights for Fast Transformer
Sharing Attention Weights for Fast Transformer
Tong Xiao
Yinqiao Li
Jingbo Zhu
Zhengtao Yu
Tongran Liu
48
52
0
26 Jun 2019
A Tensorized Transformer for Language Modeling
A Tensorized Transformer for Language Modeling
Xindian Ma
Peng Zhang
Shuai Zhang
Nan Duan
Yuexian Hou
D. Song
M. Zhou
55
167
0
24 Jun 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
215
8,415
0
19 Jun 2019
Lattice Transformer for Speech Translation
Lattice Transformer for Speech Translation
Pei Zhang
Boxing Chen
Niyu Ge
Kai Fan
55
50
0
13 Jun 2019
Learning Deep Transformer Models for Machine Translation
Learning Deep Transformer Models for Machine Translation
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao
70
670
0
05 Jun 2019
Improving Long Distance Slot Carryover in Spoken Dialogue Systems
Improving Long Distance Slot Carryover in Spoken Dialogue Systems
Tongfei Chen
Chetan Naik
Hua He
Pushpendre Rastogi
Lambert Mathias
36
10
0
04 Jun 2019
Multimodal Transformer for Unaligned Multimodal Language Sequences
Multimodal Transformer for Unaligned Multimodal Language Sequences
Yao-Hung Hubert Tsai
Shaojie Bai
Paul Pu Liang
J. Zico Kolter
Louis-Philippe Morency
Ruslan Salakhutdinov
72
1,296
0
01 Jun 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy
  Lifting, the Rest Can Be Pruned
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
102
1,134
0
23 May 2019
Transformers with convolutional context for ASR
Transformers with convolutional context for ASR
Abdel-rahman Mohamed
Dmytro Okhonko
Luke Zettlemoyer
56
168
0
26 Apr 2019
Generating Long Sequences with Sparse Transformers
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
93
1,894
0
23 Apr 2019
SpecAugment: A Simple Data Augmentation Method for Automatic Speech
  Recognition
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
159
3,451
0
18 Apr 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
69
1,243
0
03 Apr 2019
Deep Text-to-Speech System with Seq2Seq Model
Deep Text-to-Speech System with Seq2Seq Model
Gary Wang
AI4TS
8
9
0
11 Mar 2019
Self-Attention Aligner: A Latency-Control End-to-End Model for ASR Using
  Self-Attention Network and Chunk-Hopping
Self-Attention Aligner: A Latency-Control End-to-End Model for ASR Using Self-Attention Network and Chunk-Hopping
Linhao Dong
Feng Wang
Bo Xu
47
91
0
18 Feb 2019
TransferTransfo: A Transfer Learning Approach for Neural Network Based
  Conversational Agents
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents
Thomas Wolf
Victor Sanh
Julien Chaumond
Clement Delangue
69
494
0
23 Jan 2019
Previous
12345
Next