1810.00108
Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture
28 September 2018
Stavros Petridis
Themos Stafylakis
Pingchuan Ma
Georgios Tzimiropoulos
Maja Pantic
Papers citing
"Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture"
32 / 32 papers shown
Title
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
195
0
0
06 May 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
118
1
0
03 Feb 2025
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
46
8
0
14 Mar 2024
Enhancing Multimodal Unified Representations for Cross Modal Generalization
Hai Huang
Yan Xia
Shengpeng Ji
Shulei Wang
Hanting Wang
Minghui Fang
Jieming Zhu
Zhenhua Dong
Sashuai Zhou
Zhou Zhao
37
6
0
08 Mar 2024
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish
David Gimeno-Gómez
Carlos David Martínez Hinarejos
31
0
0
21 Nov 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
38
1
0
31 Jul 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Cheng Chen
Chengwei Qin
Qiu-shi Zhu
Eng Siong Chng
39
5
0
18 Jun 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
J. Peymanfard
Samin Heydarian
Ali Lashini
Hossein Zeinali
Mohammad Reza Mohammadi
N. Mozayani
32
10
0
21 Jan 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
45
49
0
12 Dec 2022
Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning
Chen Chen
Yuchen Hu
Qiang Zhang
Heqing Zou
Beier Zhu
Eng Siong Chng
33
26
0
10 Dec 2022
Streaming Audio-Visual Speech Recognition with Alignment Regularization
Pingchuan Ma
Niko Moritz
Stavros Petridis
Christian Fuegen
Maja Pantic
37
2
0
03 Nov 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong
Minsu Kim
Daehun Yoo
Y. Ro
26
21
0
13 Jul 2022
Is Lip Region-of-Interest Sufficient for Lipreading?
Jing-Xuan Zhang
Genshun Wan
Jia Pan
24
6
0
28 May 2022
End-to-end multi-talker audio-visual ASR using an active speaker attention module
R. Rose
Olivier Siohan
18
3
0
01 Apr 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
130
145
0
26 Feb 2022
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
37
363
0
02 Nov 2021
Visual Keyword Spotting with Attention
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
19
13
0
29 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
17
92
0
14 Oct 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
Maja Pantic
SSL
24
53
0
16 Jun 2021
Fusing information streams in end-to-end audio-visual speech recognition
Wentao Yu
Steffen Zeiler
D. Kolossa
81
12
0
19 Apr 2021
Lip-reading with Densely Connected Temporal Convolutional Networks
Pingchuan Ma
Yujiang Wang
Jie Shen
Stavros Petridis
Maja Pantic
22
56
0
29 Sep 2020
Seeing wake words: Audio-visual Keyword Spotting
Liliane Momeni
Triantafyllos Afouras
Themos Stafylakis
Samuel Albanie
Andrew Zisserman
46
43
0
02 Sep 2020
Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition
L. Wei
Jie Zhang
Junfeng Hou
Lirong Dai
16
14
0
06 Aug 2020
FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire
Jinglin Liu
Yi Ren
Zhou Zhao
Chen Zhang
Baoxing Huai
Jing Yuan
17
11
0
06 Aug 2020
Towards Practical Lipreading with Distilled and Efficient Models
Pingchuan Ma
Brais Martínez
Stavros Petridis
Maja Pantic
26
95
0
13 Jul 2020
"Notic My Speech" -- Blending Speech Patterns With Multimedia
Dhruva Sahrawat
Yaman Kumar Singla
Shashwat Aggarwal
Yifang Yin
R. Shah
Roger Zimmermann
33
3
0
12 Jun 2020
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
George Sterpu
Christian Saam
N. Harte
34
28
0
17 Apr 2020
End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection
Takenori Yoshimura
Tomoki Hayashi
K. Takeda
Shinji Watanabe
37
49
0
03 Feb 2020
Audio-visual Recognition of Overlapped speech for the LRS2 dataset
Jianwei Yu
Shi-Xiong Zhang
Jian Wu
Shahram Ghorbani
Bo Wu
Shiyin Kang
Shansong Liu
Xunying Liu
Helen Meng
Dong Yu
32
72
0
06 Jan 2020
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
Takaki Makino
H. Liao
Yannis Assael
Brendan Shillingford
Basi García
Otavio Braga
Olivier Siohan
29
129
0
08 Nov 2019
Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition
Pingchuan Ma
Stavros Petridis
Maja Pantic
AuLLM
33
10
0
05 Jun 2019
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
185
784
0
16 Nov 2016