ResearchTrend.AI
LRS3-TED: a large-scale dataset for visual speech recognition

3 September 2018
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman

Papers citing "LRS3-TED: a large-scale dataset for visual speech recognition"

50 / 110 papers shown
LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
Danilo de Oliveira
Julius Richter
Tal Peer
Timo Germann
DiffM
22
0
0
16 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
54
0
0
07 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
195
0
0
06 May 2025
Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis
Radek Daněček
Carolin Schmitt
Senya Polikovsky
Michael J. Black
38
0
0
18 Apr 2025
TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
Yunfei Liu
Lei Zhu
Lijian Lin
Ye Zhu
Ailing Zhang
Yu Li
52
1
0
16 Feb 2025
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
Christopher Simic
Korbinian Riedhammer
Tobias Bocklet
95
0
0
03 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
118
1
0
03 Feb 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
79
1
0
23 Jan 2025
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
Zhaofeng Lin
Naomi Harte
91
1
0
20 Jan 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
61
3
0
03 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
45
28
0
02 Jan 2025
DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering
Ruohong Yang
Peng Hu
Xi Peng
Xiting Liu
Yunfan Li
39
0
0
25 Dec 2024
DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
Lei Yang
Shaoyang Xu
Deyi Xiong
39
1
0
25 Dec 2024
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
C. Koutlis
Symeon Papadopoulos
61
2
0
15 Nov 2024
Diffusion-based Unsupervised Audio-visual Speech Enhancement
Jean-Eudes Ayilo
Mostafa Sadeghi
Romain Serizel
Xavier Alameda-Pineda
DiffM
30
0
0
04 Oct 2024
Measuring Sound Symbolism in Audio-visual Models
Wei-Cheng Tseng
Yi-Jen Shih
David Harwath
Raymond Mooney
39
0
0
18 Sep 2024
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
36
9
0
18 Sep 2024
DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
Xinyu Wang
Qian Wang
Haolin Huang
Yu Fang
Mengjie Xu
Qian Wang
36
0
0
31 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation
Jaejun Lee
Yoori Oh
Injune Hwang
Kyogu Lee
CVBM
29
2
0
19 Aug 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
96
2
0
09 Jul 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
Guinan Li
Jiajun Deng
Youjun Chen
Mengzhe Geng
Shujie Hu
...
Zengrui Jin
Tianzi Wang
Xurong Xie
Helen Meng
Xunying Liu
VLM
34
0
0
14 Jun 2024
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung
Suyeon Lee
Ji-Hoon Kim
Joon Son Chung
DiffM
47
4
0
13 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
44
3
0
09 Jun 2024
Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
Davide Berghi
Philip J. B. Jackson
50
0
0
01 Jun 2024
SMIRK: 3D Facial Expressions through Analysis-by-Neural-Synthesis
George Retsinas
P. Filntisis
Radek Daněček
Victoria Fernandez-Abrevaya
A. Roussos
Timo Bolkart
Petros Maragos
3DH
67
30
0
05 Apr 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
40
2
0
28 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
46
8
0
14 Mar 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
38
20
0
08 Feb 2024
Synchformer: Efficient Synchronization from Sparse Cues
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
24
11
0
29 Jan 2024
Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes
Weifeng Liu
Tianyi She
Jiawei Liu
Run Wang
Dongyu Yao
Ziyou Liang
49
6
0
28 Jan 2024
Do VSR Models Generalize Beyond LRS3?
Y. A. D. Djilali
Sanath Narayan
Eustache Le Bihan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
35
4
0
23 Nov 2023
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish
David Gimeno-Gómez
Carlos David Martínez Hinarejos
31
0
0
21 Nov 2023
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
Suyeon Lee
Chaeyoung Jung
Youngjoon Jang
Jaehun Kim
Joon Son Chung
35
7
0
30 Oct 2023
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection
Ammarah Hashmi
Sahibzada Adil Shahzad
Chia-Wen Lin
Yu Tsao
Hsin-Min Wang
ViT
53
6
0
19 Oct 2023
Deep learning-based denoising streamed from mobile phones improves speech-in-noise understanding for hearing aid users
P. U. Diehl
Hannes Zilly
Felix Sattler
Y. Singer
Kevin Kepp
...
Paul Meyer-Rachner
A. Pudszuhn
V. Hofmann
M. Vormann
Elias Sprengel
37
3
0
22 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
34
16
0
18 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
38
1
0
31 Jul 2023
AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction
Jiuxin Lin
X. Cai
Heinrich Dinkel
Jun Chen
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Zhiyong Wu
Yujun Wang
Helen M. Meng
24
21
0
25 Jun 2023
Audio-Driven 3D Facial Animation from In-the-Wild Videos
Liying Lu
Tianke Zhang
Yunfei Liu
Xuangeng Chu
Yu Li
VGen
50
3
0
20 Jun 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Cheng Chen
Chengwei Qin
Qiu-shi Zhu
Eng Siong Chng
39
5
0
18 Jun 2023
Emotional Speech-Driven Animation with Content-Emotion Disentanglement
Radek Daněček
Kiran Chhatre
Shashank Tripathi
Yandong Wen
Michael J. Black
Timo Bolkart
18
67
0
15 Jun 2023
Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey
Praneeth Nemani
G. S. Krishna
Kundrapu Supriya
BDL
34
3
0
14 Jun 2023
Intelligible Lip-to-Speech Synthesis with Speech Units
J. Choi
Minsu Kim
Y. Ro
32
24
0
31 May 2023
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Rongjie Huang
Huadai Liu
Xize Cheng
Yi Ren
Lin Li
...
Jinzheng He
Lichao Zhang
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
78
8
0
24 May 2023
Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning
Sara Kashiwagi
Keitaro Tanaka
Qi Feng
Shigeo Morishima
22
2
0
23 May 2023
Identity-Preserving Talking Face Generation with Landmark and Appearance Priors
Wei‐Tao Zhong
Chaowei Fang
Yinqi Cai
Pengxu Wei
Gangming Zhao
Liang Lin
Guanbin Li
23
75
0
15 May 2023
Word-level Persian Lipreading Dataset
J. Peymanfard
Ali Lashini
Samin Heydarian
Hossein Zeinali
N. Mozayani
33
5
0
08 Apr 2023
A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation
Bo-Kyeong Kim
Jaemin Kang
Daeun Seo
Hancheol Park
Shinkook Choi
Hyoung-Kyu Song
Hyungshin Kim
Sungsu Lim
29
0
0
02 Apr 2023
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu
Egor Lakomkin
Konstantinos Vougioukas
Pingchuan Ma
Honglie Chen
...
Niko Moritz
J. Kolár
Stavros Petridis
Maja Pantic
Christian Fuegen
52
19
0
30 Mar 2023
ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
Akash Gupta
Rohun Tripathi
Won-Kap Jang
29
6
0
21 Mar 2023