LRS3-TED: a large-scale dataset for visual speech recognition


3 September 2018
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman

Papers citing "LRS3-TED: a large-scale dataset for visual speech recognition"

50 / 110 papers shown
WASD: A Wilder Active Speaker Detection Dataset
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
24
3
0
09 Mar 2023
Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Meng Liu
Kong Aik Lee
Longbiao Wang
Hanyi Zhang
Chang Zeng
J. Dang
23
10
0
22 Feb 2023
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis
Zhenhui Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Jinzheng He
Zhou Zhao
CVBM
39
123
0
31 Jan 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
J. Peymanfard
Samin Heydarian
Ali Lashini
Hossein Zeinali
Mohammad Reza Mohammadi
N. Mozayani
32
10
0
21 Jan 2023
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
J. Park
Jung-Wook Hwang
Kwanghee Choi
Seung-Hyun Lee
Jun-Hwan Ahn
R.-H. Park
Hyung-Min Park
29
3
0
16 Jan 2023
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu
Tal Remez
Bowen Shi
Jacob Donley
Yossi Adi
DiffM
27
12
0
21 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
45
49
0
12 Dec 2022
Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning
Chen Chen
Yuchen Hu
Qiang Zhang
Heqing Zou
Beier Zhu
Eng Siong Chng
33
26
0
10 Dec 2022
Streaming Audio-Visual Speech Recognition with Alignment Regularization
Pingchuan Ma
Niko Moritz
Stavros Petridis
Christian Fuegen
Maja Pantic
37
2
0
03 Nov 2022
SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video
Marija Jegorova
Stavros Petridis
Maja Pantic
29
2
0
20 Oct 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos
P. Filntisis
George Retsinas
Foivos Paraperas-Papantoniou
Athanasios Katsamanis
A. Roussos
Petros Maragos
3DH
26
29
0
22 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu
Bowen Shi
SSL
VLM
29
42
0
14 Jul 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong
Minsu Kim
Daehun Yoo
Y. Ro
26
21
0
13 Jul 2022
Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs
Bo-Kyeong Kim
Shinkook Choi
Hancheol Park
21
4
0
29 Jun 2022
Learning Speaker-specific Lip-to-Speech Generation
Munender Varshney
Ravindra Yadav
Vinay P. Namboodiri
R. Hegde
29
7
0
04 Jun 2022
Is Lip Region-of-Interest Sufficient for Lipreading?
Jing-Xuan Zhang
Genshun Wan
Jia Pan
24
6
0
28 May 2022
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Otavio Braga
Olivier Siohan
27
7
0
11 May 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Otavio Braga
Olivier Siohan
CVBM
35
8
0
10 May 2022
Scaling up sign spotting through sign language dictionaries
Gül Varol
Liliane Momeni
Samuel Albanie
Triantafyllos Afouras
Andrew Zisserman
29
14
0
09 May 2022
Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild
Gang Wang
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
CVBM
27
14
0
08 Mar 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
130
145
0
26 Feb 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
41
10
0
15 Feb 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
96
40
0
25 Jan 2022
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
Wenliang Dai
Samuel Cahyawijaya
Tiezheng Yu
Elham J. Barezi
Peng Xu
...
Genta Indra Winata
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
41
11
0
11 Jan 2022
Robust Self-Supervised Audio-Visual Speech Recognition
Bowen Shi
Wei-Ning Hsu
Abdel-rahman Mohamed
39
90
0
05 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
55
306
0
05 Jan 2022
Responsive Listening Head Generation: A Benchmark Dataset and Baseline
Mohan Zhou
Yalong Bai
Wei Zhang
Ting Yao
Tiejun Zhao
Tao Mei
EGVM
30
45
0
27 Dec 2021
Deep Spoken Keyword Spotting: An Overview
Iván López-Espejo
Zheng-Hua Tan
John H. L. Hansen
Jesper Jensen
21
102
0
20 Nov 2021
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis
Haozhe Wu
Jia Jia
Haoyu Wang
Yishun Dou
Chao Duan
Qingshan Deng
CVBM
11
73
0
30 Oct 2021
Visual Keyword Spotting with Attention
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
19
13
0
29 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
17
92
0
14 Oct 2021
The VVAD-LRS3 Dataset for Visual Voice Activity Detection
Adrian Lubitz
Matias Valdenegro-Toro
Frank Kirchner
26
3
0
28 Sep 2021
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
63
36
0
06 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
22
176
0
14 Jul 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
Maja Pantic
SSL
24
53
0
16 Jun 2021
Fusing information streams in end-to-end audio-visual speech recognition
Wentao Yu
Steffen Zeiler
D. Kolossa
81
12
0
19 Apr 2021
Exploring Deep Learning for Joint Audio-Visual Lip Biometrics
Meng Liu
Longbiao Wang
Kong Aik Lee
Hanyi Zhang
Chang Zeng
J. Dang
HAI
30
12
0
17 Apr 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
Ryo Masumura
30
8
0
02 Mar 2021
Visual Speech Enhancement Without A Real Visual Stream
Sindhu B. Hegde
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
DiffM
20
17
0
20 Dec 2020
TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos
M. Ribeiro
Jennifer Sanger
Jingxuan Zhang
Aciel Eshky
A. Wrench
Korin Richmond
Steve Renals
LM&MA
24
33
0
19 Nov 2020
Video Generative Adversarial Networks: A Review
Nuha Aldausari
Arcot Sowmya
Nadine Marcus
Gelareh Mohammadi
EGVM
21
103
0
04 Nov 2020
Watch, read and lookup: learning to spot signs from multiple supervisors
Liliane Momeni
Gül Varol
Samuel Albanie
Triantafyllos Afouras
Andrew Zisserman
26
43
0
08 Oct 2020
Seeing wake words: Audio-visual Keyword Spotting
Liliane Momeni
Triantafyllos Afouras
Themos Stafylakis
Samuel Albanie
Andrew Zisserman
46
43
0
02 Sep 2020
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
EGVM
52
759
0
23 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
19
253
0
10 Aug 2020
Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition
L. Wei
Jie Zhang
Junfeng Hou
Lirong Dai
16
14
0
06 Aug 2020
"Notic My Speech" -- Blending Speech Patterns With Multimedia
Dhruva Sahrawat
Yaman Kumar Singla
Shashwat Aggarwal
Yifang Yin
R. Shah
Roger Zimmermann
33
3
0
12 Jun 2020
Multimodal Target Speech Separation with Voice and Face References
Leyuan Qu
C. Weber
S. Wermter
CVBM
19
19
0
17 May 2020