ResearchTrend.AI
Lip Reading Sentences in the Wild

16 November 2016
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman

Papers citing "Lip Reading Sentences in the Wild"

50 / 340 papers shown
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition
Hao Wang
Shuhei Kurita
Shuichiro Shimizu
Daisuke Kawahara
13
3
0
18 Jan 2024
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
Hendrik Laux
Emil Mededovic
Ahmed Hallawa
Lukas Martin
A. Peine
Anke Schmeink
VLM
26
4
0
15 Dec 2023
Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition
Fan Yu
Haoxu Wang
Ziyang Ma
Shiliang Zhang
57
2
0
14 Dec 2023
On Robustness to Missing Video for Audiovisual Speech Recognition
Oscar Chang
Otavio Braga
H. Liao
Dmitriy Serdyuk
Olivier Siohan
40
11
0
13 Dec 2023
SingingHead: A Large-scale 4D Dataset for Singing Head Animation
Sijing Wu
Yunhao Li
Weitian Zhang
Jun Jia
Yucheng Zhu
Yichao Yan
Guangtao Zhai
Xiaokang Yang
49
2
0
07 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
37
12
0
05 Dec 2023
Do VSR Models Generalize Beyond LRS3?
Y. A. D. Djilali
Sanath Narayan
Eustache Le Bihan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
35
4
0
23 Nov 2023
Analysis of Visual Features for Continuous Lipreading in Spanish
David Gimeno-Gómez
Carlos David Martínez Hinarejos
45
2
0
21 Nov 2023
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild
David Gimeno-Gómez
Carlos David Martínez Hinarejos
19
8
0
21 Nov 2023
LaughTalk: Expressive 3D Talking Head Generation with Laughter
Kim Sung-Bin
Lee Hyun
Da Hye Hong
Suekyeong Nam
Janghoon Ju
Tae-Hyun Oh
28
21
0
02 Nov 2023
Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
Songtao Luo
Shuang Yang
Shiguang Shan
Xilin Chen
35
1
0
08 Oct 2023
End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation and Lateral Inhibition
Emilian-Claudiu Mănescu
Răzvan-Alexandru Smădu
Andrei-Marius Avram
Dumitru-Clementin Cercel
Florin-Catalin Pop
38
0
0
07 Oct 2023
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation
Samuel Pegg
Kai Li
Xiaolin Hu
32
4
0
29 Sep 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Cheng Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
E. Chng
32
42
0
27 Sep 2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
55
14
0
19 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
34
12
0
15 Sep 2023
HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods
Yongyuan Li
Xiuyuan Qin
Chao Liang
Mingqiang Wei
27
3
0
14 Sep 2023
Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video
Xiuzhe Wu
Pengfei Hu
Yang Wu
Xiaoyang Lyu
Yan-Pei Cao
Ying Shan
Wenming Yang
Zhongqian Sun
Xiaojuan Qi
23
14
0
09 Sep 2023
ReliTalk: Relightable Talking Portrait Generation from a Single Video
Haonan Qiu
Zhaoxi Chen
Yuming Jiang
Hang Zhou
Xiangyu Fan
Lei Yang
Wayne Wu
Ziwei Liu
DiffM
VGen
34
10
0
05 Sep 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
32
5
0
29 Aug 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Xiaozhong Liu
78
31
0
27 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
34
16
0
18 Aug 2023
A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Li Liu
Lufei Gao
Wen-Ling Lei
Fengji Ma
Xiaotian Lin
Jin-Tao Wang
CVBM
27
5
0
17 Aug 2023
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
J. Choi
Joanna Hong
Y. Ro
DiffM
29
19
0
15 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
26
18
0
15 Aug 2023
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Yusheng Dai
Hang Chen
Jun Du
Xiao-ying Ding
Ning Ding
Feijun Jiang
Chin-Hui Lee
24
6
0
14 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
40
10
0
03 Aug 2023
A Unified Framework for Modality-Agnostic Deepfakes Detection
Cai Yu
Peng-Wen Chen
Jiahe Tian
Jin Liu
Jiao Dai
Xi Wang
Yesheng Chai
Shan Jia
Siwei Lyu
Jizhong Han
32
0
0
26 Jul 2023
Leveraging Visemes for Better Visual Speech Representation and Lip Reading
J. Peymanfard
Vahid Saeedi
Mohammad Reza Mohammadi
Hossein Zeinali
N. Mozayani
39
2
0
19 Jul 2023
MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions
Yunfei Liu
Lijian Lin
Fei Yu
Changyin Zhou
Yu Li
DiffM
VGen
42
23
0
19 Jul 2023
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition
Adriana Fernandez-Lopez
Honglie Chen
Pingchuan Ma
A. Haliassos
Stavros Petridis
M. Pantic
VLM
33
7
0
10 Jul 2023
Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition
Guinan Li
Jiajun Deng
Mengzhe Geng
Zengrui Jin
Tianzi Wang
Shujie Hu
Mingyu Cui
Helen M. Meng
Xunying Liu
37
10
0
06 Jul 2023
Audio-Driven 3D Facial Animation from In-the-Wild Videos
Liying Lu
Tianke Zhang
Yunfei Liu
Xuangeng Chu
Yu Li
VGen
50
3
0
20 Jun 2023
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces
Ziqiao Peng
Yihao Luo
Yue Shi
Hao-Xuan Xu
Xiangyu Zhu
Jun He
Hongyan Liu
Zhaoxin Fan
55
40
0
19 Jun 2023
MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition
Yuchen Hu
Chen Chen
Ruizhe Li
Heqing Zou
Chng Eng Siong
GAN
42
9
0
18 Jun 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Cheng Chen
Chengwei Qin
Qiu-shi Zhu
E. Chng
36
5
0
18 Jun 2023
Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey
Praneeth Nemani
G. S. Krishna
Kundrapu Supriya
BDL
32
3
0
14 Jun 2023
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
Xize Cheng
Tao Jin
Lin Li
Wang Lin
Xinyu Duan
Zhou Zhao
VLM
21
15
0
10 Jun 2023
Looking and Listening: Audio Guided Text Recognition
Wenwen Yu
Mingyu Liu
Biao Yang
Enming Zhang
Deqiang Jiang
Xing Sun
Yuliang Liu
Xiang Bai
DiffM
27
1
0
06 Jun 2023
Intelligible Lip-to-Speech Synthesis with Speech Units
J. Choi
Minsu Kim
Y. Ro
32
24
0
31 May 2023
A Neural State-Space Model Approach to Efficient Speech Separation
Chen Chen
Chao-Han Huck Yang
Kai Li
Yuchen Hu
Pin-Jui Ku
Chng Eng Siong
34
11
0
26 May 2023
Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning
Sara Kashiwagi
Keitaro Tanaka
Qi Feng
Shigeo Morishima
17
2
0
23 May 2023
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
Dongwei Pan
Long Zhuo
Jingtan Piao
Huiwen Luo
Wei Cheng
...
Chen Change Loy
Chao Qian
Wayne Wu
Dahua Lin
Kwan-Yee Lin
27
19
0
22 May 2023
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Chen Chen
Heqing Zou
Qiu-shi Zhu
E. Chng
34
7
0
16 May 2023
Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification
L. D. Pham
T. Le
Cam Le
Dat Ngo
Axel Weissenfeld
Alexander Schindler
29
3
0
30 Apr 2023
Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition
Pangoth Santhosh Kumar
Garika Akshay
17
2
0
30 Apr 2023
Word-level Persian Lipreading Dataset
J. Peymanfard
Ali Lashini
Samin Heydarian
Hossein Zeinali
N. Mozayani
30
5
0
08 Apr 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
29
15
0
29 Mar 2023
LIPSFUS: A neuromorphic dataset for audio-visual sensory fusion of lip reading
A. Rios-Navarro
E. Piñero-Fuentes
S. Canas-Moreno
Aqib Javed
Jin Harkin
A. Linares-Barranco
10
3
0
28 Mar 2023
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Pingchuan Ma
A. Haliassos
Adriana Fernandez-Lopez
Honglie Chen
Stavros Petridis
M. Pantic
27
106
0
25 Mar 2023