Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.06410
Cited By
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
10 June 2023
Xize Cheng
Tao Jin
Lin Li
Wang Lin
Xinyu Duan
Zhou Zhao
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment"
12 / 12 papers shown
Title
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders
Seungbae Kim
Daeun Lee
Brielle Stark
Jinyoung Han
43
0
0
18 Feb 2025
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Jialong Zuo
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Xize Cheng
...
Wenrui Liu
Guangyan Zhang
Zehai Tu
Yiwen Guo
Zhou Zhao
54
0
0
08 Feb 2025
Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation
Jintao Tan
Xize Cheng
Lingyu Xiong
Lei Zhu
Xiandong Li
Wenxiong Kang
Kai Gong
Minglei Li
Yi Cai
DiffM
28
2
0
03 Aug 2024
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
Xize Cheng
Rongjie Huang
Linjun Li
Tao Jin
Zehan Wang
Aoxiong Yin
Minglei Li
Xinyu Duan
Changpeng Yang
Zhou Zhao
33
2
0
23 Dec 2023
Language Model is a Branch Predictor for Simultaneous Machine Translation
Aoxiong Yin
Tianyun Zhong
Haoyuan Li
Siliang Tang
Zhou Zhao
27
1
0
22 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
33
12
0
05 Dec 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Xize Cheng
Lin Li
Tao Jin
Rongjie Huang
Wang Lin
Zehan Wang
Huangdai Liu
Yejin Wang
Aoxiong Yin
Zhou Zhao
26
24
0
09 Mar 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
151
317
0
30 Jan 2023
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
115
34
0
15 May 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
128
144
0
26 Feb 2022
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
251
2,233
0
14 Jun 2018
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
167
784
0
16 Nov 2016
1