ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.19581
  4. Cited By
Seeing Through the Conversation: Audio-Visual Speech Separation based on
  Diffusion Model

Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model

30 October 2023
Suyeon Lee
Chaeyoung Jung
Youngjoon Jang
Jaehun Kim
Joon Son Chung
ArXivPDFHTML

Papers citing "Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model"

8 / 8 papers shown
Title
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
49
0
0
07 May 2025
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional
  Flow Matching
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung
Suyeon Lee
Ji-Hoon Kim
Joon Son Chung
DiffM
47
4
0
13 Jun 2024
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Ruijie Tao
Xinyuan Qian
Yidi Jiang
Junjie Li
Jiadong Wang
Haizhou Li
34
1
0
29 Apr 2024
Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual
  Target Speech Extraction
Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction
Zhaoxi Mu
Xinyu Yang
32
5
0
19 Apr 2024
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
190
198
0
08 Jan 2021
Lipreading using Temporal Convolutional Networks
Lipreading using Temporal Convolutional Networks
Brais Martínez
Pingchuan Ma
Stavros Petridis
M. Pantic
168
239
0
23 Jan 2020
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
224
2,234
0
14 Jun 2018
U-Net: Convolutional Networks for Biomedical Image Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
294
75,834
0
18 May 2015
1