ResearchTrend.AI
Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

21 September 2018
Soo-Whan Chung
Joon Son Chung
Hong-Goo Kang
arXiv: 1809.08001

Papers citing "Perfect match: Improved cross-modal embeddings for audio-visual synchronisation"

39 / 39 papers shown
Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
Davide Berghi
Philip J. B. Jackson
50
0
0
01 Jun 2024
Pretext Training Algorithms for Event Sequence Data
Yimu Wang
He Zhao
Ruizhi Deng
Frederick Tung
Greg Mori
AI4TS
34
0
0
16 Feb 2024
Synchformer: Efficient Synchronization from Sparse Cues
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
24
11
0
29 Jan 2024
ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
Akash Gupta
Rohun Tripathi
Won-Kap Jang
29
6
0
21 Mar 2023
WASD: A Wilder Active Speaker Detection Dataset
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
21
3
0
09 Mar 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
45
49
0
12 Dec 2022
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
Se Jin Park
Minsu Kim
Joanna Hong
J. Choi
Y. Ro
CVBM
30
85
0
02 Nov 2022
Multimodal Transformer Distillation for Audio-Visual Synchronization
Xuan-Bo Chen
Haibin Wu
Chung-Che Wang
Hung-yi Lee
J. Jang
26
3
0
27 Oct 2022
Towards Effective Image Manipulation Detection with Proposal Contrastive Learning
Yuyuan Zeng
Bowen Zhao
Shanzhao Qiu
Tao Dai
Shutao Xia
34
25
0
16 Oct 2022
Learning State-Aware Visual Representations from Audible Interactions
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
78
23
0
27 Sep 2022
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors
Sindhu B. Hegde
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
CVBM
16
1
0
17 Aug 2022
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Otavio Braga
Olivier Siohan
24
7
0
11 May 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Otavio Braga
Olivier Siohan
CVBM
32
8
0
10 May 2022
VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
V. S. Kadandale
Juan F. Montesinos
G. Haro
27
23
0
05 Apr 2022
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim
Joanna Hong
Se Jin Park
Yong Man Ro
CVBM
25
40
0
04 Apr 2022
Speaker Extraction with Co-Speech Gestures Cue
Zexu Pan
Xinyuan Qian
Haizhou Li
SLR
21
27
0
31 Mar 2022
End to End Lip Synchronization with a Temporal AutoEncoder
Yoav Shalev
Lior Wolf
16
7
0
30 Mar 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
41
10
0
15 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
A. Haliassos
Rodrigo Mira
Stavros Petridis
M. Pantic
CVBM
40
126
0
18 Jan 2022
End-to-end speaker diarization with transformer
Yongquan Lai
Xin Tang
Yuanyuan Fu
Rui Fang
31
1
0
14 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
38
23
0
09 Dec 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
46
20
0
17 Aug 2021
UniCon: Unified Context Network for Robust Active Speaker Detection
Yuanhang Zhang
Susan Liang
Shuang Yang
Xiao-Chang Liu
Zhongqin Wu
Shiguang Shan
Xilin Chen
CVBM
29
36
0
05 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
22
176
0
14 Jul 2021
Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion
Baptiste Pouthier
L. Pilati
Leela K. Gudupudi
C. Bouveyron
F. Precioso
25
11
0
07 Jun 2021
Divide and Contrast: Self-supervised Learning from Uncurated Data
Yonglong Tian
Olivier J. Hénaff
Aaron van den Oord
SSL
64
96
0
17 May 2021
Representation Learning via Global Temporal Alignment and Cycle-Consistency
Isma Hadji
Konstantinos G. Derpanis
Allan D. Jepson
AI4TS
24
54
0
11 May 2021
Composable Augmentation Encoding for Video Representation Learning
Chen Sun
Arsha Nagrani
Yonglong Tian
Cordelia Schmid
SSL
AI4TS
37
17
0
01 Apr 2021
Cross-Modal Contrastive Learning for Text-to-Image Generation
Han Zhang
Jing Yu Koh
Jason Baldridge
Honglak Lee
Yinfei Yang
GAN
22
355
0
12 Jan 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
65
51
0
11 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
36
106
0
13 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
19
253
0
10 Aug 2020
Modality Dropout for Improved Performance-driven Talking Faces
Ahmed Hussen Abdelaziz
B. Theobald
Paul Dixon
Reinhard Knothe
N. Apostoloff
Sachin Kajareker
24
37
0
27 May 2020
What Makes for Good Views for Contrastive Learning?
Yonglong Tian
Chen Sun
Ben Poole
Dilip Krishnan
Cordelia Schmid
Phillip Isola
SSL
39
1,308
0
20 May 2020
Active Speakers in Context
Juan Carlos León Alcázar
Fabian Caba Heilbron
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbelaez
Guohao Li
32
61
0
20 May 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
21
88
0
20 Feb 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
185
784
0
16 Nov 2016