PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association

22 May 2025
Abdul Hannan
Muhammad Arslan Manzoor
Shah Nawaz
Muhammad Irzam Liaqat
Markus Schedl
Mubashir Noman
    CVBM

Papers citing "PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association"

20 / 20 papers shown
DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation
Vu Ngoc Tu
V. Huynh
Hyung-Jeong Yang
M. Zaheer
Shah Nawaz
Karthik Nandakumar
Soo-Hyung Kim
47
5
0
31 Jul 2023
Single-branch Network for Multimodal Training
M. S. Saeed
Shah Nawaz
M. H. Khan
M. Zaheer
Karthik Nandakumar
Muhammad Haroon Yousaf
Arif Mahmood
31
13
0
10 Mar 2023
Speaker Recognition in Realistic Scenario Using Multimodal Data
Saqlain Hussain Shah
M. S. Saeed
Shah Nawaz
Muhammad Haroon Yousaf
CVBM
41
9
0
25 Feb 2023
Guiding Attention using Partial-Order Relationships for Image Captioning
Murad Popattia
Muhammad Rafi
Rizwan Qureshi
Shah Nawaz
31
5
0
15 Apr 2022
Fusion and Orthogonal Projection for Improved Face-Voice Association
Muhammad Saeed
M. H. Khan
Shah Nawaz
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
112
28
0
20 Dec 2021
Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association
Peisong Wen
Qianqian Xu
Yangbangyan Jiang
Zhiyong Yang
Yuan He
Qingming Huang
CVBM
38
33
0
12 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
866
29,341
0
26 Feb 2021
A Multi-View Approach To Audio-Visual Speaker Verification
Leda Sari
Kritika Singh
Jiatong Zhou
Lorenzo Torresani
Nayan Singhal
Yatharth Saraf
90
38
0
11 Feb 2021
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
M. S. Saeed
Shah Nawaz
Pietro Morerio
Arif Mahmood
I. Gallo
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
50
26
0
28 Apr 2020
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI
AI4TS
64
332
0
10 Nov 2019
Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
47
33
0
18 Sep 2019
Hyperbolic Image Embeddings
Valentin Khrulkov
L. Mirvakhabova
E. Ustinova
Ivan Oseledets
Victor Lempitsky
79
293
0
03 Apr 2019
Utterance-level Aggregation For Speaker Recognition In The Wild
Weidi Xie
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
52
344
0
26 Feb 2019
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
Yandong Wen
Mahmoud Al Ismail
Weiyang Liu
Bhiksha Raj
Rita Singh
FedML
41
71
0
12 Jul 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
103
141
0
02 May 2018
Representation Tradeoffs for Hyperbolic Embeddings
Christopher De Sa
Albert Gu
Christopher Ré
Frederic Sala
213
412
0
10 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
CVBM
77
220
0
01 Apr 2018
VoxCeleb: a large-scale speaker identification dataset
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
122
2,273
0
26 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
80
2,928
0
26 May 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
77
380
0
07 Feb 2017