PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association

22 May 2025

Abdul Hannan

Muhammad Arslan Manzoor

Shah Nawaz

Muhammad Irzam Liaqat

Papers cited by "PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association"


Title | Authors | Tags | Counts | Date
DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation | Vu Ngoc Tu, V. Huynh, Hyung-Jeong Yang, M. Zaheer, Shah Nawaz, Karthik Nandakumar, Soo-Hyung Kim | | 47 5 0 | 31 Jul 2023
Single-branch Network for Multimodal Training | M. S. Saeed, Shah Nawaz, M. H. Khan, M. Zaheer, Karthik Nandakumar, Muhammad Haroon Yousaf, Arif Mahmood | | 31 13 0 | 10 Mar 2023
Speaker Recognition in Realistic Scenario Using Multimodal Data | Saqlain Hussain Shah, M. S. Saeed, Shah Nawaz, Muhammad Haroon Yousaf | CVBM | 41 9 0 | 25 Feb 2023
Guiding Attention using Partial-Order Relationships for Image Captioning | Murad Popattia, Muhammad Rafi, Rizwan Qureshi, Shah Nawaz | | 31 5 0 | 15 Apr 2022
Fusion and Orthogonal Projection for Improved Face-Voice Association | Muhammad Saeed, M. H. Khan, Shah Nawaz, Muhammad Haroon Yousaf, Alessio Del Bue | CVBM | 112 28 0 | 20 Dec 2021
Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association | Peisong Wen, Qianqian Xu, Yangbangyan Jiang, Zhiyong Yang, Yuan He, Qingming Huang | CVBM | 38 33 0 | 12 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision | Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, ..., Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever | CLIP, VLM | 866 29,341 0 | 26 Feb 2021
A Multi-View Approach To Audio-Visual Speaker Verification | Leda Sari, Kritika Singh, Jiatong Zhou, Lorenzo Torresani, Nayan Singhal, Yatharth Saraf | | 90 38 0 | 11 Feb 2021
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective | M. S. Saeed, Shah Nawaz, Pietro Morerio, Arif Mahmood, I. Gallo, Muhammad Haroon Yousaf, Alessio Del Bue | CVBM | 50 26 0 | 28 Apr 2020
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | Chao Zhang, Zichao Yang, Xiaodong He, Li Deng | HAI, AI4TS | 64 332 0 | 10 Nov 2019
Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals | Shah Nawaz, Muhammad Kamran Janjua, I. Gallo, Arif Mahmood, Alessandro Calefati | | 47 33 0 | 18 Sep 2019
Hyperbolic Image Embeddings | Valentin Khrulkov, L. Mirvakhabova, E. Ustinova, Ivan Oseledets, Victor Lempitsky | | 79 293 0 | 03 Apr 2019
Utterance-level Aggregation For Speaker Recognition In The Wild | Weidi Xie, Arsha Nagrani, Joon Son Chung, Andrew Zisserman | | 52 344 0 | 26 Feb 2019
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces | Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh | FedML | 41 71 0 | 12 Jul 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity | Arsha Nagrani, Samuel Albanie, Andrew Zisserman | SSL | 103 141 0 | 02 May 2018
Representation Tradeoffs for Hyperbolic Embeddings | Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala | | 213 412 0 | 10 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching | Arsha Nagrani, Samuel Albanie, Andrew Zisserman | CVBM | 77 220 0 | 01 Apr 2018
VoxCeleb: a large-scale speaker identification dataset | Arsha Nagrani, Joon Son Chung, Andrew Zisserman | | 122 2,273 0 | 26 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy | T. Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency | | 80 2,928 0 | 26 May 2017
Gated Multimodal Units for Information Fusion | John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio Gonzalez | | 77 380 0 | 07 Feb 2017