ResearchTrend.AI
Temporal and cross-modal attention for audio-visual zero-shot learning

20 July 2022
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
arXiv: 2207.09966

Papers citing "Temporal and cross-modal attention for audio-visual zero-shot learning"

21 papers
Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning
Yang Liu
Xinming Zhang
Jiale Du
Xinbo Gao
Jungong Han
OODD
49
0
0
28 Mar 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
E. Shaar
Ariel Shaulov
Gal Chechik
Lior Wolf
VLM
41
0
0
17 Mar 2025
Discrepancy-Aware Attention Network for Enhanced Audio-Visual Zero-Shot Learning
RunLin Yu
Yipu Gong
Wenrui Li
Aiwen Sun
Mengren Zheng
VLM
72
0
0
16 Dec 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Hao Wu
VLM
58
4
0
18 Nov 2024
Multi-label Zero-Shot Audio Classification with Temporal Attention
Duygu Dogan
Huang Xie
Toni Heittola
Tuomas Virtanen
VLM
29
0
0
31 Aug 2024
Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification
Mahrukh Awan
Asmar Nadeem
Muhammad Junaid Awan
Armin Mustafa
Syed Sameed Husain
25
1
0
26 Aug 2024
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo
Pedro Morgado
33
5
0
18 Jul 2024
Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning
Wenrui Li
Penghong Wang
Ruiqin Xiong
Xiaopeng Fan
34
8
0
11 Jul 2024
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models
David Kurzendörfer
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
VLM
CLIP
33
2
0
09 Apr 2024
Boosting Audio-visual Zero-shot Learning with Large Language Models
Haoxing Chen
Yaohui Li
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
Jun Lan
Huijia Zhu
Weiqiang Wang
VLM
45
1
0
21 Nov 2023
Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation
Yiyang Su
A. Vosoughi
Shijian Deng
Yapeng Tian
Chenliang Xu
26
4
0
18 Oct 2023
Video-adverb retrieval with compositional adverb-action embeddings
Thomas Hummel
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
22
1
0
26 Sep 2023
Text-to-feature diffusion for audio-visual few-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
VLM
27
2
0
07 Sep 2023
Hyperbolic Audio-visual Zero-shot Learning
Jie Hong
Zeeshan Hayder
Junlin Han
Pengfei Fang
Mehrtash Harandi
L. Petersson
28
13
0
24 Aug 2023
Audio-Visual Class-Incremental Learning
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
27
28
0
21 Aug 2023
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
31
26
0
07 Dec 2022
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata
80
80
0
22 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
164
170
0
20 Apr 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Karteek Alahari
Cordelia Schmid
ViT
424
596
0
21 Jul 2020
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Biagio Brattoli
Joseph Tighe
Fedor Zhdanov
Pietro Perona
Krzysztof Chalupka
VLM
137
127
0
03 Mar 2020
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
242
31,257
0
16 Jan 2013