ResearchTrend.AI
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization


12 September 2024
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang

Papers citing "Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization"

50 / 58 papers shown
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling
X. Yu
Yan Fang
Xiaojie Jin
Yao Zhao
Yunchao Wei
33
0
0
29 May 2025
Learning Clustering-based Prototypes for Compositional Zero-shot Learning
Hongyu Qu
Jianan Wei
Xiangbo Shu
Wenguan Wang
VLM
112
1
0
10 Feb 2025
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization
Xiang He
Xiangxi Liu
Yang Li
Dongcheng Zhao
Guobin Shen
Qingqun Kong
Xin Yang
Yi Zeng
90
5
0
04 Aug 2024
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
Hongyu Qu
Rui Yan
Xiangbo Shu
Haoliang Gao
Peng Huang
Guo-Sen Xie
93
4
0
03 May 2024
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Tiantian Geng
Teng Wang
Yanfu Zhang
Jinming Duan
Weili Guan
Feng Zheng
49
2
0
04 Apr 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Jongsuk Kim
Hyeongkeun Lee
Kyeongha Rho
Junmo Kim
Joon Son Chung
45
6
0
14 Mar 2024
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks
Haoyi Duan
Yan Xia
Mingze Zhou
Li Tang
Jieming Zhu
Zhou Zhao
VLM
46
19
0
09 Nov 2023
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
David Junhao Zhang
Jay Zhangjie Wu
Jia-Wei Liu
Rui Zhao
L. Ran
Yuchao Gu
Difei Gao
Mike Zheng Shou
DiffM
VGen
76
218
0
27 Sep 2023
Self-Feedback DETR for Temporal Action Detection
Jihwan Kim
Miso Lee
Jae-Pil Heo
73
19
0
21 Aug 2023
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
Chen Liu
Peike Li
Hu Zhang
Lincheng Li
Zi Huang
Dadong Wang
Xin Yu
VOS
61
28
0
20 Aug 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
73
120
0
18 May 2023
Vision Transformer with Quadrangle Attention
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
41
39
0
27 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Tiantian Geng
Teng Wang
Jinming Duan
Runmin Cong
Feng Zheng
40
33
0
22 Mar 2023
TriDet: Temporal Action Detection with Relative Boundary Modeling
Ding Shi
Yujie Zhong
Qiong Cao
Lin Ma
Jia Li
Dacheng Tao
ViT
74
130
0
13 Mar 2023
Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning
Cong Cao
Huanjing Yue
Xin Liu
Jingyu Yang
54
11
0
13 Mar 2023
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Bo He
Jun Wang
Jielin Qiu
Trung Bui
Abhinav Shrivastava
Zhaowen Wang
52
68
0
13 Mar 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision
Simon Jenni
Alexander Black
John Collomosse
SSL
49
16
0
15 Feb 2023
Contrastive Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Dan Guo
Meng Wang
100
53
0
18 Nov 2022
Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
Hyolim Kang
Hanjung Kim
Joungbin An
Minsu Cho
Seon Joo Kim
59
5
0
11 Nov 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
Moitreya Chatterjee
Narendra Ahuja
A. Cherian
51
12
0
29 Oct 2022
Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments
Khoi Duc Minh Nguyen
Quoc-Huy Tran
Khoi Nguyen
Binh-Son Hua
Rang Nguyen
80
29
0
21 Jul 2022
ReAct: Temporal Action Detection with Relational Queries
Ding Shi
Yujie Zhong
Qiong Cao
Jing Zhang
Lin Ma
Jia Li
Dacheng Tao
ViT
77
68
0
14 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
62
27
0
20 Jun 2022
Noise-Tolerant Learning for Audio-Visual Action Recognition
Haocheng Han
Qinghua Zheng
Minnan Luo
Kaiyao Miao
Feng Tian
Yuanchun Chen
NoLa
51
7
0
16 May 2022
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
Yuting Gao
Jinfeng Liu
Zihan Xu
Jinchao Zhang
Ke Li
Rongrong Ji
Chunhua Shen
VLM
CLIP
87
103
0
29 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
50
53
0
18 Apr 2022
ActionFormer: Localizing Moments of Actions with Transformers
Chen-Lin Zhang
Jianxin Wu
Yin Li
ViT
59
340
0
16 Feb 2022
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
David Junhao Zhang
Kunchang Li
Yali Wang
Yuxiang Chen
Shashwat Chandra
Yu Qiao
Luoqi Liu
Mike Zheng Shou
AI4TS
46
30
0
24 Nov 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
74
58
0
24 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
ViT
203
1,801
0
18 Nov 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven Hoi
FaML
167
1,943
0
16 Jul 2021
End-to-end Temporal Action Detection with Transformer
Xiaolong Liu
Qimeng Wang
Yao Hu
Xu Tang
Shiwei Zhang
S. Bai
X. Bai
ViT
89
230
0
18 Jun 2021
Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures
Sangwook Park
D. Han
Mounya Elhilali
42
12
0
27 May 2021
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Sateesh Kumar
S. Haresh
Awais Ahmed
Andrey Konin
M. Zia
Quoc-Huy Tran
SSL
50
48
0
27 May 2021
MPN: Multimodal Parallel Network for Audio-Visual Event Localization
Jiashuo Yu
Ying Cheng
Rui Feng
51
14
0
07 Apr 2021
Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Liang Zheng
Yiran Zhong
Shijie Hao
Meng Wang
64
101
0
01 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
398
21,281
0
25 Mar 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation
Jing Tan
Jiaqi Tang
Limin Wang
Gangshan Wu
ViT
100
180
0
03 Feb 2021
Video Self-Stitching Graph Network for Temporal Action Localization
Chen Zhao
Ali K. Thabet
Bernard Ghanem
60
140
0
30 Nov 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
46
135
0
12 Oct 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
349
12,966
0
26 May 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
128
4,048
0
10 Apr 2020
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Zachary Teed
Jia Deng
MDE
211
2,612
0
26 Mar 2020
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
72
251
0
10 Dec 2019
Few-Shot Video Classification via Temporal Alignment
Kaidi Cao
Jingwei Ji
Zhangjie Cao
C. Chang
Juan Carlos Niebles
AI4TS
60
238
0
27 Jun 2019
Noise-Aware Unsupervised Deep Lidar-Stereo Fusion
Xuelian Cheng
Yiran Zhong
Yuchao Dai
Pan Ji
Hongdong Li
3DPC
3DV
68
66
0
08 Apr 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Nathan Tsoi
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
145
4,143
0
25 Feb 2019
Dual-modality seq2seq network for audio-visual event localization
Yan-Bo Lin
Yu-Jhe Li
Y. Wang
58
128
0
20 Feb 2019
Multi-granularity Generator for Temporal Action Proposal
Yuan Liu
Lin Ma
Yifeng Zhang
Wei Liu
Shih-Fu Chang
71
193
0
28 Nov 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
92
435
0
23 Mar 2018