2409.07967
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
12 September 2024
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
Papers citing
"Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization"
50 / 58 papers shown
Title
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling
X. Yu
Yan Fang
Xiaojie Jin
Yao Zhao
Yunchao Wei
33
0
0
29 May 2025
Learning Clustering-based Prototypes for Compositional Zero-shot Learning
Hongyu Qu
Jianan Wei
Xiangbo Shu
Wenguan Wang
VLM
112
1
0
10 Feb 2025
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization
Xiang He
Xiangxi Liu
Yang Li
Dongcheng Zhao
Guobin Shen
Qingqun Kong
Xin Yang
Yi Zeng
90
5
0
04 Aug 2024
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
Hongyu Qu
Rui Yan
Xiangbo Shu
Haoliang Gao
Peng Huang
Guo-Sen Xie
93
4
0
03 May 2024
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Tiantian Geng
Teng Wang
Yanfu Zhang
Jinming Duan
Weili Guan
Feng Zheng
49
2
0
04 Apr 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Jongsuk Kim
Hyeongkeun Lee
Kyeongha Rho
Junmo Kim
Joon Son Chung
45
6
0
14 Mar 2024
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks
Haoyi Duan
Yan Xia
Mingze Zhou
Li Tang
Jieming Zhu
Zhou Zhao
VLM
46
19
0
09 Nov 2023
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
David Junhao Zhang
Jay Zhangjie Wu
Jia-Wei Liu
Rui Zhao
L. Ran
Yuchao Gu
Difei Gao
Mike Zheng Shou
DiffM
VGen
76
218
0
27 Sep 2023
Self-Feedback DETR for Temporal Action Detection
Jihwan Kim
Miso Lee
Jae-Pil Heo
73
19
0
21 Aug 2023
BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
Chen Liu
Peike Li
Hu Zhang
Lincheng Li
Zi Huang
Dadong Wang
Xin Yu
VOS
61
28
0
20 Aug 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
73
120
0
18 May 2023
Vision Transformer with Quadrangle Attention
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
41
39
0
27 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Tiantian Geng
Teng Wang
Jinming Duan
Runmin Cong
Feng Zheng
40
33
0
22 Mar 2023
TriDet: Temporal Action Detection with Relative Boundary Modeling
Ding Shi
Yujie Zhong
Qiong Cao
Lin Ma
Jia Li
Dacheng Tao
ViT
74
130
0
13 Mar 2023
Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning
Cong Cao
Huanjing Yue
Xin Liu
Jingyu Yang
54
11
0
13 Mar 2023
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Bo He
Jun Wang
Jielin Qiu
Trung Bui
Abhinav Shrivastava
Zhaowen Wang
52
68
0
13 Mar 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision
Simon Jenni
Alexander Black
John Collomosse
SSL
49
16
0
15 Feb 2023
Contrastive Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Dan Guo
Meng Wang
100
53
0
18 Nov 2022
Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
Hyolim Kang
Hanjung Kim
Joungbin An
Minsu Cho
Seon Joo Kim
59
5
0
11 Nov 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
Moitreya Chatterjee
Narendra Ahuja
A. Cherian
51
12
0
29 Oct 2022
Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments
Khoi Duc Minh Nguyen
Quoc-Huy Tran
Khoi Nguyen
Binh-Son Hua
Rang Nguyen
80
29
0
21 Jul 2022
ReAct: Temporal Action Detection with Relational Queries
Ding Shi
Yujie Zhong
Qiong Cao
Jing Zhang
Lin Ma
Jia Li
Dacheng Tao
ViT
77
68
0
14 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
62
27
0
20 Jun 2022
Noise-Tolerant Learning for Audio-Visual Action Recognition
Haocheng Han
Qinghua Zheng
Minnan Luo
Kaiyao Miao
Feng Tian
Yuanchun Chen
NoLa
51
7
0
16 May 2022
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
Yuting Gao
Jinfeng Liu
Zihan Xu
Jinchao Zhang
Ke Li
Rongrong Ji
Chunhua Shen
VLM
CLIP
87
103
0
29 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
50
53
0
18 Apr 2022
ActionFormer: Localizing Moments of Actions with Transformers
Chenlin Zhang
Jianxin Wu
Yin Li
ViT
59
340
0
16 Feb 2022
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
David Junhao Zhang
Kunchang Li
Yali Wang
Yuxiang Chen
Shashwat Chandra
Yu Qiao
Luoqi Liu
Mike Zheng Shou
AI4TS
46
30
0
24 Nov 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
74
58
0
24 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
ViT
203
1,801
0
18 Nov 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven C. H. Hoi
FaML
167
1,943
0
16 Jul 2021
End-to-end Temporal Action Detection with Transformer
Xiaolong Liu
Qimeng Wang
Yao Hu
Xu Tang
Shiwei Zhang
S. Bai
X. Bai
ViT
89
230
0
18 Jun 2021
Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures
Sangwook Park
D. Han
Mounya Elhilali
42
12
0
27 May 2021
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Sateesh Kumar
S. Haresh
Awais Ahmed
Andrey Konin
M. Zia
Quoc-Huy Tran
SSL
50
48
0
27 May 2021
MPN: Multimodal Parallel Network for Audio-Visual Event Localization
Jiashuo Yu
Ying Cheng
Rui Feng
51
14
0
07 Apr 2021
Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Liang Zheng
Yiran Zhong
Shijie Hao
Meng Wang
64
101
0
01 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
398
21,281
0
25 Mar 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation
Jing Tan
Jiaqi Tang
Limin Wang
Gangshan Wu
ViT
100
180
0
03 Feb 2021
Video Self-Stitching Graph Network for Temporal Action Localization
Chen Zhao
Ali K. Thabet
Bernard Ghanem
60
140
0
30 Nov 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
46
135
0
12 Oct 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
349
12,966
0
26 May 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
128
4,048
0
10 Apr 2020
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Zachary Teed
Jia Deng
MDE
211
2,612
0
26 Mar 2020
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
72
251
0
10 Dec 2019
Few-Shot Video Classification via Temporal Alignment
Kaidi Cao
Jingwei Ji
Zhangjie Cao
C. Chang
Juan Carlos Niebles
AI4TS
60
238
0
27 Jun 2019
Noise-Aware Unsupervised Deep Lidar-Stereo Fusion
Xuelian Cheng
Yiran Zhong
Yuchao Dai
Pan Ji
Hongdong Li
3DPC
3DV
68
66
0
08 Apr 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Nathan Tsoi
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
145
4,143
0
25 Feb 2019
Dual-modality seq2seq network for audio-visual event localization
Yan-Bo Lin
Yu-Jhe Li
Y. Wang
58
128
0
20 Feb 2019
Multi-granularity Generator for Temporal Action Proposal
Yuan Liu
Lin Ma
Yifeng Zhang
Wen Liu
Shih-Fu Chang
71
193
0
28 Nov 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
92
435
0
23 Mar 2018