2505.12154
Cited By
Learning to Highlight Audio by Watching Movies
17 May 2025
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
Papers citing "Learning to Highlight Audio by Watching Movies"
29 / 29 papers shown
ZeroSep: Separate Anything in Audio with Zero Training
Chao Huang
Yuesheng Ma
J. Huang
Susan Liang
Yunlong Tang
Jing Bi
Wenqiang Liu
Nima Mesgarani
Chenliang Xu
DiffM
VLM
43
0
0
29 May 2025
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
133
642
0
25 Apr 2024
Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience
Xilin Jiang
Cong Han
Yinghao Aaron Li
N. Mesgarani
KELM
80
5
0
06 Feb 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
124
94
0
25 Dec 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
184
4,180
1
10 Feb 2023
Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production
Anyi Rao
Xuekun Jiang
Yuwei Guo
Linning Xu
Lei Yang
Libiao Jin
Dahua Lin
Bo Dai
VGen
70
16
0
30 Jan 2023
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
99
29
0
07 Dec 2022
Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
Junghyun Koo
Marco A. Martínez-Ramírez
Wei-Hsiang Liao
Stefan Uhlich
Kyogu Lee
Yuki Mitsufuji
69
21
0
04 Nov 2022
Style Transfer of Audio Effects with Differentiable Signal Processing
C. Steinmetz
Nicholas J. Bryan
Joshua D. Reiss
59
45
0
18 Jul 2022
Hybrid Spectrogram and Waveform Source Separation
Alexandre Défossez
85
174
0
05 Nov 2021
Efficient Training of Audio Transformers with Patchout
Khaled Koutini
Jan Schlüter
Hamid Eghbalzadeh
Gerhard Widmer
ViT
136
261
0
11 Oct 2021
Don't Separate, Learn to Remix: End-to-End Neural Remixing with Joint Optimization
Haici Yang
Shivani Firodiya
Nicholas J. Bryan
Minje Kim
58
7
0
28 Jul 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
46
59
0
13 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.0K
29,926
0
26 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
237
202
0
08 Jan 2021
Attention is All You Need in Speech Separation
Cem Subakan
Mirco Ravanelli
Samuele Cornell
Mirko Bronzi
Jianyuan Zhong
97
565
0
25 Oct 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
69
136
0
12 Oct 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
111
184
0
21 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
64
157
0
13 Jul 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
220
84
0
04 May 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
503
20,376
0
23 Oct 2019
Play as You Like: Timbre-enhanced Multi-modal Music Style Transfer
Chien-Yu Lu
Min-Xin Xue
Chia-Che Chang
Che-Rung Lee
Li Su
83
34
0
28 Nov 2018
Modeling of nonlinear audio effects with end-to-end deep neural networks
M. M. Ramírez
Joshua D. Reiss
66
37
0
15 Oct 2018
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Yi Luo
N. Mesgarani
171
1,796
0
20 Sep 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
100
754
0
10 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
102
537
0
09 Apr 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
808
132,725
0
12 Jun 2017
Deep Cross-Modal Audio-Visual Generation
Lele Chen
Sudhanshu Srivastava
Z. Duan
Chenliang Xu
100
221
0
26 Apr 2017
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
1.9K
77,520
0
18 May 2015