ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2409.07450
  4. Cited By
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music
  Videos

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

11 September 2024
Yan-Bo Lin
Yu Tian
L. Yang
Gedas Bertasius
Heng Wang
    VGen
ArXivPDFHTML

Papers citing "VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos"

17 / 17 papers shown
Title
AudioX: Diffusion Transformer for Anything-to-Audio Generation
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Yu Guo
78
4
0
13 Mar 2025
Images that Sound: Composing Images and Sounds on a Single Canvas
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
78
9
0
20 May 2024
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
68
171
0
01 Jun 2023
Efficient Neural Music Generation
Efficient Neural Music Generation
Max W. Y. Lam
Qiao Tian
Tang-Chun Li
Zongyu Yin
Siyuan Feng
...
Mingbo Ma
Xuchen Song
Jitong Chen
Yuping Wang
Yuxuan Wang
DiffM
MGen
52
50
0
25 May 2023
MusicLM: Generating Music From Text
MusicLM: Generating Music From Text
A. Agostinelli
Timo I. Denk
Zalan Borsos
Jesse Engel
Mauro Verzetti
...
Adam Roberts
Marco Tagliasacchi
Matthew Sharifi
Neil Zeghidour
Christian Frank
MGen
89
427
0
26 Jan 2023
AudioGen: Textually Guided Audio Generation
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
43
295
0
30 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation
AudioLM: a Language Modeling Approach to Audio Generation
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
98
596
0
07 Sep 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
51
298
0
20 Jul 2022
It's Time for Artistic Correspondence in Music and Video
It's Time for Artistic Correspondence in Music and Video
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
41
37
0
14 Jun 2022
Video Background Music Generation with Controllable Music Transformer
Video Background Music Generation with Controllable Music Transformer
Shangzhe Di
Jiang
Sihan Liu
Zhaokai Wang
Leyan Zhu
Zexin He
Hongming Liu
Shuicheng Yan
37
91
0
16 Nov 2021
Taming Visually Guided Sound Generation
Taming Visually Guided Sound Generation
Vladimir E. Iashin
Esa Rahtu
VLM
61
125
0
17 Oct 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
689
28,659
0
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
408
40,217
0
22 Oct 2020
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Zachary Teed
Jia Deng
MDE
202
2,583
0
26 Mar 2020
Audio to Body Dynamics
Audio to Body Dynamics
Eli Shlizerman
Lucio Dery
Hayden Schoen
Ira Kemelmacher-Shlizerman
VGen
66
154
0
19 Dec 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
206
7,961
0
22 May 2017
CNN Architectures for Large-Scale Audio Classification
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
94
2,488
0
29 Sep 2016
1