Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2411.15260
Cited By
VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing
22 November 2024
Jiahao Hu
Tianxiong Zhong
Xuebo Wang
Boyuan Jiang
Xingye Tian
Fei Yang
Pengfei Wan
Di Zhang
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing"
37 / 37 papers shown
Title
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
213
508
0
12 Aug 2024
SAM 2: Segment Anything in Images and Videos
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
...
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
134
871
0
01 Aug 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Haozhe Zhao
Xiaojian Ma
Liang Chen
Shuzheng Si
Rujie Wu
Kaikai An
Peiyu Yu
Minjia Zhang
Qing Li
Baobao Chang
85
57
0
07 Jul 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
106
605
0
25 Apr 2024
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
Mude Hui
Siwei Yang
Bingchen Zhao
Yichun Shi
Heng Wang
Peng Wang
Yuyin Zhou
Cihang Xie
67
67
0
15 Apr 2024
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Bojia Zi
Shihao Zhao
Xianbiao Qi
Jianan Wang
Yukai Shi
Qianyu Chen
Bin Liang
Kam-Fai Wong
Lei Zhang
DiffM
VGen
72
19
0
18 Mar 2024
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
Xu Ju
Xian Liu
Xintao Wang
Hao Wang
Ying Shan
Qiang Xu
68
74
0
11 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
127
198
0
29 Feb 2024
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
Jianhong Bai
Tianyu He
Yuchi Wang
Junliang Guo
Haoji Hu
Zuozhu Liu
Jiang Bian
VGen
64
29
0
20 Feb 2024
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara
Barışcan Kurtkaya
Hidir Yesiltepe
James M. Rehg
Pinar Yanardag
VGen
DiffM
61
53
0
07 Dec 2023
AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang
Bichen Wu
Xiaoyan Wang
Yaqiao Luo
Luxin Zhang
Yinan Zhao
Peter Vajda
Dimitris N. Metaxas
Licheng Yu
VGen
DiffM
77
38
0
06 Dec 2023
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
Junhao Zhuang
Yanhong Zeng
Wenran Liu
Chun Yuan
Kai Chen
DiffM
95
73
0
06 Dec 2023
MotionEditor: Editing Video Motion via Content-Aware Diffusion
Shuyuan Tu
Qi Dai
Zhi-Qi Cheng
Hang-Rui Hu
Xintong Han
Zuxuan Wu
Yu-Gang Jiang
DiffM
VGen
64
31
0
30 Nov 2023
Consistent Video-to-Video Transfer Using Synthetic Dataset
Jiaxin Cheng
Tianjun Xiao
Tong He
VGen
DiffM
58
14
0
01 Nov 2023
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
Yuren Cong
Mengmeng Xu
Christian Simon
Shoufa Chen
Jiawei Ren
Yanping Xie
Juan-Manuel Perez-Rua
Bodo Rosenhahn
Tao Xiang
Sen He
DiffM
VGen
84
81
0
09 Oct 2023
ProPainter: Improving Propagation and Transformer for Video Inpainting
Shangchen Zhou
Chongyi Li
Kelvin C. K. Chan
Chen Change Loy
ViT
72
102
0
07 Sep 2023
MagicEdit: High-Fidelity and Temporally Coherent Video Editing
Jun Hao Liew
Hanshu Yan
Jianfeng Zhang
Zhongcong Xu
Jiashi Feng
VGen
DiffM
58
52
0
28 Aug 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Kai Zhang
Lingbo Mo
Wenhu Chen
Huan Sun
Yu-Chuan Su
EGVM
166
264
0
16 Jun 2023
Improving Tuning-Free Real Image Editing with Proximal Guidance
Ligong Han
Song Wen
Qi Chen
Zhixing Zhang
Kunpeng Song
...
Qilong Zhangli
Jindong Jiang
Zhaoyang Xia
Akash Srivastava
Dimitris N. Metaxas
DiffM
68
61
0
08 Jun 2023
Recognize Anything: A Strong Image Tagging Model
Youcai Zhang
Xinyu Huang
Jinyu Ma
Zhaoyang Li
Zhaochuan Luo
...
Tong Luo
Yaqian Li
Siyi Liu
Yandong Guo
Lei Zhang
VLM
103
237
0
06 Jun 2023
VideoComposer: Compositional Video Synthesis with Motion Controllability
Xiang Wang
Hangjie Yuan
Shiwei Zhang
Dayou Chen
Jiuniu Wang
Yingya Zhang
Yujun Shen
Deli Zhao
Jingren Zhou
VGen
DiffM
87
335
0
03 Jun 2023
Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models
Daiki Miyake
Akihiro Iohara
Yuriko Saito
Toshiyuki Tanaka
DiffM
71
117
0
26 May 2023
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
Bosheng Qin
Juncheng Li
Siliang Tang
Tat-Seng Chua
Yueting Zhuang
VGen
DiffM
57
17
0
21 May 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
329
7,278
0
05 Apr 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
Chenyang Qi
Xiaodong Cun
Yong Zhang
Chenyang Lei
Xintao Wang
Ying Shan
Qifeng Chen
VGen
75
346
0
16 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
179
1,965
0
09 Mar 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Jay Zhangjie Wu
Yixiao Ge
Xintao Wang
Weixian Lei
Yuchao Gu
Yufei Shi
Wynne Hsu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
VGen
102
725
0
22 Dec 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
198
1,796
0
17 Nov 2022
Imagic: Text-Based Real Image Editing with Diffusion Models
Bahjat Kawar
Shiran Zada
Oran Lang
Omer Tov
Hui-Tang Chang
Tali Dekel
Inbar Mosseri
Michal Irani
69
1,085
0
17 Oct 2022
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz
Ron Mokady
J. Tenenbaum
Kfir Aberman
Yael Pritch
Daniel Cohen-Or
DiffM
182
1,768
0
02 Aug 2022
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Rinon Gal
Yuval Alaluf
Yuval Atzmon
Or Patashnik
Amit H. Bermano
Gal Chechik
Daniel Cohen-Or
153
1,874
0
02 Aug 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
413
15,486
0
20 Dec 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
433
10,328
0
17 Jun 2021
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song
Jascha Narain Sohl-Dickstein
Diederik P. Kingma
Abhishek Kumar
Stefano Ermon
Ben Poole
DiffM
SyDa
330
6,453
0
26 Nov 2020
Denoising Diffusion Implicit Models
Jiaming Song
Chenlin Meng
Stefano Ermon
VLM
DiffM
260
7,356
0
06 Oct 2020
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
622
18,036
0
19 Jun 2020
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Zachary Teed
Jia Deng
MDE
219
2,623
0
26 Mar 2020
1