2505.19084
Jodi: Unification of Visual Generation and Understanding via Joint Modeling
25 May 2025
Yifeng Xu
Zhenliang He
Meina Kan
Shiguang Shan
Xilin Chen
VLM
Papers citing
"Jodi: Unification of Visual Generation and Understanding via Joint Modeling"
50 / 53 papers shown
Title
MMGen: Unified Multi-modal Image Generation and Understanding in One Go
Jiepeng Wang
Zhaoqing Wang
H. Pan
Yuan Liu
Dongdong Yu
Changhu Wang
Wenping Wang
DiffM
138
1
0
26 Mar 2025
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang
Qiuyu Huang
Junjie Liu
Xiefan Guo
Di Huang
111
7
0
24 Mar 2025
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
Tsu-Jui Fu
Yusu Qian
Chen Chen
Wenze Hu
Zhe Gan
Yue Yang
186
2
0
16 Mar 2025
SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation
Ellie Arar
Yarden Frenkel
Daniel Cohen-Or
Ariel Shamir
Yael Vinker
DiffM
93
1
0
12 Feb 2025
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Xiaokang Chen
Zhiyu Wu
Xingchao Liu
Zizheng Pan
Wen Liu
Zhenda Xie
X. Yu
Chong Ruan
AI4TS
137
159
0
29 Jan 2025
One Diffusion to Generate Them All
Duong H. Le
Tuan Pham
Sangho Lee
Christopher Clark
Aniruddha Kembhavi
Stephan Mandt
Ranjay Krishna
Jiasen Lu
VLM
126
8
0
25 Nov 2024
CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
Yifeng Xu
Zhenliang He
Shiguang Shan
Xilin Chen
DiffM
54
6
0
12 Oct 2024
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Jing He
Haodong Li
Wei Yin
Yixun Liang
Leheng Li
Kaiqiang Zhou
Hongbo Zhang
Bingbing Liu
Ying-Cong Chen
DiffM
VLM
165
54
0
26 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
155
15
0
23 Sep 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo
Ruoyi Du
Han Xiao
Yangguang Li
Dongyang Liu
...
Wanli Ouyang
Ziwei Liu
Ping Luo
Hongsheng Li
Peng Gao
96
57
0
05 Jun 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
189
333
0
16 May 2024
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Zinan Guo
Yanze Wu
Zhuowei Chen
Lang Chen
Qian He
DiffM
86
65
0
24 Apr 2024
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
Siying Cui
Jia Guo
Xiang An
Jiankang Deng
Yongle Zhao
Xinyu Wei
Ziyong Feng
DiffM
81
24
0
20 Mar 2024
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
Xiao Fu
Wei Yin
Mu Hu
Kaixuan Wang
Yuexin Ma
Ping Tan
Shaojie Shen
Dahua Lin
Xiaoxiao Long
DiffM
103
123
0
18 Mar 2024
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen
Chongjian Ge
Enze Xie
Yue Wu
Lewei Yao
Xiaozhe Ren
Zhongdao Wang
Ping Luo
Huchuan Lu
Zhenguo Li
195
121
0
07 Mar 2024
InstantID: Zero-shot Identity-Preserving Generation in Seconds
Qixun Wang
Xu Bai
Haofan Wang
Zekui Qin
Anthony Chen
Huaxia Li
Xu Tang
Feng-Long Xie
81
255
0
15 Jan 2024
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li
Mingdeng Cao
Xintao Wang
Zhongang Qi
Ming-Ming Cheng
Ying Shan
DiffM
99
200
0
07 Dec 2023
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Viraj Shah
Nataniel Ruiz
Forrester Cole
Erika Lu
Svetlana Lazebnik
Yuanzhen Li
Varun Jampani
DiffM
107
111
0
22 Nov 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao
Yongming Rao
Zuyan Liu
Benlin Liu
Jie Zhou
Jiwen Lu
ObjD
VLM
MDE
239
232
0
03 Mar 2023
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
Chong Mou
Xintao Wang
Liangbin Xie
Yanze Wu
Shuai Liu
Zhongang Qi
Ying Shan
Xiaohu Qie
DiffM
123
1,030
0
16 Feb 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
180
4,168
1
10 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
429
4,641
0
30 Jan 2023
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
106
2,386
0
19 Dec 2022
Multi-Concept Customization of Text-to-Image Diffusion
Nupur Kumari
Bin Zhang
Richard Y. Zhang
Eli Shechtman
Jun-Yan Zhu
160
874
0
08 Dec 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji
Seungjun Nah
Xun Huang
Arash Vahdat
Jiaming Song
...
Timo Aila
S. Laine
Bryan Catanzaro
Tero Karras
Xuan Li
VLM
MoE
177
828
0
02 Nov 2022
DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models
Cheng Lu
Yuhao Zhou
Fan Bao
Jianfei Chen
Chongxuan Li
Jun Zhu
DiffM
171
609
0
02 Nov 2022
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Xingchao Liu
Chengyue Gong
Qiang Liu
OOD
195
1,043
0
07 Sep 2022
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Nataniel Ruiz
Yuanzhen Li
Varun Jampani
Yael Pritch
Michael Rubinstein
Kfir Aberman
279
2,885
0
25 Aug 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
196
3,963
0
26 Jul 2022
Transformer Language Models without Positional Encodings Still Learn Positional Information
Adi Haviv
Ori Ram
Ofir Press
Peter Izsak
Omer Levy
97
127
0
30 Mar 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
123
249
0
12 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
474
15,734
0
20 Dec 2021
Pixel Difference Networks for Efficient Edge Detection
Z. Su
Wenzhe Liu
Zitong Yu
D. Hu
Qing Liao
Qi Tian
M. Pietikäinen
Li Liu
86
327
0
16 Aug 2021
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song
Jascha Narain Sohl-Dickstein
Diederik P. Kingma
Abhishek Kumar
Stefano Ermon
Ben Poole
DiffM
SyDa
350
6,551
0
26 Nov 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
201
1,786
0
29 Jun 2020
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
672
18,276
0
19 Jun 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
432
13,094
0
26 May 2020
DIODE: A Dense Indoor and Outdoor DEpth Dataset
Igor Vasiljevic
Nicholas I. Kolkin
Shanyi Zhang
Ruotian Luo
Haochen Wang
...
Andrea F. Daniele
Mohammadreza Mostajabi
Steven Basart
Matthew R. Walter
Gregory Shakhnarovich
MDE
3DV
80
233
0
01 Aug 2019
Generative Modeling by Estimating Gradients of the Data Distribution
Yang Song
Stefano Ermon
SyDa
DiffM
258
3,954
0
12 Jul 2019
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras
S. Laine
Timo Aila
599
10,590
0
12 Dec 2018
Evaluation of CNN-based Single-Image Depth Estimation Methods
Tobias Koch
Lukas Liebel
Friedrich Fraundorfer
Marco Körner
3DV
143
150
0
03 May 2018
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
379
11,877
0
11 Jan 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
730
132,199
0
12 Jun 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
357
27,230
0
20 Mar 2017
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
Angela Dai
Angel X. Chang
Manolis Savva
Maciej Halber
Thomas Funkhouser
Matthias Nießner
3DPC
3DV
487
4,077
0
14 Feb 2017
Pixel Recurrent Neural Networks
Aaron van den Oord
Nal Kalchbrenner
Koray Kavukcuoglu
SSeg
GAN
481
2,573
0
25 Jan 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,322
0
10 Dec 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
1.9K
77,341
0
18 May 2015
Holistically-Nested Edge Detection
Saining Xie
Zhuowen Tu
137
3,494
0
24 Apr 2015
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Jascha Narain Sohl-Dickstein
Eric A. Weiss
Niru Maheswaranathan
Surya Ganguli
SyDa
DiffM
306
7,005
0
12 Mar 2015