ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Versions: v1, v2, v3, v4 (latest)


24 May 2024
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
Ziqiang Liu
Lei Zhang
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
    VLM, DiffM
arXiv: 2405.15232

Papers citing "DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception"

50 of 88 papers shown
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
Haonan Zhang
Run Luo
Xiong Liu
Yuchuan Wu
Ting-En Lin
...
Min Yang
Lianli Gao
Jingkuan Song
Fei Huang
Yongbin Li
AI4CE
86
0
0
26 May 2025
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Yiran Chen
Hao Peng
Tong Zhang
Heng Ji
VLM
70
0
0
13 May 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM, VLM
256
1
0
28 Apr 2025
Distilling Transitional Pattern to Large Language Models for Multimodal Session-based Recommendation
Jiajie Su
Qiyong Zhong
Yunshan Ma
Weiming Liu
Chaochao Chen
Xiaolin Zheng
Yuxiang Cai
Tat-Seng Chua
77
0
0
13 Apr 2025
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou
Feng Hong
Jiaan Luo
Jiangchao Yao
Dongsheng Li
Bo Han
Yize Zhang
Yanfeng Wang
VLM
114
1
0
28 Mar 2025
Continual Multimodal Contrastive Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
CLL
223
2
0
19 Mar 2025
Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs
Weixiang Zhao
Yulin Hu
Yang Deng
Jiahe Guo
Xingyu Sui
...
An Zhang
Yanyan Zhao
Bing Qin
Tat-Seng Chua
Ting Liu
175
7
0
28 Feb 2025
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
Run Luo
Ting-En Lin
Jun Wang
Yuchuan Wu
Xiong Liu
...
Lei Zhang
Yushen Chen
Xiaobo Xia
Hamid Alinejad-Rokny
Fei Huang
VLM, AuLLM
148
0
0
08 Jan 2025
Towards Modality Generalization: A Benchmark and Prospective Analysis
Xiaohao Liu
Xiaobo Xia
Zhuo Huang
See-Kiong Ng
Tat-Seng Chua
91
4
0
24 Dec 2024
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
Mingming Gong
Tongliang Liu
178
8
0
18 Nov 2024
IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking
Run Luo
Zikai Song
Longze Chen
Yunshui Li
Min Yang
Wei-Guo Yang
88
0
0
30 Oct 2024
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
145
12
0
29 Aug 2024
Autogenic Language Embedding for Coherent Point Tracking
Zikai Song
Ying Tang
Run Luo
Lintao Ma
Junqing Yu
Yi-Ping Phoebe Chen
Wei Yang
135
4
0
30 Jul 2024
Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning
Xuxin Cheng
Wanshi Xu
Zhihong Zhu
Hongxiang Li
Yuexian Zou
95
13
0
31 May 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
128
142
0
22 Apr 2024
Few-Shot Adversarial Prompt Learning on Vision-Language Models
Yiwei Zhou
Xiaobo Xia
Zhiwei Lin
Bo Han
Tongliang Liu
VLM
106
16
0
21 Mar 2024
Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision
Zhaoqing Wang
Xiaobo Xia
Ziye Chen
Xiao He
Yandong Guo
Biwei Huang
Tongliang Liu
VLM
98
13
0
14 Feb 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Hongsheng Li
Yu Qiao
Jifeng Dai
AuLLM
145
49
0
18 Jan 2024
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM, LRM
155
291
0
20 Dec 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLM, VLM
116
55
0
08 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLM, DiffM
103
11
0
01 Nov 2023
IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models
Shaokun Zhang
Xiaobo Xia
Zhaoqing Wang
Ling-Hao Chen
Jiale Liu
Qingyun Wu
Tongliang Liu
81
21
0
16 Oct 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM, MLLM
217
2,829
0
05 Oct 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
101
199
0
20 Sep 2023
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
L. Yu
Bowen Shi
Ramakanth Pasunuru
Benjamin Muller
O. Yu. Golovneva
...
Yaniv Taigman
Maryam Fazel-Zarandi
Asli Celikyilmaz
Luke Zettlemoyer
Armen Aghajanyan
MLLM
98
142
0
05 Sep 2023
DiffusionTrack: Diffusion Model For Multi-Object Tracking
Run Luo
Zikai Song
Lintao Ma
Ji Wei
Wei-Guo Yang
Min Yang
DiffM
113
30
0
19 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
128
720
0
04 Aug 2023
Planting a SEED of Vision in Large Language Model
Yuying Ge
Yixiao Ge
Ziyun Zeng
Xintao Wang
Ying Shan
VLM, MLLM
51
98
0
16 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM, VLM
162
238
0
07 Jul 2023
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Chenliang Li
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
VLM, MLLM
87
128
0
04 Jul 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
121
652
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM, ObjD, VLM
123
765
0
26 Jun 2023
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Hugo Laurençon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
...
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
159
246
0
21 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
507
4,451
0
09 Jun 2023
Diffusion Model for Dense Matching
Jisu Nam
Gyuseong Lee
Sunwoo Kim
Ines Hyeonsu Kim
Hyoungwon Cho
Seyeong Kim
Seung Wook Kim
DiffM
80
10
0
30 May 2023
Generating Images with Multimodal Language Models
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
160
259
0
26 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM, LRM
331
815
0
17 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM, VLM
165
2,099
0
11 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM, MLLM
167
2,074
0
20 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa, VLM, MLLM
577
4,936
0
17 Apr 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
Wanrong Zhu
Jack Hessel
Anas Awadalla
S. Gadre
Jesse Dodge
Alex Fang
Youngjae Yu
Ludwig Schmidt
William Yang Wang
Yejin Choi
VLM
110
177
0
14 Apr 2023
Denoising Diffusion Autoencoders are Unified Self-supervised Learners
Weilai Xiang
Hongyu Yang
Di Huang
Yunhong Wang
DiffM
122
78
0
17 Mar 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu
Sifei Liu
Arash Vahdat
Wonmin Byeon
Xiaolong Wang
Shalini De Mello
VLM
284
336
0
08 Mar 2023
Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang
Li Dong
Wenhui Wang
Y. Hao
Saksham Singhal
...
Johan Bjorck
Vishrav Chaudhary
Subhojit Som
Xia Song
Furu Wei
VLM, LRM, MLLM
132
566
0
27 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
442
4,664
0
30 Jan 2023
GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu
Jianfeng Wang
Zhengyuan Yang
Zhe Gan
Zicheng Liu
Junsong Yuan
Lijuan Wang
ObjD, VLM
73
119
0
01 Dec 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM, MLLM, CLIP
209
3,514
0
16 Oct 2022
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho
William Chan
Chitwan Saharia
Jay Whang
Ruiqi Gao
...
Diederik P. Kingma
Ben Poole
Mohammad Norouzi
David J. Fleet
Tim Salimans
VGen
179
1,547
0
05 Oct 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
211
1,134
0
22 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
74
555
0
03 Jun 2022