ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.02471
  4. Cited By
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

5 May 2025
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
Jianxin Sun
Junbo Zhao
Jun Zhou
Kaixiang Ji
Lixiang Ru
L. xilinx Wang
Qingpei Guo
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
    MLLM
ArXivPDFHTML

Papers citing "Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction"

45 / 45 papers shown
Title
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
247
0
0
05 May 2025
Transfer between Modalities with MetaQueries
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
78
17
0
08 Apr 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
155
4
0
26 Feb 2025
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Xiaokang Chen
Zhiyu Wu
Xingchao Liu
Zizheng Pan
Wen Liu
Zhenda Xie
X. Yu
Chong Ruan
AI4TS
130
139
0
29 Jan 2025
MetaMorph: Multimodal Understanding and Generation via Instruction
  Tuning
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong
David Fan
Jiachen Zhu
Yunyang Xiong
Xinlei Chen
Koustuv Sinha
Michael G. Rabbat
Yann LeCun
Saining Xie
Zhuang Liu
VLM
107
44
0
18 Dec 2024
MotionStone: Decoupled Motion Intensity Modulation with Diffusion
  Transformer for Image-to-Video Generation
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Shuwei Shi
Biao Gong
Xi Chen
Dandan Zheng
Shuai Tan
...
Jingwen He
Kecheng Zheng
Jingdong Chen
Ming-Hsuan Yang
Yinqiang Zheng
VGen
DiffM
74
4
0
08 Dec 2024
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Shuai Tan
Biao Gong
Yutong Feng
Kecheng Zheng
Dandan Zheng
Shuwei Shi
Yujun Shen
Jingdong Chen
Ming-Hsuan Yang
VGen
104
4
0
04 Dec 2024
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and
  Generation
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Liao Qu
Huichao Zhang
Yiheng Liu
Xinyu Wang
Yi Jiang
Yiming Gao
Hu Ye
Daniel K. Du
Zehuan Yuan
Xinglong Wu
120
33
0
04 Dec 2024
Framer: Interactive Frame Interpolation
Framer: Interactive Frame Interpolation
Wen Wang
Qiuyu Wang
Kecheng Zheng
Hao Ouyang
Zhekai Chen
Biao Gong
Hao Chen
Yujun Shen
Chunhua Shen
VGen
84
6
0
24 Oct 2024
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding
  and Generation
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu
Xiaokang Chen
Z. F. Wu
Yiyang Ma
Xingchao Liu
...
Wen Liu
Zhenda Xie
Xingkai Yu
Chong Ruan
Ping Luo
AI4TS
110
102
0
17 Oct 2024
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion
  Transformers
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
Enze Xie
Junsong Chen
Junyu Chen
Han Cai
Haotian Tang
...
Zhekai Zhang
Zhekai Zhang
Ligeng Zhu
Yaojie Lu
Song Han
VLM
87
75
0
14 Oct 2024
Emu3: Next-Token Prediction is All You Need
Emu3: Next-Token Prediction is All You Need
Xinlong Wang
Xiaosong Zhang
Zhengxiong Luo
Quan-Sen Sun
Yufeng Cui
...
Xi Yang
Jingjing Liu
Yonghua Lin
Tiejun Huang
Zhongyuan Wang
MLLM
98
208
0
27 Sep 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and
  Generation
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
90
204
0
22 Aug 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Haozhe Zhao
Xiaojian Ma
Liang Chen
Shuzheng Si
Rujie Wu
Kaikai An
Peiyu Yu
Minjia Zhang
Qing Li
Baobao Chang
85
57
0
07 Jul 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image
  Generation
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
96
274
0
10 Jun 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
174
307
0
16 May 2024
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional
  Image Editing
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing
Yuying Ge
Sijie Zhao
Chen Li
Yixiao Ge
Ying Shan
57
32
0
07 May 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
98
131
0
22 Apr 2024
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
Mude Hui
Siwei Yang
Bingchen Zhao
Yichun Shi
Heng Wang
Peng Wang
Yuyin Zhou
Cihang Xie
67
67
0
15 Apr 2024
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics
  Perception
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
Yipo Huang
Xiangfei Sheng
Zhichao Yang
Quan Yuan
Zhichao Duan
Pengfei Chen
Leida Li
Weisi Lin
Guangming Shi
71
25
0
15 Apr 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
...
Haodong Duan
Jiaqi Wang
Yu Qiao
Dahua Lin
Feng Zhao
VLM
102
274
0
29 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
88
352
0
08 Mar 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
272
1,305
0
05 Mar 2024
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng
Biao Gong
Di Chen
Yujun Shen
Yu Liu
Jingren Zhou
DiffM
60
47
0
28 Nov 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning
  Benchmark for Expert AGI
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
210
901
0
27 Nov 2023
Learning Disentangled Identifiers for Action-Customized Text-to-Image
  Generation
Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Siteng Huang
Biao Gong
Yutong Feng
Xi Chen
Yu Fu
Yu Liu
Donglin Wang
DiffM
48
14
0
27 Nov 2023
Check, Locate, Rectify: A Training-Free Layout Calibration System for
  Text-to-Image Generation
Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
Biao Gong
Siteng Huang
Yutong Feng
Shiwei Zhang
Yuyuan Li
Yu Liu
DiffM
75
13
0
27 Nov 2023
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Ruoxi Shi
Hansheng Chen
Zhuoyang Zhang
Minghua Liu
Chao Xu
Xinyue Wei
Linghao Chen
Chong Zeng
Hao Su
VLM
58
364
0
23 Oct 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language
  Hallucination and Visual Illusion in Large Vision-Language Models
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
Dinesh Manocha
VLM
MLLM
92
180
0
23 Oct 2023
GenEval: An Object-Focused Framework for Evaluating Text-to-Image
  Alignment
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment
Dhruba Ghosh
Hanna Hajishirzi
Ludwig Schmidt
80
180
0
17 Oct 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in
  Visual Contexts
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM
MLLM
107
614
0
03 Oct 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
68
193
0
20 Sep 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
100
684
0
04 Aug 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and
  Resolution
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Luvcić
N. Houlsby
ViT
150
115
0
12 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
MMBench: Is Your Multi-modal Model an All-around Player?
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
...
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
95
1,015
0
12 Jul 2023
SDXL: Improving Latent Diffusion Models for High-Resolution Image
  Synthesis
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell
Zion English
Kyle Lacey
A. Blattmann
Tim Dockhorn
Jonas Muller
Joe Penna
Robin Rombach
214
2,356
0
04 Jul 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image
  Editing
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Kai Zhang
Lingbo Mo
Wenhu Chen
Huan Sun
Yu-Chuan Su
EGVM
166
264
0
16 Jun 2023
LLaMA: Open and Efficient Foundation Language Models
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.5K
13,182
0
27 Feb 2023
InstructPix2Pix: Learning to Follow Image Editing Instructions
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
198
1,796
0
17 Nov 2022
LAION-5B: An open large-scale dataset for training next generation
  image-text models
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
170
3,444
0
16 Oct 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
377
6,859
0
13 Apr 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training
  Benchmark
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
...
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
VLM
86
93
0
14 Feb 2022
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
410
15,486
0
20 Dec 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
268
2,443
0
20 Apr 2021
A Diagram Is Worth A Dozen Images
A Diagram Is Worth A Dozen Images
Aniruddha Kembhavi
M. Salvato
Eric Kolve
Minjoon Seo
Hannaneh Hajishirzi
Ali Farhadi
3DV
56
482
0
24 Mar 2016
1