MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

3 October 2023
Kaizhi Zheng
Xuehai He
Xin Wang
    MLLM

Papers citing "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

25 / 75 papers shown
STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
Yiqun Zhang
Fanheng Kong
Peidong Wang
Shuang Sun
Lingshuai Wang
Shi Feng
Daling Wang
Yifei Zhang
Kaisong Song
36
10
0
20 Jan 2024
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
Xiyao Wang
Yuhang Zhou
Xiaoyu Liu
Hongjin Lu
Yuancheng Xu
...
Taixi Lu
Gedas Bertasius
Mohit Bansal
Huaxiu Yao
Furong Huang
LRM
VLM
99
65
0
19 Jan 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Hongsheng Li
Yu Qiao
Jifeng Dai
AuLLM
85
42
0
18 Jan 2024
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain
Jianwei Yang
Humphrey Shi
MLLM
26
24
0
21 Dec 2023
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Zineng Tang
Ziyi Yang
Mahmoud Khademi
Yang Liu
Chenguang Zhu
Mohit Bansal
LRM
MLLM
AuLLM
54
45
0
30 Nov 2023
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu
Yichen Zhu
Jindong Gu
Yunshi Lan
Chao Yang
Yu Qiao
30
84
0
29 Nov 2023
M²Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation
Xiaowei Chi
Rongyu Zhang
Zhengkai Jiang
Yijiang Liu
Ziyi Lin
...
Chaoyou Fu
Peng Gao
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
MLLM
33
1
0
29 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
67
21
0
28 Nov 2023
Robot Learning in the Era of Foundation Models: A Survey
Xuan Xiao
Jiahang Liu
Zhipeng Wang
Yanmin Zhou
Yong Qi
Qian Cheng
Bin He
Shuo Jiang
AI4CE
LM&Ro
33
27
0
24 Nov 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
36
226
0
14 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
71
4
0
10 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
47
3
0
08 Nov 2023
Emotion Detection for Misinformation: A Review
Zhiwei Liu
Tianlin Zhang
Kailai Yang
Paul Thompson
Zeping Yu
Sophia Ananiadou
23
28
0
01 Nov 2023
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
23
54
0
31 Oct 2023
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu
Zeqiang Lai
Zhangwei Gao
Erfei Cui
Ziheng Li
...
Lewei Lu
Qifeng Chen
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
128
30
0
26 Oct 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
VLM
MLLM
42
156
0
23 Oct 2023
Teaching Text-to-Image Models to Communicate in Dialog
Xiaowen Sun
Jiazhan Feng
Yuxuan Wang
Yuxuan Lai
Xingyu Shen
Dongyan Zhao
DiffM
32
1
0
27 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
31
5
0
23 Sep 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
52
17
0
14 Sep 2023
LMEye: An Interactive Perception Network for Large Language Models
Yunxin Li
Baotian Hu
Xinyu Chen
Lin Ma
Yong-mei Xu
Hao Fei
MLLM
VLM
33
24
0
05 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
299
4,261
0
30 Jan 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
521
0
02 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
345
12,003
0
04 Mar 2022
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
194
385
0
06 Nov 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,796
0
24 Feb 2021