ResearchTrend.AI
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

8 March 2023
Chenfei Wu
Sheng-Kai Yin
Weizhen Qi
Xiaodong Wang
Zecheng Tang
Nan Duan
    MLLM
    LRM

Papers citing "Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models"

50 / 111 papers shown
IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models
Zhihao Chen
Bin Hu
Chuang Niu
Tao Chen
Yuxin Li
Hongming Shan
Ge Wang
LM&MA
MLLM
24
4
0
25 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
48
29
0
19 Dec 2023
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Wenhai Wang
Jiangwei Xie
ChuanYang Hu
Haoming Zou
Jianan Fan
...
Lewei Lu
Xizhou Zhu
Xiaogang Wang
Yu Qiao
Jifeng Dai
36
124
0
14 Dec 2023
NLLG Quarterly arXiv Report 09/23: What are the most influential current AI Papers?
Ran Zhang
Aida Kostikova
Christoph Leiter
Jonas Belouadi
Daniil Larionov
Yanran Chen
Vivian Fresen
Steffen Eger
39
0
0
09 Dec 2023
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM
MLLM
40
259
0
28 Nov 2023
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan
Jinke Ren
Chun-Mei Feng
Hengshuang Zhao
Shuguang Cui
Zhen Li
39
26
0
26 Nov 2023
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan
Jingxuan Wei
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Ruifeng Guo
Xihong Yang
Stan Z. Li
LRM
31
7
0
23 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
43
251
0
21 Nov 2023
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
Fuxiao Liu
Xiaoyang Wang
Wenlin Yao
Jianshu Chen
Kaiqiang Song
Sangwoo Cho
Yaser Yacoob
Dong Yu
24
99
0
15 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
56
105
0
09 Nov 2023
MoqaGPT : Zero-Shot Multi-modal Open-domain Question Answering with Large Language Model
Le Zhang
Yihong Wu
Fengran Mo
Jian-Yun Nie
Aishwarya Agrawal
MLLM
RALM
34
6
0
20 Oct 2023
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Yuchen Zhuang
Xiang Chen
Tong Yu
Saayan Mitra
Victor S. Bursztyn
Ryan A. Rossi
Somdeb Sarkhel
Chao Zhang
LLMAG
36
53
0
20 Oct 2023
Towards Robust Multi-Modal Reasoning via Model Selection
Xiangyan Liu
Rongxue Li
Wei Ji
Tao Lin
LLMAG
LRM
37
3
0
12 Oct 2023
Toolink: Linking Toolkit Creation and Using through Chain-of-Solving on Open-Source Model
Cheng Qian
Chenyan Xiong
Zhenghao Liu
Zhiyuan Liu
LRM
29
12
0
08 Oct 2023
Large Language Model (LLM) as a System of Multiple Expert Agents: An Approach to solve the Abstraction and Reasoning Corpus (ARC) Challenge
J. Tan
Mehul Motani
LLMAG
44
8
0
08 Oct 2023
TrafficGPT: Viewing, Processing and Interacting with Traffic Foundation Models
Siyao Zhang
Daocheng Fu
Zhao Zhang
Bin Yu
Pinlong Cai
16
47
0
13 Sep 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schmidt
VLM
31
77
0
12 Aug 2023
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh
Sibei Chen
Chun-Liang Li
Yasuhisa Fujii
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
LLMAG
SyDa
46
41
0
01 Aug 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Yujia Qin
Shi Liang
Yining Ye
Kunlun Zhu
Lan Yan
...
Jie Zhou
Mark B. Gerstein
Dahai Li
Zhiyuan Liu
Maosong Sun
CLL
ALM
LLMAG
ELM
LM&MA
81
621
0
31 Jul 2023
Testing the Depth of ChatGPT's Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5's Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking
David Bayani
MLLM
36
5
0
28 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
38
118
0
25 Jul 2023
Fashion Matrix: Editing Photos by Just Talking
Zheng Chong
Xujie Zhang
Fuwei Zhao
Zhenyu Xie
Xiaodan Liang
DiffM
21
2
0
25 Jul 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
44
106
0
17 Jul 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win
Zhengping Zhou
Lezhi Li
Xinxi Chen
Andy Li
SyDa
ALM
MoE
29
6
0
17 Jul 2023
GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT
Yifan Zhang
Cheng Wei
Shangyou Wu
Zhengting He
Wenhao Yu
41
28
0
16 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
85
224
0
07 Jul 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
33
7
0
14 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Zhongang Qi
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
31
6
0
12 Jun 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
45
190
0
12 Jun 2023
Contextual Object Detection with Multimodal Large Language Models
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
32
78
0
29 May 2023
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Gengze Zhou
Yicong Hong
Qi Wu
ELM
LM&Ro
LLMAG
LRM
25
142
0
26 May 2023
On the Tool Manipulation Capability of Open-source Large Language Models
Qiantong Xu
Fenglu Hong
Yangqiu Song
Changran Hu
Zheng Chen
Jian Zhang
LLMAG
29
69
0
25 May 2023
Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
MLLM
29
50
0
24 May 2023
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation
Dongxu Yue
Qin Guo
Munan Ning
Jiaxi Cui
Yuesheng Zhu
Liuliang Yuan
DiffM
29
11
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
52
43
0
24 May 2023
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
Long Lian
Boyi Li
Adam Yala
Trevor Darrell
43
152
0
23 May 2023
Making Language Models Better Tool Learners with Execution Feedback
Shuofei Qiao
Honghao Gui
Chengfei Lv
Qianghuai Jia
Huajun Chen
Ningyu Zhang
LLMAG
43
46
0
22 May 2023
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate
Kai Xiong
Xiao Ding
Yixin Cao
Ting Liu
Bing Qin
21
59
0
19 May 2023
Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering
Fangkai Yang
Pu Zhao
Zezhong Wang
Lu Wang
Jue Zhang
Mohit Garg
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
37
47
0
19 May 2023
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Siyuan Huang
Zhengkai Jiang
Hao Dong
Yu Qiao
Peng Gao
Hongsheng Li
LM&Ro
27
93
0
18 May 2023
Small Models are Valuable Plug-ins for Large Language Models
Canwen Xu
Yichong Xu
Shuohang Wang
Yang Liu
Chenguang Zhu
Julian McAuley
LLMAG
41
45
0
15 May 2023
Augmented Large Language Models with Parametric Knowledge Guiding
Ziyang Luo
Can Xu
Pu Zhao
Xiubo Geng
Chongyang Tao
Jing Ma
Qingwei Lin
Daxin Jiang
KELM
RALM
40
44
0
08 May 2023
The Potential of Visual ChatGPT For Remote Sensing
L. Osco
Eduardo Lopes de Lemos
W. Gonçalves
A. P. Ramos
J. M. Junior
25
30
0
25 Apr 2023
ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT
Tianyang Zhong
Yaonai Wei
Li Yang
Zihao Wu
Zheng Liu
...
Xi Jiang
Jun-Feng Han
Dinggang Shen
Tianming Liu
Tuo Zhang
LRM
19
27
0
21 Apr 2023
OpenAGI: When LLM Meets Domain Experts
Yingqiang Ge
Wenyue Hua
Kai Mei
Jianchao Ji
Juntao Tan
Shuyuan Xu
Zelong Li
Yongfeng Zhang
VLM
LRM
38
211
0
10 Apr 2023
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
Jun Chen
Deyao Zhu
Kilichbek Haydarov
Xiang Li
Mohamed Elhoseiny
25
37
0
09 Apr 2023
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
Emilio Ferrara
SILM
36
247
0
07 Apr 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
45
431
0
14 Mar 2023
Reasoning with Language Model Prompting: A Survey
Shuofei Qiao
Yixin Ou
Ningyu Zhang
Xiang Chen
Yunzhi Yao
Shumin Deng
Chuanqi Tan
Fei Huang
Huajun Chen
ReLM
ELM
LRM
71
311
0
19 Dec 2022
Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features
Changde Du
Kaicheng Fu
Jinpeng Li
Huiguang He
VLM
45
68
0
13 Oct 2022