ResearchTrend.AI
Multimodal Foundation Models: From Specialists to General-Purpose Assistants

18 September 2023
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
    MLLM

Papers citing "Multimodal Foundation Models: From Specialists to General-Purpose Assistants"

33 / 33 papers shown
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
68
0
0
06 May 2025
DYSTIL: Dynamic Strategy Induction with Large Language Models for Reinforcement Learning
Borui Wang
Kathleen McKeown
Rex Ying
OffRL
39
0
0
06 May 2025
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
55
0
0
05 May 2025
Position: Foundation Models Need Digital Twin Representations
Yiqing Shen
Hao Ding
Lalithkumar Seenivasan
Tianmin Shu
Mathias Unberath
AI4CE
40
0
0
01 May 2025
V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models
Xiangxi Zheng
Linjie Li
Z. Yang
Ping Yu
Alex Jinpeng Wang
Rui Yan
Yuan Yao
Lijuan Wang
LRM
21
0
0
08 Apr 2025
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Z. Wang
Yurui Dong
Fuwen Luo
Minyuan Ruan
Zhili Cheng
C. L. P. Chen
Peng Li
Yang Liu
LRM
87
0
0
13 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
143
0
0
05 Mar 2025
Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Michael Xieyang Liu
S. Petridis
Vivian Tsai
Alexander J. Fiannaca
Alex Olwal
Michael Terry
Carrie J. Cai
LRM
37
1
0
28 Jan 2025
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform
Cheonsu Jeong
75
0
0
01 Jan 2025
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah
Shan Lu
Chao Gao
Di Niu
70
0
0
20 Nov 2024
An Intelligent Agentic System for Complex Image Restoration Problems
Kaiwen Zhu
Jinjin Gu
Zhiyuan You
Yu Qiao
Chao Dong
33
6
0
23 Oct 2024
Deep Correlated Prompting for Visual Recognition with Missing Modalities
Lianyu Hu
Tongkai Shi
Wei Feng
Fanhua Shang
Liang Wan
VLM
29
1
0
09 Oct 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Zhiding Yu
Guilin Liu
MLLM
27
53
0
28 Aug 2024
FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models
Xiaochen Wang
Jiaqi Wang
Houping Xiao
J. Chen
Fenglong Ma
MedIm
61
7
0
17 Aug 2024
Enhancing Representation Learning of EEG Data with Masked Autoencoders
Yifei Zhou
Sitong Liu
39
0
0
09 Aug 2024
Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection
Jinfa Huang
Jinsheng Pan
Zhongwei Wan
Hanjia Lyu
Jiebo Luo
55
4
0
30 Jul 2024
From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
Jianliang He
Siyu Chen
Fengzhuo Zhang
Zhuoran Yang
LM&Ro
LLMAG
40
2
0
30 May 2024
How Culturally Aware are Vision-Language Models?
Olena Burda-Lassen
Aman Chadha
Shashank Goswami
Vinija Jain
VLM
39
0
0
24 May 2024
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2
Simon Damm
M. Laszkiewicz
Johannes Lederer
Asja Fischer
48
3
0
23 May 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
K. Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
34
12
0
25 Apr 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You
Haotian Zhang
E. Schoop
Floris Weers
Amanda Swearngin
Jeffrey Nichols
Yinfei Yang
Zhe Gan
MLLM
45
82
0
08 Apr 2024
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Musashi Hinck
M. L. Olson
David Cobbley
Shao-Yen Tseng
Vasudev Lal
VLM
32
10
0
29 Mar 2024
Levels of AI Agents: from Rules to Large Language Models
Yu Huang
AI4CE
ELM
LM&Ro
43
2
0
06 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
56
17
0
28 Feb 2024
Generative AI and Process Systems Engineering: The Next Frontier
Benjamin Decardi-Nelson
Abdulelah S. Alshehri
Akshay Ajagekar
Fengqi You
AI4CE
LLMAG
24
24
0
15 Feb 2024
SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering
Xiaopeng Li
Shasha Li
Shezheng Song
Huijun Liu
Bing Ji
...
Jun Ma
Jie Yu
Xiaodong Liu
Jing Wang
Weimin Zhang
KELM
37
4
0
31 Jan 2024
MLLMReID: Multimodal Large Language Model-based Person Re-identification
Shan Yang
Yongfei Zhang
LRM
21
2
0
24 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Eric Wang
X. Li
Luisa Verdoliva
Shu Hu
86
56
0
22 Jan 2024
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
30
3
0
05 Dec 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chunyuan Li
MLLM
VLM
56
104
0
09 Nov 2023
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLM
MLLM
LRM
61
22
0
02 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
26
63
0
30 Oct 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas L. Griffiths
LLMAG
LM&Ro
42
151
0
05 Sep 2023