ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.01390
  4. Cited By
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive
  Vision-Language Models

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

2 August 2023
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
Wanrong Zhu
Kalyani Marathe
Yonatan Bitton
S. Gadre
Shiori Sagawa
J. Jitsev
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
    MLLM
ArXivPDFHTML

Papers citing "OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models"

50 / 337 papers shown
Title
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal
  Models
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
Bin Fu
Qiyang Wan
Jialin Li
Ruiping Wang
Xilin Chen
50
0
0
03 Sep 2024
A Survey on Evaluation of Multimodal Large Language Models
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
50
20
0
28 Aug 2024
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Yi-Fan Zhang
Huanyu Zhang
Haochen Tian
Chaoyou Fu
Shuangqing Zhang
...
Qingsong Wen
Zhang Zhang
Liwen Wang
Rong Jin
Tieniu Tan
OffRL
69
36
0
23 Aug 2024
Building and better understanding vision-language models: insights and
  future directions
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
42
61
0
22 Aug 2024
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework
  for Multimodal Large Language Model
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
Chaoya Jiang
Jia Hongrui
Haiyang Xu
Wei Ye
Mengfan Dong
Ming Yan
Ji Zhang
Fei Huang
Shikun Zhang
VLM
48
1
0
22 Aug 2024
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual
  Integration in MLLMs
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
Yuanyang Yin
Yaqi Zhao
Yajie Zhang
Ke Lin
Jiahao Wang
Xin Tao
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
LRM
39
6
0
21 Aug 2024
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
Feipeng Ma
Yizhou Zhou
Hebei Li
Zilong He
Siying Wu
Fengyun Rao
Siying Wu
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
39
3
0
21 Aug 2024
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing
  Hallucinations in LVLMs
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Yassine Ouali
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
VLM
MLLM
37
18
0
19 Aug 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
44
91
0
16 Aug 2024
BI-MDRG: Bridging Image History in Multimodal Dialogue Response
  Generation
BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
Hee Suk Yoon
Eunseop Yoon
Joshua Tian Jin Tee
Kang Zhang
Yu-Jung Heo
Du-Seong Chang
Chang D. Yoo
36
3
0
12 Aug 2024
Depth Helps: Improving Pre-trained RGB-based Policy with Depth
  Information Injection
Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection
Xincheng Pang
Wenke Xia
Zhigang Wang
Bin Zhao
Di Hu
Dong Wang
Xuelong Li
50
3
0
09 Aug 2024
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang
Jiequan Cui
Miaoge Li
Wang Lin
Bo Chen
Hanwang Zhang
MLLM
34
3
0
09 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal
  Large Language Models
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
51
98
0
09 Aug 2024
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Yunfei Xie
Ce Zhou
Lang Gao
Juncheng Wu
Xianhang Li
...
Sheng Liu
Lei Xing
James Zou
Cihang Xie
Yuyin Zhou
LM&MA
MedIm
74
24
0
06 Aug 2024
MMIU: Multimodal Multi-image Understanding for Evaluating Large
  Vision-Language Models
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng
Jun Wang
Chuanhao Li
Quanfeng Lu
Hao Tian
...
Jifeng Dai
Ping Luo
Ping Luo
Kaipeng Zhang
Wenqi Shao
VLM
60
18
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
45
1
0
04 Aug 2024
A Comprehensive Review of Multimodal Large Language Models: Performance
  and Challenges Across Different Tasks
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
54
32
0
02 Aug 2024
Actra: Optimized Transformer Architecture for Vision-Language-Action
  Models in Robot Learning
Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning
Yueen Ma
Dafeng Chi
Shiguang Wu
Yuecheng Liu
Yuzheng Zhuang
Jianye Hao
Irwin King
44
5
0
02 Aug 2024
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models
  for Integrated Capabilities
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linfeng Ren
Linjie Li
Jianfeng Wang
K. Lin
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
Xinchao Wang
VLM
MLLM
42
17
0
01 Aug 2024
Generalized Out-of-Distribution Detection and Beyond in Vision Language
  Model Era: A Survey
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Atsuyuki Miyai
Jingkang Yang
Jingyang Zhang
Yifei Ming
Sisir Dhakal
...
Yixuan Li
Hai "Helen" Li
Ziwei Liu
Toshihiko Yamasaki
Kiyoharu Aizawa
46
10
0
31 Jul 2024
Learning Video Context as Interleaved Multimodal Sequences
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
49
5
0
31 Jul 2024
LLAVADI: What Matters For Multimodal Large Language Models Distillation
LLAVADI: What Matters For Multimodal Large Language Models Distillation
Shilin Xu
Xiangtai Li
Haobo Yuan
Lu Qi
Yunhai Tong
Ming-Hsuan Yang
36
3
0
28 Jul 2024
Data Processing Techniques for Modern Multimodal Models
Data Processing Techniques for Modern Multimodal Models
Yinheng Li
Han Ding
Hang Chen
VLM
36
0
0
27 Jul 2024
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal
  Large Language Model
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Yiwei Ma
Zhibin Wang
Xiaoshuai Sun
Weihuang Lin
Qiang-feng Zhou
Jiayi Ji
Rongrong Ji
MLLM
VLM
57
1
0
23 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal
  Reasoning
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
45
9
0
22 Jul 2024
Learning from the Web: Language Drives Weakly-Supervised Incremental
  Learning for Semantic Segmentation
Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation
Chang Liu
Giulia Rizzoli
Pietro Zanuttigh
Fu Li
Yi Niu
CLL
53
1
0
18 Jul 2024
XEdgeAI: A Human-centered Industrial Inspection Framework with
  Data-centric Explainable Edge AI Approach
XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach
Truong Thanh Hung Nguyen
Phuc Truong Loc Nguyen
Hung Cao
32
3
0
16 Jul 2024
Towards Adversarially Robust Vision-Language Models: Insights from
  Design Choices and Prompt Formatting Techniques
Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques
Rishika Bhagwatkar
Shravan Nayak
Reza Bayat
Alexis Roger
Daniel Z Kaplan
P. Bashivan
Irina Rish
AAML
VLM
42
1
0
15 Jul 2024
Constructing Concept-based Models to Mitigate Spurious Correlations with
  Minimal Human Effort
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim
Ze Wang
Qiang Qiu
46
1
0
12 Jul 2024
MAVIS: Mathematical Visual Instruction Tuning
MAVIS: Mathematical Visual Instruction Tuning
Renrui Zhang
Xinyu Wei
Dongzhi Jiang
Yichi Zhang
Ziyu Guo
...
Aojun Zhou
Bin Wei
Shanghang Zhang
Peng Gao
Hongsheng Li
MLLM
42
25
0
11 Jul 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large
  Multimodal Models
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLM
VLM
52
196
0
10 Jul 2024
A Survey of Attacks on Large Vision-Language Models: Resources,
  Advances, and Future Trends
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
Daizong Liu
Mingyu Yang
Xiaoye Qu
Pan Zhou
Yu Cheng
Wei Hu
ELM
AAML
30
25
0
10 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
46
15
0
08 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
41
10
0
03 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model
  Supporting Long-Contextual Input and Output
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
45
100
0
03 Jul 2024
VSP: Assessing the dual challenges of perception and reasoning in
  spatial planning tasks for VLMs
VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs
Qiucheng Wu
Handong Zhao
Michael Stephen Saxon
T. Bui
William Yang Wang
Yang Zhang
Shiyu Chang
CoGe
46
4
0
02 Jul 2024
RoboUniView: Visual-Language Model with Unified View Representation for
  Robotic Manipulaiton
RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulaiton
Fanfan Liu
Feng Yan
Liming Zheng
Chengjian Feng
Yiyang Huang
Lin Ma
LM&Ro
35
11
0
27 Jun 2024
S3: A Simple Strong Sample-effective Multimodal Dialog System
S3: A Simple Strong Sample-effective Multimodal Dialog System
Elisei Rykov
Egor Malkershin
Alexander Panchenko
28
0
0
26 Jun 2024
Long Context Transfer from Language to Vision
Long Context Transfer from Language to Vision
Peiyuan Zhang
Kaichen Zhang
Bo Li
Guangtao Zeng
Jingkang Yang
Yuanhan Zhang
Ziyue Wang
Haoran Tan
Chunyuan Li
Ziwei Liu
VLM
72
141
0
24 Jun 2024
VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
Gabriel H. Sarch
Lawrence Jang
Michael J. Tarr
William W. Cohen
Kenneth Marino
Katerina Fragkiadaki
LLMAG
56
0
0
20 Jun 2024
Learnable In-Context Vector for Visual Question Answering
Learnable In-Context Vector for Visual Question Answering
Yingzhe Peng
Chenduo Hao
Xu Yang
Jiawei Peng
Xinting Hu
Xin Geng
37
4
0
19 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human
  Preferences
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
51
26
0
16 Jun 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen
Lin Li
Yongqi Yang
Bin Wen
Fan Yang
Tingting Gao
Yu Wu
Long Chen
VLM
VGen
47
6
0
15 Jun 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image
  Understanding
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang
Xingyu Fu
James Y. Huang
Zekun Li
Qin Liu
...
Kai-Wei Chang
Dan Roth
Sheng Zhang
Hoifung Poon
Muhao Chen
VLM
50
47
0
13 Jun 2024
Comparison Visual Instruction Tuning
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
46
4
0
13 Jun 2024
ReMI: A Dataset for Reasoning with Multiple Images
ReMI: A Dataset for Reasoning with Multiple Images
Mehran Kazemi
Nishanth Dikkala
Ankit Anand
Petar Dević
Ishita Dasgupta
...
Bahare Fatemi
Pranjal Awasthi
Dee Guo
Sreenivas Gollapudi
Ahmed Qureshi
LRM
VLM
52
13
0
13 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
39
3
0
13 Jun 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
Liwen Wang
Rong Jin
34
40
0
12 Jun 2024
Real2Code: Reconstruct Articulated Objects via Code Generation
Real2Code: Reconstruct Articulated Objects via Code Generation
Zhao Mandi
Yijia Weng
Dominik Bauer
Shuran Song
45
17
0
12 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images
  Interleaved with Text
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM
OffRL
56
21
0
12 Jun 2024
Previous
1234567
Next