ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.16527
  4. Cited By
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text
  Documents
v1v2 (latest)

OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

21 June 2023
Hugo Laurenccon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
Anton Lozhkov
Thomas Wang
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
ArXiv (abs)PDFHTML

Papers citing "OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents"

50 / 58 papers shown
Title
A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
Yukang Feng
Jianwen Sun
Chuanhao Li
Zizhen Li
Jiaxin Ai
...
Yifan Chang
Sizhuo Zhou
Shenglin Zhang
Yu Dai
Kaipeng Zhang
MLLMEGVM
90
0
0
11 Jun 2025
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Danny Driess
Jost Tobias Springenberg
Brian Ichter
Lili Yu
Adrian Li-Bell
...
Allen Z. Ren
Homer Walke
Quan Vuong
Lucy Xiaoyang Shi
Sergey Levine
119
2
0
29 May 2025
Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)
Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)
Subba Reddy Oota
Akshett Rai Jindal
Ishani Mondal
Khushbu Pahwa
Satya Sai Srinath Namburi
Manish Shrivastava
M. Singh
Bapi S. Raju
Manish Gupta
51
1
0
26 May 2025
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations
Xuecheng Wu
Jiaxing Liu
Danlei Huang
Xiaoyu Li
Yifan Wang
Chen Chen
Liya Ma
Xuezhi Cao
Junxiao Xue
LRM
112
0
0
20 May 2025
Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
Pengfei Wang
Guohai Xu
Weinong Wang
Junjie Yang
Jie Lou
Yunhua Xue
106
0
0
15 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
319
1
0
05 May 2025
Mimic In-Context Learning for Multimodal Tasks
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang
Jiale Fu
Chenduo Hao
Xinting Hu
Yingzhe Peng
Xin Geng
Xu Yang
110
0
0
11 Apr 2025
Kimi-VL Technical Report
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Z. Huang
Zhe Chen
Zijia Zhao
Ziwei Chen
Zongyu Lin
MLLMVLMMoE
408
32
0
10 Apr 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Sara Sarto
Marcella Cornia
Rita Cucchiara
88
1
0
18 Mar 2025
Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks
Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks
Liming Lu
Shuchao Pang
Siyuan Liang
Haotian Zhu
Xiyu Zeng
Aishan Liu
Yunhuai Liu
Yongbin Zhou
AAML
174
5
0
05 Mar 2025
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
Hantao Lou
Changye Li
Yalan Qin
Yaodong Yang
122
1
0
22 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
165
9
0
21 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
561
0
0
04 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
184
23
0
28 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRLALMAI4TSVLMLRM
353
338
0
22 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
254
134
0
10 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
177
15
0
06 Jan 2025
Olympus: A Universal Task Router for Computer Vision Tasks
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip Torr
VLMObjD
548
1
0
12 Dec 2024
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
Ruben Ohana
Michael McCabe
Lucas Meyer
Rudy Morel
Fruzsina J. Agocs
...
François Rozet
Liam Parker
M. Cranmer
S. Ho
Shirley Ho
PINNAI4CE
193
23
1
30 Nov 2024
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li
Y. X. Wei
Zhihui Xie
Xuqing Yang
Yifan Song
...
Tianyu Liu
Sujian Li
Bill Yuchen Lin
Dianbo Sui
Qiang Liu
VLMCoGe
196
32
0
26 Nov 2024
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen
Chengyu Wang
Dakan Wang
Taolin Zhang
Wangyue Li
Xiaofeng He
KELM
156
1
0
23 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
145
93
1
15 Nov 2024
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Michael S Ryoo
Honglu Zhou
Shrikant B. Kendre
Can Qin
Le Xue
...
Kanchana Ranasinghe
Caiming Xiong
Ran Xu
Caiming Xiong
Juan Carlos Niebles
VGen
104
15
0
21 Oct 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia
Siwei Han
Shi Qiu
Yiyang Zhou
Zhaoyang Wang
...
Chenhang Cui
Mingyu Ding
Linjie Li
Lijuan Wang
Huaxiu Yao
163
16
0
14 Oct 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
148
8
0
10 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
122
1
0
03 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Z. Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
112
7
0
02 Oct 2024
Referring Expression Generation in Visually Grounded Dialogue with
  Discourse-aware Comprehension Guiding
Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding
Bram Willemsen
Gabriel Skantze
127
0
0
09 Sep 2024
An overview of domain-specific foundation model: key technologies, applications and challenges
An overview of domain-specific foundation model: key technologies, applications and challenges
Haolong Chen
Hanzhi Chen
Zijian Zhao
Kaifeng Han
Guangxu Zhu
Yichen Zhao
Ying Du
Wei Xu
Qingjiang Shi
ALMVLM
131
5
0
06 Sep 2024
Law of Vision Representation in MLLMs
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
157
12
0
29 Aug 2024
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Yi-Fan Zhang
Huanyu Zhang
Haochen Tian
Chaoyou Fu
Shuangqing Zhang
...
Qingsong Wen
Zhang Zhang
Liwen Wang
Rong Jin
Tieniu Tan
OffRL
176
52
0
23 Aug 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
108
96
0
16 Aug 2024
SEED-Story: Multimodal Long Story Generation with Large Language Model
SEED-Story: Multimodal Long Story Generation with Large Language Model
Shuai Yang
Yuying Ge
Yang Li
Yukang Chen
Yixiao Ge
Ying Shan
Yingcong Chen
VGenDiffM
146
32
0
11 Jul 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large
  Multimodal Models
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLMVLM
138
233
0
10 Jul 2024
Rethinking Visual Prompting for Multimodal Large Language Models with
  External Knowledge
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip Torr
Lu Yuan
LRMVLM
81
8
0
05 Jul 2024
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
Nan Xu
Fei Wang
Sheng Zhang
Hoifung Poon
Muhao Chen
141
7
0
01 Jul 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Xiangyu Zhao
Xiangtai Li
Haodong Duan
Haian Huang
Yining Li
Kai Chen
Hua Yang
VLMMLLM
120
12
0
25 Jun 2024
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal
  Documents
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Junjie Wang
Yin Zhang
Yatai Ji
Yuxiang Zhang
Chunyang Jiang
...
Bei Chen
Qunshu Lin
Minghao Liu
Ge Zhang
Wenhu Chen
97
3
0
20 Jun 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen
Lin Li
Yongqi Yang
Bin Wen
Fan Yang
Tingting Gao
Yu Wu
Long Chen
VLMVGen
127
11
0
15 Jun 2024
First Multi-Dimensional Evaluation of Flowchart Comprehension for
  Multimodal Large Language Models
First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models
Enming Zhang
Ruobing Yao
Huanyong Liu
Junhui Yu
Jiale Wang
ELMLRM
97
0
0
14 Jun 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image
  Understanding
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang
Xingyu Fu
James Y. Huang
Zekun Li
Qin Liu
...
Kai-Wei Chang
Dan Roth
Sheng Zhang
Hoifung Poon
Muhao Chen
VLM
133
59
0
13 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
169
3
0
13 Jun 2024
Needle In A Multimodal Haystack
Needle In A Multimodal Haystack
Weiyun Wang
Shuibo Zhang
Yiming Ren
Yuchen Duan
Tiantong Li
...
Ping Luo
Yu Qiao
Jifeng Dai
Wenqi Shao
Wenhai Wang
VLM
118
24
0
11 Jun 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Dinesh Manocha
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
203
38
0
24 May 2024
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
...
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
VLMDiffM
188
22
0
24 May 2024
Efficient Multimodal Large Language Models: A Survey
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
119
58
0
17 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
212
338
0
16 May 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for
  Long Sequence Modelling: Methods, Applications, and Challenges
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
118
45
0
24 Apr 2024
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of
  Large Vision-Language Models
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li
Yuqi Wang
Runxin Xu
Peiyi Wang
Xiachong Feng
Lingpeng Kong
Qi Liu
131
58
0
01 Mar 2024
A Surprising Failure? Multimodal LLMs and the NLVR Challenge
A Surprising Failure? Multimodal LLMs and the NLVR Challenge
Anne Wu
Kianté Brantley
Yoav Artzi
40
5
0
26 Feb 2024
12
Next