HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Papers citing "HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding"

50 / 99 papers shown
Emu3: Next-Token Prediction is All You Need
Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan-Sen Sun, Yufeng Cui, ..., Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang
27 Sep 2024
