Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.09067
Cited By
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
17 March 2022
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua-Hong Wu
Haifeng Wang
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"UNIMO-2: End-to-End Unified Vision-Language Grounded Learning"
14 / 14 papers shown
Title
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
Wei Li
Xue Xu
Jiachen Liu
Xinyan Xiao
25
5
0
24 Jan 2024
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
31
5
0
23 Sep 2023
StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding
Renqiu Xia
Bo-Wen Zhang
Hao Peng
Hancheng Ye
Xiangchao Yan
Peng Ye
Botian Shi
Yu Qiao
Junchi Yan
14
0
0
20 Sep 2023
Online Clustered Codebook
Chuanxia Zheng
Andrea Vedaldi
37
26
0
27 Jul 2023
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting
Zixin Guo
T. Wang
Selen Pehlivan
Abduljalil Radman
Jorma T. Laaksonen
VLM
25
2
0
14 Jul 2023
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Xiao Xu
Bei Li
Chenfei Wu
Shao-Yen Tseng
Anahita Bhiwandiwalla
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
AIFin
VLM
37
2
0
31 May 2023
UNIMO-3: Multi-granularity Interaction for Vision-Language Representation Learning
Hao Yang
Can Gao
Hao Liu
Xinyan Xiao
Yanyan Zhao
Bing Qin
25
2
0
23 May 2023
Multimodal Representation Learning of Cardiovascular Magnetic Resonance Imaging
Jielin Qiu
Peide Huang
Makiya Nakashima
Jae-Hyeok Lee
Jiacheng Zhu
...
Byung-Hak Kim
Debbie Kwon
Douglas Weber
Ding Zhao
David Chen
SSL
24
5
0
16 Apr 2023
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
Yuxiao Chen
Jianbo Yuan
Yu Tian
Shijie Geng
Xinyu Li
Ding Zhou
Dimitris N. Metaxas
Hongxia Yang
14
33
0
27 Mar 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
23
4
0
19 Jan 2023
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
29
125
0
13 Dec 2022
UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance
Wei Li
Xue Xu
Xinyan Xiao
Jiacheng Liu
Hu Yang
...
Zhanpeng Wang
Zhifan Feng
Qiaoqiao She
Yajuan Lyu
Hua-Hong Wu
121
29
0
28 Oct 2022
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Xiao Xu
Chenfei Wu
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
48
64
0
17 Jun 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
25
16
0
27 Mar 2022
1