ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.00332
  4. Cited By
UC2: Universal Cross-lingual Cross-modal Vision-and-Language
  Pre-training

UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

1 April 2021
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
    MLLMVLM
ArXiv (abs)PDFHTML

Papers citing "UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training"

50 / 56 papers shown
Title
A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning
Qing Zhou
Tao Yang
Junyu Gao
W. Ni
Junzheng Wu
Qi Wang
80
0
0
06 Mar 2025
CMAL: A Novel Cross-Modal Associative Learning Framework for
  Vision-Language Pre-Training
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
127
9
0
16 Oct 2024
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval
Yabing Wang
Le Wang
Qiang-feng Zhou
Zhibin Wang
Hao Li
Gang Hua
Wei Tang
90
10
0
30 Sep 2024
Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with
  1-to-K Contrastive Learning
Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Zhijie Nie
Richong Zhang
Zhangchi Feng
Hailang Huang
Xudong Liu
116
3
0
26 Jun 2024
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Amith Ananthram
Elias Stengel-Eskin
Carl Vondrick
Joey Tianyi Zhou
VLM
149
1
0
17 Jun 2024
Translation Deserves Better: Analyzing Translation Artifacts in
  Cross-lingual Visual Question Answering
Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering
Yujin Baek
Koanho Lee
Hyesu Lim
Jaeseok Kim
Junmo Park
Yu-Jung Heo
Du-Seong Chang
Jaegul Choo
51
3
0
04 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Yangfu Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
173
12
0
04 Jun 2024
Embracing Language Inclusivity and Diversity in CLIP through Continual
  Language Learning
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning
Bang-ju Yang
Yong Dai
Xuxin Cheng
Yaowei Li
Asif Raza
Yuexian Zou
VLM
81
5
0
30 Jan 2024
CLIP in Medical Imaging: A Comprehensive Survey
CLIP in Medical Imaging: A Comprehensive Survey
Zihao Zhao
Yuxiao Liu
Han Wu
Yonghao Li
Sheng Wang
L. Teng
Disheng Liu
Zhiming Cui
Qian Wang
Dinggang Shen
CLIPMedImLM&MAVLM
158
43
0
12 Dec 2023
ICU: Conquering Language Barriers in Vision-and-Language Modeling by
  Dividing the Tasks into Image Captioning and Language Understanding
ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding
Guojun Wu
VLMMLLM
70
1
0
19 Oct 2023
Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal
  Retrieval
Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval
Yabing Wang
Shuhui Wang
Hao Luo
Jianfeng Dong
F. Wang
Meng Han
Xun Wang
Meng Wang
84
9
0
11 Sep 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across
  Languages
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLMVLM
120
56
0
23 Aug 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image
  Captioning
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
69
3
0
19 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLMMLLM
123
32
0
13 Jul 2023
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages
Yasmine Karoui
R. Lebret
Negar Foroutan
Karl Aberer
MLLMVLM
67
2
0
29 Jun 2023
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language
  Representations
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
Gregor Geigle
Radu Timofte
Goran Glavaš
VLMMLLM
71
5
0
14 Jun 2023
Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by
  Diminishing Bias
Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias
Zhongwei Wan
Che Liu
Mi Zhang
Jie Fu
Benyou Wang
Sibo Cheng
Lei Ma
César Quilodrán-Casas
Rossella Arcucci
115
77
0
31 May 2023
Translation-Enhanced Multilingual Text-to-Image Generation
Translation-Enhanced Multilingual Text-to-Image Generation
Yaoyiran Li
Ching-Yun Chang
Stephen Rawls
Ivan Vulić
Anna Korhonen
68
8
0
30 May 2023
Meta-learning For Vision-and-language Cross-lingual Transfer
Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu
Frank Keller
VLM
92
2
0
24 May 2023
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
Chulun Zhou
Yunlong Liang
Fandong Meng
Jinan Xu
Jinsong Su
Jie Zhou
VLM
71
4
0
13 May 2023
Accessible Instruction-Following Agent
Accessible Instruction-Following Agent
Kairui Zhou
93
1
0
08 May 2023
LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization
LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization
Sheng Liu
C. P. Huynh
Congmin Chen
Maxim Arap
Raffay Hamid
114
19
0
25 Apr 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
189
48
0
21 Mar 2023
Entity-Level Text-Guided Image Manipulation
Entity-Level Text-Guided Image Manipulation
Yikai Wang
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Wei Zhang
Yanwei Fu
VGen
78
3
0
22 Feb 2023
Multimodality Representation Learning: A Survey on Evolution,
  Pretraining and Its Applications
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
109
32
0
01 Feb 2023
Tackling Ambiguity with Images: Improved Multimodal Machine Translation
  and Contrastive Evaluation
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation
Matthieu Futeral
Cordelia Schmid
Ivan Laptev
Benoît Sagot
Rachel Bawden
108
31
0
20 Dec 2022
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
Zutao Jiang
Guangsong Lu
Xiaodan Liang
Jihua Zhu
Wei Zhang
Xiaojun Chang
Hang Xu
DiffM
79
8
0
02 Dec 2022
X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
X2^22-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
Yan Zeng
Xinsong Zhang
Hang Li
Jiawei Wang
Jipeng Zhang
Hkust Wangchunshu Zhou
VLMMLLM
75
15
0
22 Nov 2022
AltCLIP: Altering the Language Encoder in CLIP for Extended Language
  Capabilities
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities
Zhongzhi Chen
Guangyi Liu
Bo Zhang
Fulong Ye
Qinghong Yang
Ledell Yu Wu
VLM
102
90
0
12 Nov 2022
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for
  Understanding and Generation
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation
Bin Shan
Yaqian Han
Weichong Yin
Shuohuan Wang
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
MLLMVLM
90
8
0
09 Nov 2022
Multilingual Multimodality: A Taxonomical Survey of Datasets,
  Techniques, Challenges and Opportunities
Multilingual Multimodality: A Taxonomical Survey of Datasets, Techniques, Challenges and Opportunities
Khyathi Chandu
A. Geramifard
76
3
0
30 Oct 2022
Multilingual Multimodal Learning with Machine Translated Text
Multilingual Multimodal Learning with Machine Translated Text
Chen Qiu
Dan Oneaţă
Emanuele Bugliarello
Stella Frank
Desmond Elliott
123
15
0
24 Oct 2022
Plausible May Not Be Faithful: Probing Object Hallucination in
  Vision-Language Pre-training
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Wenliang Dai
Zihan Liu
Ziwei Ji
Jane Polak Scowcroft
Pascale Fung
MLLMVLM
101
67
0
14 Oct 2022
Visual Grounding of Inter-lingual Word-Embeddings
Visual Grounding of Inter-lingual Word-Embeddings
W. Mohammed
Hassan Shahmohammadi
Hendrik P. A. Lensch
R. Baayen
68
1
0
08 Sep 2022
Improving the Cross-Lingual Generalisation in Visual Question Answering
Improving the Cross-Lingual Generalisation in Visual Question Answering
Farhad Nooralahzadeh
Rico Sennrich
104
6
0
07 Sep 2022
Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
Yabing Wang
Jianfeng Dong
Tianxiang Liang
Minsong Zhang
Rui Cai
Xun Wang
97
20
0
26 Aug 2022
MuMUR : Multilingual Multimodal Universal Retrieval
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
166
3
0
24 Aug 2022
Vision-and-Language Pretraining
Vision-and-Language Pretraining
Thong Nguyen
Cong-Duy Nguyen
Xiaobao Wu
See-Kiong Ng
Anh Tuan Luu
VLMCLIP
74
2
0
05 Jul 2022
Multimodal Learning with Transformers: A Survey
Multimodal Learning with Transformers: A Survey
Peng Xu
Xiatian Zhu
David Clifton
ViT
249
579
0
13 Jun 2022
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge
  Distillation
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation
Kshitij Gupta
Devansh Gautam
R. Mamidi
VLM
93
4
0
07 Jun 2022
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal
  Pre-training
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Yan Zeng
Wangchunshu Zhou
Ao Luo
Ziming Cheng
Xinsong Zhang
VLM
110
32
0
01 Jun 2022
Generalizing Multimodal Pre-training into Multilingual via Language
  Acquisition
Generalizing Multimodal Pre-training into Multilingual via Language Acquisition
Liang Zhang
Anwen Hu
Qin Jin
VLM
57
5
0
29 May 2022
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise
  Semantic Alignment and Generation
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Chunjing Xu
Yanwei Fu
114
17
0
09 Apr 2022
Unsupervised Vision-and-Language Pre-training via Retrieval-based
  Multi-Granular Alignment
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
VLM
82
31
0
01 Mar 2022
Delving Deeper into Cross-lingual Visual Question Answering
Delving Deeper into Cross-lingual Visual Question Answering
Chen Cecilia Liu
Jonas Pfeiffer
Anna Korhonen
Ivan Vulić
Iryna Gurevych
111
9
0
15 Feb 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training
  Benchmark
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
...
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
VLM
187
95
0
14 Feb 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and
  Languages
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
Edoardo Ponti
Ivan Vulić
MLLMVLMELM
135
64
0
27 Jan 2022
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token
  Modeling
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
160
221
0
24 Nov 2021
Class-agnostic Object Detection with Multi-modal Transformer
Class-agnostic Object Detection with Multi-modal Transformer
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
Rao Muhammad Anwer
Ming-Hsuan Yang
175
97
0
22 Nov 2021
xGQA: Cross-Lingual Visual Question Answering
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
124
62
0
13 Sep 2021
12
Next