ResearchTrend.AI
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

1 March 2022
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
    VLM

Papers citing "Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment"

23 / 23 papers shown
Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Zhenqi Wang
50
0
0
24 Dec 2024
Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Fengyuan Liu
LRM
125
58
0
22 Dec 2024
Hierarchical Prompt Decision Transformer: Improving Few-Shot Policy Generalization with Global and Adaptive Guidance
Zhe Wang
Haozhu Wang
Yanjun Qi
OffRL
83
0
0
01 Dec 2024
Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment
Zijia Song
Z. Zang
Yelin Wang
Guozheng Yang
Jiangbin Zheng
Kaicheng Yu
Wanyu Chen
Stan Z. Li
44
1
0
09 Jun 2024
EAMA : Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
Junzhe Zhang
Huixuan Zhang
Xunjian Yin
Xiaojun Wan
23
0
0
29 Feb 2024
TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion
Chunyang Cheng
Tianyang Xu
Xiao-Jun Wu
Hui Li
Xi Li
Zhangyong Tang
Josef Kittler
24
12
0
21 Dec 2023
Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation
Xue-mei Hu
Ce Zhang
Yi Zhang
Bowen Hai
Ke Yu
Zhihai He
MDE
VLM
51
17
0
02 Nov 2023
BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning
Yi Zhang
Ce Zhang
Zihan Liao
Yushun Tang
Zhihai He
BDL
VLM
28
10
0
03 Sep 2023
Unsupervised Prototype Adapter for Vision-Language Models
Yi Zhang
Ce Zhang
Xue-mei Hu
Z. He
VLM
34
4
0
22 Aug 2023
Cross-Modal Concept Learning and Inference for Vision-Language Models
Yi Zhang
Ce Zhang
Yushun Tang
Z. He
VLM
MLLM
CLIP
39
15
0
28 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
32
18
0
21 Jul 2023
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting
Zixin Guo
T. Wang
Selen Pehlivan
Abduljalil Radman
Jorma T. Laaksonen
VLM
33
2
0
14 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
37
25
0
13 Jul 2023
Weakly Supervised Vision-and-Language Pre-training with Relative Representations
Chi Chen
Peng Li
Maosong Sun
Yang Liu
30
1
0
24 May 2023
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
Sangwoo Mo
Minkyu Kim
Kyungmin Lee
Jinwoo Shin
VLM
CLIP
44
22
0
23 May 2023
Text-based Person Search without Parallel Image-Text Data
Yang Bai
Wenwen Qiang
Min Cao
Cheng Chen
Ziqiang Cao
Liqiang Nie
Min Zhang
42
13
0
22 May 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
24
44
0
31 Mar 2023
ALCAP: Alignment-Augmented Music Captioner
Zihao He
Weituo Hao
Weiyi Lu
Changyou Chen
Kristina Lerman
Xuchen Song
27
1
0
21 Dec 2022
Training Vision-Language Models with Less Bimodal Supervision
Elad Segal
Ben Bogin
Jonathan Berant
VLM
21
2
0
01 Nov 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
21
12
0
25 May 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
302
1,086
0
17 Feb 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Word Translation Without Parallel Data
Alexis Conneau
Guillaume Lample
Marc'Aurelio Ranzato
Ludovic Denoyer
Hervé Jégou
189
1,639
0
11 Oct 2017