ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.10401
  4. Cited By
Vision-Language Pre-Training with Triple Contrastive Learning

Vision-Language Pre-Training with Triple Contrastive Learning

21 February 2022
Jinyu Yang
Jiali Duan
Son N. Tran
Yi Xu
Sampath Chanda
Liqun Chen
Belinda Zeng
Trishul Chilimbi
Junzhou Huang
    VLM
ArXivPDFHTML

Papers citing "Vision-Language Pre-Training with Triple Contrastive Learning"

50 / 173 papers shown
Title
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal
  Contrastive Training
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training
Chong Liu
Yuqi Zhang
Hongsong Wang
Weihua Chen
F. Wang
Yan Huang
Yixing Shen
Liang Wang
24
25
0
15 Jun 2023
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for
  Histopathology Images
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
Ming Y. Lu
Bowen Chen
Andrew Zhang
Drew F. K. Williamson
Richard J. Chen
Tong Ding
L. Le
Yung-Sung Chuang
Faisal Mahmood
VLM
MedIm
41
100
0
13 Jun 2023
Global and Local Semantic Completion Learning for Vision-Language
  Pre-training
Global and Local Semantic Completion Learning for Vision-Language Pre-training
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
26
2
0
12 Jun 2023
Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Paul Pu Liang
Zihao Deng
Martin Q. Ma
James Zou
Louis-Philippe Morency
Ruslan Salakhutdinov
SSL
26
49
0
08 Jun 2023
Interpretable Alzheimer's Disease Classification Via a Contrastive
  Diffusion Autoencoder
Interpretable Alzheimer's Disease Classification Via a Contrastive Diffusion Autoencoder
Ayodeji Ijishakin
A. Abdulaal
Adamos Hadjivasiliou
Sophie Martin
James H. Cole
DiffM
MedIm
38
9
0
05 Jun 2023
Multi-CLIP: Contrastive Vision-Language Pre-training for Question
  Answering tasks in 3D Scenes
Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes
Alexandros Delitzas
Maria Parelli
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
17
19
0
04 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and
  Outlook of Recent Work
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
45
0
0
02 Jun 2023
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule
  Representations
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations
Pengcheng Jiang
Cao Xiao
Tianfan Fu
Jimeng Sun
50
3
0
02 Jun 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL
  Models
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Roei Herzig
Donghyun Kim
...
Yikang Shen
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
62
53
0
31 May 2023
LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and
  Unlabeled Image Collections
LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
M. Jehanzeb Mirza
Leonid Karlinsky
Wei Lin
Mateusz Koziñski
Horst Possegger
Rogerio Feris
Horst Bischof
VLM
50
30
0
29 May 2023
Benchmarking Diverse-Modal Entity Linking with Generative Models
Benchmarking Diverse-Modal Entity Linking with Generative Models
Sijia Wang
Alexander Hanbo Li
He Zhu
Shenmin Zhang
Chung-Wei Hang
...
William Wang
Zhiguo Wang
Vittorio Castelli
Bing Xiang
Patrick Ng
VLM
43
8
0
27 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
39
21
0
25 May 2023
RaSa: Relation and Sensitivity Aware Representation Learning for
  Text-based Person Search
RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search
Yang Bai
Ming-Ming Cao
Daming Gao
Ziqiang Cao
Cheng Chen
Zhenfeng Fan
Liqiang Nie
Min Zhang
AI4TS
75
53
0
23 May 2023
Vision-Language Pre-training with Object Contrastive Learning for 3D
  Scene Understanding
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
Zhang Tao
Su He
D. Tao
Bin Chen
Zhi Wang
Shutao Xia
VLM
37
22
0
18 May 2023
Cross-Modal Global Interaction and Local Alignment for Audio-Visual
  Speech Recognition
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Chen Chen
Heqing Zou
Qiu-shi Zhu
Eng Siong Chng
34
7
0
16 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
36
22
0
12 May 2023
Adaptive loose optimization for robust question answering
Adaptive loose optimization for robust question answering
Jie Ma
Pinghui Wang
Ze-you Wang
Dechen Kong
Min Hu
Tingxu Han
Jun Liu
OOD
38
4
0
06 May 2023
MoMo: A shared encoder Model for text, image and multi-Modal
  representations
MoMo: A shared encoder Model for text, image and multi-Modal representations
Rakesh Chada
Zhao-Heng Zheng
P. Natarajan
ViT
21
4
0
11 Apr 2023
Detecting and Grounding Multi-Modal Media Manipulation
Detecting and Grounding Multi-Modal Media Manipulation
Rui Shao
Tianxing Wu
Ziwei Liu
44
58
0
05 Apr 2023
Multi-Modal Representation Learning with Text-Driven Soft Masks
Multi-Modal Representation Learning with Text-Driven Soft Masks
Jaeyoo Park
Bohyung Han
SSL
30
4
0
03 Apr 2023
KD-DLGAN: Data Limited Image Generation via Knowledge Distillation
KD-DLGAN: Data Limited Image Generation via Knowledge Distillation
Kaiwen Cui
Yingchen Yu
Fangneng Zhan
Tianran Ouyang
Shijian Lu1
Eric P. Xing
VLM
50
19
0
30 Mar 2023
Revisiting Multimodal Representation in Contrastive Learning: From Patch
  and Token Embeddings to Finite Discrete Tokens
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
Yuxiao Chen
Jianbo Yuan
Yu Tian
Shijie Geng
Xinyu Li
Ding Zhou
Dimitris N. Metaxas
Hongxia Yang
14
34
0
27 Mar 2023
Contrastive Alignment of Vision to Language Through Parameter-Efficient
  Transfer Learning
Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
Zaid Khan
Yun Fu
VLM
41
12
0
21 Mar 2023
MXM-CLR: A Unified Framework for Contrastive Learning of Multifold
  Cross-Modal Representations
MXM-CLR: A Unified Framework for Contrastive Learning of Multifold Cross-Modal Representations
Ye Wang
Bo‐Shu Jiang
C. Zou
Rui Ma
32
5
0
20 Mar 2023
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action
  Recognition with Language Knowledge
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Wei Lin
Leonid Karlinsky
Nina Shvetsova
Horst Possegger
Mateusz Koziñski
Yikang Shen
Rogerio Feris
Hilde Kuehne
Horst Bischof
VLM
102
38
0
15 Mar 2023
Understanding and Constructing Latent Modality Structures in Multi-modal
  Representation Learning
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
Qian Jiang
Changyou Chen
Han Zhao
Liqun Chen
Q. Ping
S. D. Tran
Yi Xu
Belinda Zeng
Trishul Chilimbi
51
39
0
10 Mar 2023
Test-Time Distribution Normalization for Contrastively Learned
  Vision-language Models
Test-Time Distribution Normalization for Contrastively Learned Vision-language Models
Yi Zhou
Juntao Ren
Fengyu Li
Ramin Zabih
Ser-Nam Lim
VLM
39
14
0
22 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
48
205
0
20 Feb 2023
Analyzing Multimodal Objectives Through the Lens of Generative Diffusion
  Guidance
Analyzing Multimodal Objectives Through the Lens of Generative Diffusion Guidance
Chaerin Kong
Nojun Kwak
DiffM
28
2
0
10 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech
  Representations with Contextualized Target Representations
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
43
34
0
10 Feb 2023
SimCon Loss with Multiple Views for Text Supervised Semantic
  Segmentation
SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation
Yash J. Patel
Yusheng Xie
Yi Zhu
Srikar Appalaraju
R. Manmatha
37
4
0
07 Feb 2023
Style-Aware Contrastive Learning for Multi-Style Image Captioning
Style-Aware Contrastive Learning for Multi-Style Image Captioning
Yucheng Zhou
Guodong Long
25
22
0
26 Jan 2023
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Haotian Liu
Kilho Son
Jianwei Yang
Ce Liu
Jianfeng Gao
Yong Jae Lee
Chunyuan Li
VLM
40
53
0
17 Jan 2023
Filtering, Distillation, and Hard Negatives for Vision-Language
  Pre-Training
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
Filip Radenovic
Abhimanyu Dubey
Abhishek Kadian
Todor Mihaylov
Simon Vandenhende
Yash J. Patel
Y. Wen
Vignesh Ramanathan
D. Mahajan
VLM
40
82
0
05 Jan 2023
Fine-Grained Distillation for Long Document Retrieval
Fine-Grained Distillation for Long Document Retrieval
Yucheng Zhou
Tao Shen
Xiubo Geng
Chongyang Tao
Guodong Long
Can Xu
Daxin Jiang
RALM
32
28
0
20 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
37
125
0
13 Dec 2022
Vision and Structured-Language Pretraining for Cross-Modal Food
  Retrieval
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval
Mustafa Shukor
Nicolas Thome
Matthieu Cord
CLIP
CoGe
37
8
0
08 Dec 2022
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models
Vishaal Udandarao
Ankush Gupta
Samuel Albanie
VLM
MLLM
29
103
0
28 Nov 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image
  Models
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models
Lei Wang
Jian He
Xingdong Xu
Ning Liu
Hui-juan Liu
41
2
0
27 Nov 2022
Seeing What You Miss: Vision-Language Pre-training with Semantic
  Completion Learning
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
Yatai Ji
Rong-Cheng Tu
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
37
13
0
24 Nov 2022
Teaching Structured Vision&Language Concepts to Vision&Language Models
Teaching Structured Vision&Language Concepts to Vision&Language Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Yikang Shen
Roei Herzig
...
Donghyun Kim
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
56
71
0
21 Nov 2022
Unifying Vision-Language Representation Space with Single-tower
  Transformer
Unifying Vision-Language Representation Space with Single-tower Transformer
Jiho Jang
Chaerin Kong
D. Jeon
Seonhoon Kim
Nojun Kwak
27
19
0
21 Nov 2022
Leveraging per Image-Token Consistency for Vision-Language Pre-training
Leveraging per Image-Token Consistency for Vision-Language Pre-training
Yunhao Gou
Tom Ko
Hansi Yang
James T. Kwok
Yu Zhang
Mingxuan Wang
VLM
16
10
0
20 Nov 2022
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
James Smith
Paola Cascante-Bonilla
Assaf Arbelle
Donghyun Kim
Yikang Shen
David D. Cox
Diyi Yang
Z. Kira
Rogerio Feris
Leonid Karlinsky
VLM
47
20
0
17 Nov 2022
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding
Jiaming Chen
Weihua Luo
Ran Song
Xiaolin K. Wei
Lin Ma
Wei Emma Zhang
3DV
40
6
0
22 Oct 2022
RaP: Redundancy-aware Video-language Pre-training for Text-Video
  Retrieval
RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval
Xing Wu
Chaochen Gao
Zijia Lin
Zhongyuan Wang
Jizhong Han
Songlin Hu
32
8
0
13 Oct 2022
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
Yatai Ji
Junjie Wang
Yuan Gong
Lin Zhang
Yan Zhu
Hongfa Wang
Jiaxing Zhang
Tetsuya Sakai
Yujiu Yang
MLLM
27
29
0
11 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language
  Representation Learning
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
21
8
0
09 Oct 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature
  Alignment
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
32
21
0
09 Oct 2022
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of
  Vision-Language Models
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
Felix Vogel
Nina Shvetsova
Leonid Karlinsky
Hilde Kuehne
VLM
63
7
0
12 Sep 2022
Previous
1234
Next