ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.10401
  4. Cited By
Vision-Language Pre-Training with Triple Contrastive Learning

Vision-Language Pre-Training with Triple Contrastive Learning

21 February 2022
Jinyu Yang
Jiali Duan
Son N. Tran
Yi Xu
Sampath Chanda
Liqun Chen
Belinda Zeng
Trishul Chilimbi
Junzhou Huang
    VLM
ArXivPDFHTML

Papers citing "Vision-Language Pre-Training with Triple Contrastive Learning"

50 / 173 papers shown
Title
Effectiveness Assessment of Recent Large Vision-Language Models
Effectiveness Assessment of Recent Large Vision-Language Models
Yao Jiang
Xinyu Yan
Ge-Peng Ji
Keren Fu
Meijun Sun
Huan Xiong
Deng-Ping Fan
Fahad Shahbaz Khan
44
14
0
07 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
67
12
0
05 Mar 2024
VQAttack: Transferable Adversarial Attacks on Visual Question Answering
  via Pre-trained Models
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models
Ziyi Yin
Muchao Ye
Tianrong Zhang
Jiaqi Wang
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
OOD
AAML
11
2
0
16 Feb 2024
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong
  Vision-language Adapter
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
Junfei Xiao
Zheng Xu
Alan Yuille
Shen Yan
Boyu Wang
33
3
0
16 Feb 2024
Convincing Rationales for Visual Question Answering Reasoning
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
44
1
0
06 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
56
182
0
24 Jan 2024
Exploring scalable medical image encoders beyond text supervision
Exploring scalable medical image encoders beyond text supervision
Fernando Pérez-García
Harshita Sharma
Sam Bond-Taylor
Kenza Bouzid
Valentina Salvatelli
...
Maria T. A. Wetscherek
Noel C. F. Codella
Stephanie L. Hyland
Javier Alvarez-Valle
Ozan Oktay
LM&MA
MedIm
52
26
0
19 Jan 2024
Improving fine-grained understanding in image-text pre-training
Improving fine-grained understanding in image-text pre-training
Ioana Bica
Anastasija Ilić
Matthias Bauer
Goker Erdogan
Matko Bovsnjak
...
A. Gritsenko
Matthias Minderer
Charles Blundell
Razvan Pascanu
Jovana Mitrović
VLM
30
22
0
18 Jan 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video
  Classification
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
41
5
0
08 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for
  Audio-Video Classification
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
40
4
0
08 Jan 2024
P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion
P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion
Linlian Jiang
Pan Chen
Ye Wang
Tieru Wu
Rui Ma
3DPC
40
0
0
29 Dec 2023
3VL: Using Trees to Improve Vision-Language Models' Interpretability
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
51
4
0
28 Dec 2023
MIVC: Multiple Instance Visual Component for Visual-Language Models
MIVC: Multiple Instance Visual Component for Visual-Language Models
Wenyi Wu
Qi Li
Leon Wenliang Zhong
Junzhou Huang
33
3
0
28 Dec 2023
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language
  Pre-training and Open-Vocabulary Object Detection
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen
Tiancheng Zhao
Mingwei Zhu
Jianwei Yin
VLM
ObjD
99
11
0
22 Dec 2023
Text-Guided Face Recognition using Multi-Granularity Cross-Modal
  Contrastive Learning
Text-Guided Face Recognition using Multi-Granularity Cross-Modal Contrastive Learning
Md Golam Moula Mehedi Hasan
S. Sami
Nasser M. Nasrabadi
34
4
0
14 Dec 2023
Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable
  Cyclic Image-Report Generation
Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation
Wenting Chen
Linlin Shen
Jingyang Lin
Jiebo Luo
Xiang Li
Yixuan Yuan
MedIm
26
10
0
13 Dec 2023
BESTMVQA: A Benchmark Evaluation System for Medical Visual Question
  Answering
BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering
Xiaojie Hong
Zixin Song
Liangzhi Li
Xiaoli Wang
Feiyan Liu
28
1
0
13 Dec 2023
Hallucination Augmented Contrastive Learning for Multimodal Large
  Language Model
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang
Haiyang Xu
Mengfan Dong
Jiaxing Chen
Wei Ye
Mingshi Yan
Qinghao Ye
Ji Zhang
Fei Huang
Shikun Zhang
VLM
20
51
0
12 Dec 2023
MAFA: Managing False Negatives for Vision-Language Pre-training
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun
Dohoon Kim
Taesup Moon
VLM
18
4
0
11 Dec 2023
SA-Attack: Improving Adversarial Transferability of Vision-Language
  Pre-training Models via Self-Augmentation
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation
Bangyan He
Xiaojun Jia
Siyuan Liang
Tianrui Lou
Yang Liu
Xiaochun Cao
AAML
VLM
36
23
0
08 Dec 2023
Cross-BERT for Point Cloud Pretraining
Cross-BERT for Point Cloud Pretraining
Xin Li
Peng Li
Zeyong Wei
Zhe Zhu
Mingqiang Wei
Junhui Hou
Liangliang Nan
J. Qin
H. Xie
F. Wang
SSL
3DPC
39
0
0
08 Dec 2023
OT-Attack: Enhancing Adversarial Transferability of Vision-Language
  Models via Optimal Transport Optimization
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization
Dongchen Han
Xiaojun Jia
Yang Bai
Jindong Gu
Yang Liu
Xiaochun Cao
VLM
37
22
0
07 Dec 2023
Bootstrapping SparseFormers from Vision Foundation Models
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao
Zhan Tong
K. Lin
Joya Chen
Mike Zheng Shou
41
0
0
04 Dec 2023
Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval
Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval
Dixuan Lin
Yi-Xing Peng
Jingke Meng
Wei-Shi Zheng
44
5
0
04 Dec 2023
Choosing Wisely and Learning Deeply: Selective Cross-Modality
  Distillation via CLIP for Domain Generalization
Choosing Wisely and Learning Deeply: Selective Cross-Modality Distillation via CLIP for Domain Generalization
Jixuan Leng
Yijiang Li
Haohan Wang
VLM
37
0
0
26 Nov 2023
Domain Aligned CLIP for Few-shot Classification
Domain Aligned CLIP for Few-shot Classification
Muhammad Waleed Gondal
Jochen Gast
Inigo Alonso Ruiz
Richard Droste
Tommaso Macri
Suren Kumar
Luitpold Staudigl
VLM
21
11
0
15 Nov 2023
CROMA: Remote Sensing Representations with Contrastive Radar-Optical
  Masked Autoencoders
CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders
A. Fuller
K. Millard
James R. Green
29
60
0
01 Nov 2023
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP
  Performance on Low-Resource Languages
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages
G. O. D. Santos
Diego A. B. Moreira
Alef Iury Ferreira
Jhessica Silva
Luiz Pereira
...
H. Maia
Nádia Da Silva
Esther Colombini
Hélio Pedrini
Sandra Avila
VLM
CLIP
36
4
0
20 Oct 2023
Video-Teller: Enhancing Cross-Modal Generation with Fusion and
  Decoupling
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu
Qihang Fan
Tingkai Liu
Linjie Yang
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
VGen
29
12
0
08 Oct 2023
Noise-Tolerant Unsupervised Adapter for Vision-Language Models
Noise-Tolerant Unsupervised Adapter for Vision-Language Models
Eman Ali
Dayan Guan
Muhammad Haris Khan
Abdulmotaleb Elsaddik
VLM
24
0
0
26 Sep 2023
Detecting and Grounding Multi-Modal Media Manipulation and Beyond
Detecting and Grounding Multi-Modal Media Manipulation and Beyond
Rui Shao
Tianxing Wu
Jianlong Wu
Liqiang Nie
Ziwei Liu
24
22
0
25 Sep 2023
A Sentence Speaks a Thousand Images: Domain Generalization through
  Distilling CLIP with Language Guidance
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance
Zeyi Huang
Andy Zhou
Zijian Lin
Mu Cai
Haohan Wang
Yong Jae Lee
VLM
OOD
32
28
0
21 Sep 2023
TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight
  Inheritance
TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
Kan Wu
Houwen Peng
Zhenghong Zhou
Bin Xiao
Mengchen Liu
...
Xi
Xi Chen
Xinggang Wang
Hongyang Chao
Han Hu
VLM
OODD
29
54
0
21 Sep 2023
TAP: Targeted Prompting for Task Adaptive Generation of Textual Training
  Instances for Visual Classification
TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification
M. Jehanzeb Mirza
Leonid Karlinsky
Wei Lin
Horst Possegger
Rogerio Feris
Horst Bischof
VLM
40
6
0
13 Sep 2023
Decoupling Common and Unique Representations for Multimodal
  Self-supervised Learning
Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
Yi Wang
C. Albrecht
Nassim Ait Ali Braham
Chenying Liu
Zhitong Xiong
Xiaoxiang Zhu
SSL
30
16
0
11 Sep 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object
  Detection
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
32
5
0
30 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual
  Captioning
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
44
13
0
25 Aug 2023
A Survey of Diffusion Based Image Generation Models: Issues and Their
  Solutions
A Survey of Diffusion Based Image Generation Models: Issues and Their Solutions
Tianyi Zhang
Zheng Wang
Jin Huang
M. M. Tasnim
Wei Shi
VLM
21
21
0
25 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and
  Modality-Aware MoE
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
60
9
0
23 Aug 2023
Learning from Semantic Alignment between Unpaired Multiviews for
  Egocentric Video Recognition
Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition
Qitong Wang
Long Zhao
Liangzhe Yuan
Ting Liu
Xi Peng
36
12
0
22 Aug 2023
Diffusion Models for Image Restoration and Enhancement -- A
  Comprehensive Survey
Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey
Xin Li
Yulin Ren
Xin Jin
Cuiling Lan
Xingyu Wang
Wenjun Zeng
Xinchao Wang
Zhibo Chen
43
86
0
18 Aug 2023
Generating Faithful Text From a Knowledge Graph with Noisy Reference
  Text
Generating Faithful Text From a Knowledge Graph with Noisy Reference Text
Tahsina Hashem
Weiqing Wang
Derry Wijaya
Mohammed Eunus Ali
Yuan-Fang Li
29
3
0
12 Aug 2023
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating
  Vision-Language Models
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models
Zheng Ma
Mianzhi Pan
Wenhan Wu
Ka Leong Cheng
Jianbing Zhang
Shujian Huang
Jiajun Chen
VLM
CoGe
31
3
0
06 Aug 2023
Grounded Image Text Matching with Mismatched Relation Reasoning
Grounded Image Text Matching with Mismatched Relation Reasoning
Yu Wu
Yan-Tao Wei
Haozhe Jasper Wang
Yongfei Liu
Sibei Yang
Xuming He
36
6
0
02 Aug 2023
Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Aayush Dhakal
Adeel Ahmad
Subash Khanal
Srikumar Sastry
Hannah Kerner
Nathan Jacobs
33
13
0
29 Jul 2023
PromptStyler: Prompt-driven Style Generation for Source-free Domain
  Generalization
PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization
Junhyeong Cho
Gilhyun Nam
Sungyeon Kim
Hunmin Yang
Suha Kwak
VLM
OOD
TTA
27
49
0
27 Jul 2023
Set-level Guidance Attack: Boosting Adversarial Transferability of
  Vision-Language Pre-training Models
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Dong Lu
Zhiqiang Wang
Teng Wang
Weili Guan
Hongchang Gao
Feng Zheng
AAML
58
65
0
26 Jul 2023
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
Yi-Syuan Chen
Yun-Zhu Song
Cheng Yu Yeo
Bei Liu
Jianlong Fu
Hong-Han Shuai
VLM
LRM
26
4
0
15 Jul 2023
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language
  Pre-training via Prompting
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting
Zixin Guo
T. Wang
Selen Pehlivan
Abduljalil Radman
Jorma T. Laaksonen
VLM
33
2
0
14 Jul 2023
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Chunhui Zhang
Xin Sun
Li Liu
Yiqian Yang
Qiong Liu
Xiaoping Zhou
Yanfeng Wang
49
15
0
07 Jul 2023
Previous
1234
Next