Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.10401
Cited By
Vision-Language Pre-Training with Triple Contrastive Learning
21 February 2022
Jinyu Yang
Jiali Duan
Son N. Tran
Yi Xu
Sampath Chanda
Liqun Chen
Belinda Zeng
Trishul Chilimbi
Junzhou Huang
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Vision-Language Pre-Training with Triple Contrastive Learning"
50 / 172 papers shown
Title
Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets
Guillermo Roque
Erika Maquiling
Jose Giovanni Tapia Lopez
Ross Greer
160
0
0
06 May 2025
Symbolic Representation for Any-to-Any Generative Tasks
Jianfei Chen
Xiaoye Zhu
Yanjie Wang
Tianyang Liu
Xinhui Chen
...
Yifei Ke
Jiaheng Liu
Yiwen Yuan
Julian McAuley
Li Li
DiffM
40
0
0
24 Apr 2025
CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation
Junchen Fu
Yongxin Ni
J. Jose
Ioannis Arapakis
Kaiwen Zheng
Yongbin Li
Xuri Ge
34
0
0
14 Apr 2025
Pose-Aware Weakly-Supervised Action Segmentation
Seth Z. Zhao
Reza Ghoddoosian
Isht Dwivedi
Nakul Agarwal
Behzad Dariush
34
0
0
08 Apr 2025
COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking
Chunhui Zhang
Li Liu
Jialin Gao
Xin Sun
Hao Wen
Xi Zhou
Shiming Ge
Yucheng Wang
42
1
0
02 Apr 2025
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Francesco P. Ramunno
Paolo Massa
Vitaliy Kinakh
Brandon Panos
A. Csillaghy
S. Voloshynovskiy
DiffM
53
0
0
31 Mar 2025
SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting
Chenyu Zhang
Kunlun Xu
Zichen Liu
Yuxin Peng
Jiahuan Zhou
VLM
63
1
0
17 Mar 2025
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou
Mao Ye
Shuaifeng Li
Nianxin Li
Xiatian Zhu
Lei Deng
Hongbin Liu
Zhen Lei
BDL
VLM
TTA
101
0
0
12 Mar 2025
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
A. Dhakal
S. Sastry
Subash Khanal
Adeel Ahmad
Eric Xing
Nathan Jacobs
55
0
0
27 Feb 2025
Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in Pre-trained Vision-Language Models
Jiawei Kong
Hao Fang
Sihang Guo
Chenxi Qing
Bin Chen
Bin Wang
Shu-Tao Xia
AAML
VLM
90
0
0
26 Feb 2025
Cross-Modal Few-Shot Learning with Second-Order Neural Ordinary Differential Equations
Yi Zhang
Chun-Wun Cheng
Junyi He
Zhihai He
Carola-Bibiane Schonlieb
Yuyan Chen
Angelica I Aviles-Rivero
AI4TS
86
0
0
20 Dec 2024
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
Xiaojun Jia
Sensen Gao
Qing Guo
Ke Ma
Yihao Huang
Simeng Qin
Yang Liu
Ivor Tsang Fellow
Xiaochun Cao
AAML
46
3
0
04 Nov 2024
Test-time Adaptation for Cross-modal Retrieval with Query Shift
Haobin Li
Peng Hu
Qianjun Zhang
Xi Peng
Xiting Liu
Mouxing Yang
TTA
35
0
0
21 Oct 2024
BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping
Taolin Zhang
J. T. Wang
Hang Guo
Tao Dai
Bin Chen
Shu-Tao Xia
VLM
TTA
21
0
0
20 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
Soojin Jang
Jungmin Yun
Junehyoung Kwon
Eunju Lee
Youngbin Kim
40
3
0
24 Sep 2024
ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features
Xin Wei
Yaling Tao
Changde Du
Gangming Zhao
Yizhou Yu
Jinpeng Li
33
0
0
24 Sep 2024
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
Eman Ali
Sathira Silva
Muhammad Haris Khan
VLM
39
0
0
16 Aug 2024
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation
Yubin Cho
Hyunwoo Yu
Suk-Ju Kang
61
18
0
14 Aug 2024
Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
Hunmin Yang
Jongoh Jeong
Kuk-Jin Yoon
AAML
VLM
60
4
0
30 Jul 2024
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu
Zhengyao Fang
Pengyuan Lyu
Chengquan Zhang
Fanglin Chen
Guangming Lu
Wenjie Pei
50
2
0
28 Jul 2024
Multi-Modal CLIP-Informed Protein Editing
Mingze Yin
Hanjing Zhou
Yiheng Zhu
Miao Lin
YiXuan Wu
...
Hongxia Xu
Chang-Yu Hsieh
Tingjun Hou
Jintai Chen
Jian Wu
48
7
0
27 Jul 2024
I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction
Zaiqiao Meng
Hao Zhou
Yifang Chen
37
4
0
19 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul Chilimbi
Mubarak Shah
62
2
0
18 Jul 2024
Multimodal Label Relevance Ranking via Reinforcement Learning
Taian Guo
Taolin Zhang
Haoqian Wu
Hanjun Li
Ruizhi Qiao
Xing Sun
OffRL
24
0
0
18 Jul 2024
PG-Attack: A Precision-Guided Adversarial Attack Framework Against Vision Foundation Models for Autonomous Driving
Jiyuan Fu
Zhaoyu Chen
Kaixun Jiang
Haijing Guo
Shuyong Gao
Wenqiang Zhang
AAML
45
1
0
18 Jul 2024
Camera-LiDAR Cross-modality Gait Recognition
Wenxuan Guo
Yingping Liang
Zhiyu Pan
Ziheng Xi
Jianjiang Feng
Jie Zhou
CVBM
41
3
0
02 Jul 2024
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models
Mehar Bhatia
Sahithya Ravi
Aditya Chinchure
EunJeong Hwang
Vered Shwartz
VLM
37
2
0
28 Jun 2024
Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment
Fei Zhou
Zhicong Huang
Tianhao Gu
Guoping Qiu
CoGe
VLM
69
1
0
14 Jun 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
Irene Huang
Wei Lin
M. Jehanzeb Mirza
Jacob A. Hansen
Sivan Doveh
...
Trevor Darrel
Chuang Gan
Aude Oliva
Rogerio Feris
Leonid Karlinsky
CoGe
LRM
43
7
0
12 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
40
13
0
08 Jun 2024
Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment
Wenliang Zhong
Wenyi Wu
Qi Li
Rob Barton
Boxin Du
Shioulin Sam
Karim Bouyarmane
Ismail B. Tutar
Junzhou Huang
33
3
0
05 Jun 2024
From Orthogonality to Dependency: Learning Disentangled Representation for Multi-Modal Time-Series Sensing Signals
Ruichu Cai
Zhifan Jiang
Zijian Li
Weilin Chen
Xuexin Chen
Zhifeng Hao
Yifan Shen
Guan-Hong Chen
Kun Zhang
40
1
0
25 May 2024
RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis
Yaxin Liu
Yan Zhou
Ziming Li
Jinchuan Zhang
Yu Shang
Chenyang Zhang
Songlin Hu
21
4
0
20 May 2024
Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation
Manh Luong
Khai Nguyen
Nhat Ho
Reza Haf
D.Q. Phung
Lizhen Qu
30
12
0
16 May 2024
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
Pengfei Zhang
Zi Huang
Guangdong Bai
AAML
39
11
0
09 May 2024
Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning
Zhengyang Liang
Meiyu Liang
Wei Huang
Yawen Li
Zhe Xue
43
1
0
16 Apr 2024
The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning
Yaohui Li
Qifeng Zhou
Haoxing Chen
Jianbing Zhang
Xinyu Dai
Hao Zhou
VLM
53
0
0
15 Apr 2024
Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning
Yupei Zhang
Li Pan
Qiushi Yang
Tan Li
Zhen Chen
31
1
0
09 Apr 2024
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao
Jianpeng Zhang
Yingda Xia
Tony C. W. Mok
Zi Li
X. Ye
Le Lu
Jian Zheng
Yuxing Tang
Ling Zhang
31
1
0
07 Apr 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao
Ameya Prabhu
Adhiraj Ghosh
Yash Sharma
Philip Torr
Adel Bibi
Samuel Albanie
Matthias Bethge
VLM
128
45
0
04 Apr 2024
TransFusion: Contrastive Learning with Transformers
Huanran Li
Daniel Pimentel-Alarcón
42
0
0
27 Mar 2024
Efficient Test-Time Adaptation of Vision-Language Models
Adilbek Karmanov
Dayan Guan
Shijian Lu
Abdulmotaleb El Saddik
Eric P. Xing
TTA
VLM
19
39
0
27 Mar 2024
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Yabin Zhang
Wen-Qing Zhu
Hui Tang
Zhiyuan Ma
Kaiyang Zhou
Lei Zhang
VLM
31
22
0
26 Mar 2024
DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization
Yunlong Tang
Yuxuan Wan
Lei Qi
Xin Geng
VLM
38
4
0
25 Mar 2024
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao
Xiaojun Jia
Xuhong Ren
Ivor Tsang
Qing Guo
AAML
38
14
0
19 Mar 2024
Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction
Jiyuan Fu
Zhaoyu Chen
Kaixun Jiang
Haijing Guo
Jiafeng Wang
Shuyong Gao
Wenqiang Zhang
VLM
AAML
47
2
0
16 Mar 2024
FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks
Muhammad Gul Zain Ali Khan
Muhammad Ferjad Naeem
F. Tombari
Luc Van Gool
Didier Stricker
Muhammad Zeshan Afzal
VLM
CLIP
47
3
0
11 Mar 2024
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
Hailang Huang
Zhijie Nie
Ziqiao Wang
Ziyu Shang
37
10
0
08 Mar 2024
Effectiveness Assessment of Recent Large Vision-Language Models
Yao Jiang
Xinyu Yan
Ge-Peng Ji
Keren Fu
Meijun Sun
Huan Xiong
Deng-Ping Fan
Fahad Shahbaz Khan
37
14
0
07 Mar 2024
1
2
3
4
Next