2206.08919
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
17 June 2022
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
Papers citing
"VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix"
31 / 31 papers shown
Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models
Songlin Dong
Chenhao Ding
Jiangyang Li
Jizhou Han
Qiang Wang
Yuhang He
Yihong Gong
CLL
VLM
40
0
0
12 May 2025
Continual Cross-Modal Generalization
Yan Xia
Hai Huang
Minghui Fang
Zhou Zhao
CLL
54
0
0
01 Apr 2025
Diversity Covariance-Aware Prompt Learning for Vision-Language Models
Songlin Dong
Zhengdong Zhou
Chenhao Ding
Xinyuan Gao
Alex C. Kot
Yihong Gong
VPVLM
VLM
49
0
0
03 Mar 2025
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
Xiaojun Jia
Sensen Gao
Qing-Wu Guo
Ke Ma
Yihao Huang
Simeng Qin
Yang Liu
Ivor Tsang
Xiaochun Cao
AAML
46
3
0
04 Nov 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Zae Myung Kim
Dongyeop Kang
Yao-Yi Chiang
Alfred Hero
25
0
0
02 Oct 2024
Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models
Haonan Zheng
Wen Jiang
Xinyang Deng
Wenrui Li
VLM
AAML
26
2
0
06 Aug 2024
Hierarchical Memory for Long Video QA
Yiqin Wang
Haoji Zhang
Yansong Tang
Yong-Jin Liu
Jiashi Feng
Jifeng Dai
Xiaojie Jin
62
2
0
30 Jun 2024
Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation
Lincan Cai
Shuang Li
Wenxuan Ma
Jingxuan Kang
Binhui Xie
Zixun Sun
Chengwei Zhu
MoE
MoMe
42
0
0
13 Jun 2024
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Weihao Ye
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
MoE
49
1
0
22 Mar 2024
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao
Xiaojun Jia
Xuhong Ren
Ivor Tsang
Qing-Wu Guo
AAML
38
14
0
19 Mar 2024
Enhancing Multimodal Unified Representations for Cross Modal Generalization
Hai Huang
Yan Xia
Shengpeng Ji
Shulei Wang
Hanting Wang
Minghui Fang
Jieming Zhu
Zhenhua Dong
Sashuai Zhou
Zhou Zhao
31
6
0
08 Mar 2024
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
26
13
0
15 Feb 2024
PowMix: A Versatile Regularizer for Multimodal Sentiment Analysis
Efthymios Georgiou
Yannis Avrithis
Alexandros Potamianos
25
1
0
19 Dec 2023
RecExplainer: Aligning Large Language Models for Explaining Recommendation Models
Yuxuan Lei
Jianxun Lian
Jing Yao
Xu Huang
Defu Lian
Xing Xie
LRM
29
5
0
18 Nov 2023
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Fengxiang Bie
Yibo Yang
Zhongzhu Zhou
Adam Ghanem
Minjia Zhang
...
Pareesa Ameneh Golnari
David A. Clifton
Yuxiong He
Dacheng Tao
Shuaiwen Leon Song
EGVM
33
18
0
02 Sep 2023
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Baoshuo Kan
Teng Wang
Wenpeng Lu
Xiantong Zhen
Weili Guan
Feng Zheng
VPVLM
VLM
28
25
0
22 Aug 2023
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
Dong Lu
Zhiqiang Wang
Teng Wang
Weili Guan
Hongchang Gao
Feng Zheng
AAML
53
65
0
26 Jul 2023
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting
Zixin Guo
T. Wang
Selen Pehlivan
Abduljalil Radman
Jorma T. Laaksonen
VLM
27
2
0
14 Jul 2023
Weakly Supervised Vision-and-Language Pre-training with Relative Representations
Chi Chen
Peng Li
Maosong Sun
Yang Liu
24
1
0
24 May 2023
Text-based Person Search without Parallel Image-Text Data
Yang Bai
Jingyao Wang
Min Cao
Cheng Chen
Ziqiang Cao
Liqiang Nie
Min Zhang
38
13
0
22 May 2023
Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization
Zimeng Qiu
Quanqi Hu
Zhuoning Yuan
Denny Zhou
Lijun Zhang
Tianbao Yang
34
17
0
19 May 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
24
43
0
31 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLM
MLLM
93
9
0
24 Mar 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Xize Cheng
Lin Li
Tao Jin
Rongjie Huang
Wang Lin
Zehan Wang
Huangdai Liu
Yejin Wang
Aoxiong Yin
Zhou Zhao
23
24
0
09 Mar 2023
CLIP-guided Prototype Modulating for Few-shot Action Recognition
Xiang Wang
Shiwei Zhang
Jun Cen
Changxin Gao
Yingya Zhang
Deli Zhao
Nong Sang
VLM
27
53
0
06 Mar 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
31
202
0
20 Feb 2023
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
Kuniaki Saito
Kihyuk Sohn
Xiang Zhang
Chun-Liang Li
Chen-Yu Lee
Kate Saenko
Tomas Pfister
30
108
0
06 Feb 2023
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Da Yin
Feng Gao
Govind Thattai
Michael F. Johnston
Kai-Wei Chang
VLM
32
15
0
05 Jan 2023
A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability
Chengtai Cao
Fan Zhou
Yurou Dai
Jianping Wang
Kunpeng Zhang
AAML
24
28
0
21 Dec 2022
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Joey Tianyi Zhou
Gedas Bertasius
VLM
32
78
0
09 Dec 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
313
3,708
0
11 Feb 2021