ResearchTrend.AI
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

22 April 2022
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
Bin Xiao
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
    VLM
    OffRL

Papers citing "Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks"

20 / 20 papers shown
Title
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
61
0
0
17 Mar 2025
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
Cascade Prompt Learning for Vision-Language Model Adaptation
Ge Wu
Xin Zhang
Zheng Li
Zhaowei Chen
Jiajun Liang
Jian Yang
Xiang Li
VLM
32
7
0
26 Sep 2024
LLAVADI: What Matters For Multimodal Large Language Models Distillation
Shilin Xu
Xiangtai Li
Haobo Yuan
Lu Qi
Yunhai Tong
Ming-Hsuan Yang
36
3
0
28 Jul 2024
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
J. Park
Jack Hessel
Khyathi Raghavi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
26
11
0
08 Dec 2023
Grounding Foundation Models through Federated Transfer Learning: A General Framework
Yan Kang
Tao Fan
Hanlin Gu
Xiaojin Zhang
Lixin Fan
Qiang Yang
AI4CE
68
19
0
29 Nov 2023
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
31
43
0
28 Nov 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
27
3
0
23 Oct 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
Lingchen Meng
Xiyang Dai
Jianwei Yang
Dongdong Chen
Yinpeng Chen
Mengchen Liu
Yi-Ling Chen
Zuxuan Wu
Lu Yuan
Yu-Gang Jiang
16
6
0
18 Oct 2023
TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation
Moon Ye-Bin
Jisoo Kim
Hong-Kyu Kim
Kilho Son
Tae-Hyun Oh
26
9
0
27 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
26
27
0
24 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
30
3
0
03 Jul 2023
On the Impact of Knowledge Distillation for Model Interpretability
Hyeongrok Han
Siwon Kim
Hyun-Soo Choi
Sungroh Yoon
24
4
0
25 May 2023
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
Zhecan Wang
Haoxuan You
Yicheng He
Wenhao Li
Kai-Wei Chang
Shih-Fu Chang
23
5
0
10 Nov 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
24
48
0
26 Jul 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
196
405
0
13 Jul 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
301
3,700
0
11 Feb 2021
SEED: Self-supervised Distillation For Visual Representation
Zhiyuan Fang
Jianfeng Wang
Lijuan Wang
Lei Zhang
Yezhou Yang
Zicheng Liu
SSL
239
190
0
12 Jan 2021
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
221
197
0
07 Feb 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019