MVP: Multimodality-guided Visual Pre-training


10 March 2022
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian

Papers citing "MVP: Multimodality-guided Visual Pre-training"

50 / 85 papers shown
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
22
0
0
16 May 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
53
1
0
17 Mar 2025
Dynamic Relation Inference via Verb Embeddings
Omri Suissa
Muhiim Ali
Ariana Azarbal
Hui Shen
Shekhar Pradhan
46
0
0
17 Mar 2025
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
Shentong Mo
42
0
0
23 Dec 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
73
0
0
24 Nov 2024
DiffSTR: Controlled Diffusion Models for Scene Text Removal
Sanhita Pathak
V. Kaushik
Brejesh Lall
DiffM
33
0
0
29 Oct 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
Shentong Mo
Shengbang Tong
40
1
0
25 Oct 2024
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
Yin Xie
Kaicheng Yang
Ninghua Yang
Weimo Deng
Xiangzi Dai
...
Yumeng Wang
Xiang An
Yongle Zhao
Ziyong Feng
Jiankang Deng
MLLM
VLM
45
1
0
18 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
52
2
0
02 Oct 2024
Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen
Lingfeng Yang
Shuo Chen
Zhaowei Chen
Jiajun Liang
Xiang Li
MLLM
VPVLM
VLM
43
1
0
10 Sep 2024
UNIC: Universal Classification Models via Multi-teacher Distillation
Mert Bulent Sariyildiz
Philippe Weinzaepfel
Thomas Lucas
Diane Larlus
Yannis Kalantidis
37
6
0
09 Aug 2024
ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model
Yifan Chen
Xiaozhen Qiao
Zhe Sun
Xuelong Li
VLM
45
3
0
08 Aug 2024
QPT V2: Masked Image Modeling Advances Visual Scoring
Qizhi Xie
Kun Yuan
Yunpeng Qu
Mingda Wu
Ming Sun
Chao Zhou
Jihong Zhu
42
3
0
23 Jul 2024
SemanticMIM: Marring Masked Image Modeling with Semantics Compression for General Visual Representation
Yike Yuan
Huanzhang Dou
Fengjun Guo
Xi Li
36
2
0
15 Jun 2024
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma
Shiliang Zhang
Longhui Wei
Qi Tian
VLM
44
0
0
07 Jun 2024
Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation
Chen Chen
Kai Qiao
Jie Yang
Jian Chen
Bin Yan
30
1
0
09 May 2024
EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning
Jingfeng Yao
Xinggang Wang
Yuehao Song
Huangxuan Zhao
Jun Ma
Yajie Chen
Wenyu Liu
Bo Wang
ViT
42
5
0
08 May 2024
ViTGaze: Gaze Following with Interaction Features in Vision Transformers
Yuehao Song
Xinggang Wang
Jingfeng Yao
Wenyu Liu
Jinglin Zhang
Xiangmin Xu
ViT
52
2
0
19 Mar 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
42
14
0
31 Dec 2023
Morphing Tokens Draw Strong Masked Image Models
Taekyung Kim
Byeongho Heo
Dongyoon Han
54
3
0
30 Dec 2023
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu
Ran Xu
Senqiao Yang
Renrui Zhang
Qizhe Zhang
Zehui Chen
Yandong Guo
Shanghang Zhang
TTA
35
10
0
19 Dec 2023
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
94
48
0
18 Dec 2023
Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency
Yikai Wang
Chenjie Cao
Yanwei Fu
Ke Fan
Xiangyang Xue
Yanwei Fu
DiffM
53
2
0
08 Dec 2023
A brief introduction to a framework named Multilevel Guidance-Exploration Network
Guoqing Yang
Zhiming Luo
Jianzhe Gao
Yingxin Lai
Kun Yang
Yifan He
Shaozi Li
3DH
29
0
0
07 Dec 2023
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
Yizhou Wang
Ruiyi Zhang
Haoliang Wang
Uttaran Bhattacharya
Yun Fu
Gang Wu
MLLM
40
10
0
04 Dec 2023
Improve Supervised Representation Learning with Masked Image Modeling
Kaifeng Chen
Daniel M. Salz
Huiwen Chang
Kihyuk Sohn
Dilip Krishnan
Mojtaba Seyedhosseini
SSL
ViT
45
3
0
01 Dec 2023
HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
Junkun Yuan
Xinyu Zhang
Hao Zhou
Jian Wang
Zhongwei Qiu
...
Junyu Han
Errui Ding
Lanfen Lin
Fei Wu
Jingdong Wang
38
18
0
31 Oct 2023
Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
Yi Wang
C. Albrecht
Nassim Ait Ali Braham
Chenying Liu
Zhitong Xiong
Xiaoxiang Zhu
SSL
25
16
0
11 Sep 2023
Empowering Low-Light Image Enhancer through Customized Learnable Priors
Naishan Zheng
Man Zhou
Yanmeng Dong
Xiangyu Rui
Jie Huang
Chongyi Li
Fengmei Zhao
39
28
0
05 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image Modeling
Qi Han
Yuxuan Cai
Xiangyu Zhang
41
7
0
02 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
23
27
0
02 Sep 2023
Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion
Zixuan Ni
Longhui Wei
Jiacheng Li
Siliang Tang
Yueting Zhuang
Qi Tian
DiffM
31
21
0
02 Aug 2023
VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation
Zekun Qi
Muzhou Yu
Runpei Dong
Kaisheng Ma
3DPC
26
11
0
28 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
29
27
0
24 Jul 2023
Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection
Guangzhi Wang
Yangyang Guo
Mohan S. Kankanhalli
28
0
0
19 Jul 2023
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Spyros Gidaris
Andrei Bursuc
Oriane Siméoni
Antonín Vobecký
N. Komodakis
Matthieu Cord
Patrick Pérez
SSL
ViT
24
3
0
18 Jul 2023
Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners
Bowen Shi
Xiaopeng Zhang
Yaoming Wang
Jin Li
Wenrui Dai
Junni Zou
H. Xiong
Qi Tian
51
4
0
28 Jun 2023
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Lorenzo Baraldi
Roberto Amoroso
Marcella Cornia
Lorenzo Baraldi
Andrea Pilzer
Rita Cucchiara
38
2
0
12 Jun 2023
SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers
Bowen Zhang
Liyang Liu
Minh Hieu Phan
Zhi Tian
Chunhua Shen
Yifan Liu
ViT
34
28
0
09 Jun 2023
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng Lu
Xiaojie Jin
Qibin Hou
Jun Hao Liew
Ming-Ming Cheng
Jiashi Feng
38
4
0
24 May 2023
What Makes for Good Visual Tokenizers for Large Language Models?
Guangzhi Wang
Yixiao Ge
Xiaohan Ding
Mohan S. Kankanhalli
Ying Shan
MLLM
VLM
33
38
0
20 May 2023
Embrace Limited and Imperfect Training Datasets: Opportunities and Challenges in Plant Disease Recognition Using Deep Learning
Mingle Xu
H. Kim
Jucheng Yang
A. Fuentes
Yao Meng
Sook Yoon
Taehyun Kim
D. Park
20
18
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
48
115
0
18 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
34
2
0
13 May 2023
Continual Vision-Language Representation Learning with Off-Diagonal Information
Zixuan Ni
Longhui Wei
Siliang Tang
Yueting Zhuang
Qi Tian
VLM
CLL
33
25
0
11 May 2023
Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders
Heng Pan
Chenyang Liu
Wenxiao Wang
Liejie Yuan
Hongfa Wang
Zhifeng Li
Wei Liu
VLM
35
3
0
25 Apr 2023
Diffusion Models as Masked Autoencoders
Chen Wei
K. Mangalam
Po-Yao (Bernie) Huang
Yanghao Li
Haoqi Fan
Hu Xu
Huiyu Wang
Cihang Xie
Alan Yuille
Christoph Feichtenhofer
DiffM
SyDa
36
48
0
06 Apr 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
25
20
0
31 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
57
156
0
28 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
40
259
0
20 Mar 2023