arXiv:2211.09807
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
17 November 2022
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
Papers citing "Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information"
39 / 39 papers shown
Title
Multiscale Tensor Summation Factorization as a New Neural Network Layer (MTS Layer) for Multidimensional Data Processing
Mehmet Yamaç
Muhammad Numan Yousaf
S. Kiranyaz
M. Gabbouj
28
1
0
17 Apr 2025
PromptGAR: Flexible Promptive Group Activity Recognition
Zhangyu Jin
Andrew Feng
Ankur Chemburkar
Celso M. De Melo
VLM
42
0
0
11 Mar 2025
Chanel-Orderer: A Channel-Ordering Predictor for Tri-Channel Natural Images
Shen Li
Lei Jiang
Wei Wang
Hongwei Hu
Liang Li
72
0
0
20 Nov 2024
MEANT: Multimodal Encoder for Antecedent Information
Benjamin Iyoya Irving
Annika Marie Schoene
AIFin
29
0
0
10 Nov 2024
Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Zhijie Nie
Richong Zhang
Zhangchi Feng
Hailang Huang
Xudong Liu
32
1
0
26 Jun 2024
Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph Reasoning
Zhuohang Jiang
Bingkui Tong
Xia Du
Ahmed Alhammadi
Jizhe Zhou
42
1
0
18 Jun 2024
Enhancing Domain Adaptation through Prompt Gradient Alignment
Hoang Phan
Lam C. Tran
Quyen Tran
Trung Le
52
0
0
13 Jun 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang
Xizhou Zhu
Jinguo Zhu
Weijie Su
Junjie Wang
...
Lewei Lu
Bin Li
Jie Zhou
Yu Qiao
Jifeng Dai
VLM
CLIP
41
4
0
11 Jun 2024
Influence of Water Droplet Contamination for Transparency Segmentation
Volker Knauthe
Paul Weitz
Thomas Pollabauer
Tristan Wirth
Arne Rak
Arjan Kuijper
Dieter W. Fellner
38
1
0
21 May 2024
Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference
Yujin Han
Difan Zou
AAML
26
3
0
22 Apr 2024
Monocular 3D lane detection for Autonomous Driving: Recent Achievements, Challenges, and Outlooks
Fulong Ma
Weiqing Qi
Guoyang Zhao
Linwei Zheng
Sheng Wang
Yuxuan Liu
Ming-Yu Liu
74
9
0
10 Apr 2024
Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning
Yupei Zhang
Li Pan
Qiushi Yang
Tan Li
Zhen Chen
28
1
0
09 Apr 2024
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong
Shilin Yan
Renrui Zhang
Wanyun Li
Xinyu Zhou
...
Kaixun Jiang
Yiting Chen
Jinglun Li
Zhaoyu Chen
Wenqiang Zhang
VLM
32
38
0
14 Mar 2024
AgentScope: A Flexible yet Robust Multi-Agent Platform
Dawei Gao
Zitao Li
Xuchen Pan
Weirui Kuang
Zhijian Ma
...
Chen Cheng
Hongzhu Shi
Yaliang Li
Bolin Ding
Jingren Zhou
LLMAG
27
26
0
21 Feb 2024
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Xu Yan
Haiming Zhang
Yingjie Cai
Jingming Guo
Weichao Qiu
...
Lihui Jiang
Wei Zhang
Hongbo Zhang
Dengxin Dai
Bingbing Liu
54
17
0
16 Jan 2024
Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification
Jiaqing Zhang
Jie Lei
Weiying Xie
Geng Yang
Daixun Li
Yunsong Li
24
12
0
06 Jan 2024
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection
Joonhyun Jeong
Geondo Park
Jayeon Yoo
Hyungsik Jung
Heesu Kim
VLM
ObjD
35
10
0
12 Dec 2023
AI-SAM: Automatic and Interactive Segment Anything Model
Yimu Pan
Sitao Zhang
Alison D. Gernand
Jeffery A. Goldstein
J. Z. Wang
VLM
32
4
0
05 Dec 2023
DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory
Cenlin Duan
Jianlei Yang
Xiaolin He
Yingjie Qi
Yikun Wang
...
Bonan Yan
Xueyan Wang
Xiaotao Jia
Weitao Pan
Weisheng Zhao
16
5
0
31 Oct 2023
Pre-training with Random Orthogonal Projection Image Modeling
Maryam Haghighat
Peyman Moghadam
Shaheer Mohamed
Piotr Koniusz
VLM
25
8
0
28 Oct 2023
MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network
Soroush Mehraban
Vida Adeli
Babak Taati
ViT
32
41
0
25 Oct 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
21
24
0
04 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
23
27
0
02 Sep 2023
3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW
Shijie Chang
Zeqi Hao
Ben Kang
Xiaoqi Zhao
Jiawen Zhu
Zhe Chen
Lihe Zhang
Lu Zhang
Huchuan Lu
21
1
0
04 Jun 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
33
455
0
18 May 2023
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny P. L. Lo
AI4MH
LM&MA
40
127
0
21 Mar 2023
DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
Shubhankar Borse
Debasmit Das
Hyojin Park
H. Cai
Risheek Garrepalli
Fatih Porikli
40
9
0
02 Mar 2023
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
36
656
0
10 Nov 2022
Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
Qiang Chen
Xiaokang Chen
Jian Wang
Shan Zhang
Kun Yao
Haocheng Feng
Junyu Han
Errui Ding
Gang Zeng
Jingdong Wang
ViT
46
119
0
26 Jul 2022
SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
Feng Liang
Yangguang Li
Diana Marculescu
SSL
TPM
ViT
51
22
0
28 May 2022
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
Yixuan Wei
Han Hu
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Jianmin Bao
Dong Chen
B. Guo
CLIP
88
124
0
27 May 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,434
0
11 Nov 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
225
898
0
28 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
275
1,081
0
17 Feb 2021
Understanding self-supervised Learning Dynamics without Contrastive Pairs
Yuandong Tian
Xinlei Chen
Surya Ganguli
SSL
138
279
0
12 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
280
1,981
0
09 Feb 2021
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,216
0
16 Nov 2016
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
253
1,827
0
18 Aug 2016