arXiv: 2204.08721
Multimodal Token Fusion for Vision Transformers
19 April 2022
Yikai Wang
Xinghao Chen
Lele Cao
Wen-bing Huang
Gang Hua
Yunhe Wang
ViT
Papers citing "Multimodal Token Fusion for Vision Transformers"
24 / 24 papers shown
Title
A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware
Rui Wang
Shichun Yang
Yuyi Chen
Z. Li
Zexiang Tong
J. Xu
Jiayi Lu
Xinjie Feng
Yaoguang Cao
12
0
0
16 May 2025
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
Xu Zheng
Yuanhuiyi Lyu
Lutao Jiang
Danda Pani Paudel
Luc Van Gool
Xuming Hu
29
0
0
10 May 2025
Position: Foundation Models Need Digital Twin Representations
Yiqing Shen
Hao Ding
Lalithkumar Seenivasan
Tianmin Shu
Mathias Unberath
AI4CE
40
0
0
01 May 2025
HDBFormer: Efficient RGB-D Semantic Segmentation with A Heterogeneous Dual-Branch Framework
Shuobin Wei
Zhuang Zhou
Zhengan Lu
Zizhao Yuan
Binghua Su
MDE
47
0
0
18 Apr 2025
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
Chenfei Liao
Kaiyu Lei
Xu Zheng
Junha Moon
Zhixiong Wang
Yixuan Wang
Danda Pani Paudel
Luc Van Gool
Xuming Hu
VLM
68
3
0
24 Mar 2025
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
Yunzhi Zhuge
Hongyu Gu
Lu Zhang
Jinqing Qi
Huchuan Lu
VOS
69
2
0
14 Jan 2025
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding
Yi Liu
Chengxin Li
Shoukun Xu
J. Han
ViT
42
2
0
19 Oct 2024
Order-aware Interactive Segmentation
Bin Wang
Anwesa Choudhuri
Meng Zheng
Zhongpai Gao
Benjamin Planche
Andong Deng
Qin Liu
Terrence Chen
Ulas Bagci
Ziyan Wu
VLM
146
1
0
16 Oct 2024
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo
Xinghao Chen
Yehui Tang
Yunhe Wang
ViT
49
9
0
19 May 2024
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Tianlin Li
Yao Rong
Shiao Wang
Yuan Chen
Zhe Wu
Bowei Jiang
Yonghong Tian
Jin Tang
ViT
81
3
0
18 Dec 2023
Uni3DETR: Unified 3D Detection Transformer
Zhenyu Wang
Yali Li
Xi Chen
Hengshuang Zhao
Shengjin Wang
3DPC
42
18
0
09 Oct 2023
ASY-VRNet: Waterway Panoptic Driving Perception Model based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar
Runwei Guan
Shanliang Yao
Xiaohui Zhu
Ka Lok Man
Yong Yue
Jeremy S. Smith
Eng Gee Lim
Yutao Yue
35
9
0
20 Aug 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
An Object SLAM Framework for Association, Mapping, and High-Level Tasks
Yanmin Wu
Yunzhou Zhang
Delong Zhu
Zhiqiang Deng
Wenkai Sun
Xin Chen
Jian Zhang
21
35
0
12 May 2023
Impact of Pseudo Depth on Open World Object Segmentation with Minimal User Guidance
Robin Schon
K. Ludwig
Rainer Lienhart
VLM
MDE
41
2
0
12 Apr 2023
Full Point Encoding for Local Feature Aggregation in 3D Point Clouds
Yong-xing He
Hongshan Yu
Zhengeng Yang
Xiaoguang Liu
Wei Sun
Ajmal Saeed Mian
ViT
3DPC
14
3
0
08 Mar 2023
Pixel Difference Convolutional Network for RGB-D Semantic Segmentation
Jun Yang
Lizhi Bai
Yaoru Sun
Chunqi Tian
Maoyu Mao
Guorun Wang
SSeg
25
16
0
23 Feb 2023
Emerging Threats in Deep Learning-Based Autonomous Driving: A Comprehensive Survey
Huiyun Cao
Wenlong Zou
Yinkun Wang
Ting Song
Mengjun Liu
AAML
54
4
0
19 Oct 2022
DCANet: Differential Convolution Attention Network for RGB-D Semantic Segmentation
Lizhi Bai
Jun Yang
Chunqi Tian
Yaoru Sun
Maoyu Mao
Yanjun Xu
Weirong Xu
21
9
0
13 Oct 2022
ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in All Weather Conditions
Anjun Chen
Xiangyu Wang
Kun Shi
Shaohao Zhu
Bin Fang
Yingke Chen
Jiming Chen
Yuchi Huo
Qi Ye
3DH
31
20
0
04 Oct 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
54
527
0
13 Jun 2022
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
Jiaming Zhang
Huayao Liu
Kailun Yang
Xinxin Hu
Ruiping Liu
Rainer Stiefelhagen
ViT
31
299
0
09 Mar 2022
Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction
Yikai Wang
Gang Hua
Wenbing Huang
Fengxiang He
Dacheng Tao
54
29
0
04 Dec 2021
ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
C. Qi
Xinlei Chen
Or Litany
Leonidas J. Guibas
3DPC
195
248
0
29 Jan 2020