2107.14222
Rethinking and Improving Relative Position Encoding for Vision Transformer
29 July 2021
Kan Wu
Houwen Peng
Minghao Chen
Jianlong Fu
Hongyang Chao
ViT
Papers citing "Rethinking and Improving Relative Position Encoding for Vision Transformer"
50 / 168 papers shown
Title
Sampling 3D Molecular Conformers with Diffusion Transformers
J. Frank
Winfried Ripken
Gregor Lied
K. Müller
Oliver T. Unke
Stefan Chmiela
10
0
0
18 Jun 2025
Vision Generalist Model: A Survey
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
65
0
0
11 Jun 2025
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
Hao Yu
Tangyu Jiang
Shuning Jia
Shannan Yan
Shunning Liu
Haolong Qian
Guanghao Li
Shuting Dong
Huaisong Zhang
Chun Yuan
96
0
0
04 Jun 2025
Hierarchical-embedding autoencoder with a predictor (HEAP) as efficient architecture for learning long-term evolution of complex multi-scale physical systems
Alexander Khrabry
Edward Startsev
Andrew Powis
Igor Kaganovich
AI4CE
90
0
0
24 May 2025
Learning to Adapt to Position Bias in Vision Transformer Classifiers
Robert-Jan Bruintjes
Jan van Gemert
162
0
0
19 May 2025
A 2D Semantic-Aware Position Encoding for Vision Transformers
Xi Chen
Shiyang Zhou
Muqi Huang
Jiaxu Feng
Yun Xiong
...
Yize Zhang
Huishuai Bao
Sijia Peng
Chong Li
Feng Shi
ViT
70
0
0
14 May 2025
Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait
Feng Liu
Nicholas Chimitt
Lanqing Guo
Jitesh Jain
Aditya Kane
...
Arun Ross
Humphrey Shi
Zhangyang Wang
A. Jain
Xiaoming Liu
CVBM
60
1
0
07 May 2025
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers
M. Chowdhury
Md Rifat Ur Rahman
Akil Ahmad Taki
57
0
0
19 Apr 2025
Air Quality Prediction with A Meteorology-Guided Modality-Decoupled Spatio-Temporal Network
Hang Yin
Yan Zhang
Jian Xu
Jian-Long Chang
Yongbin Li
Cheng-Lin Liu
64
0
0
14 Apr 2025
Learning Object Focused Attention
Vivek Trivedy
A. Almalki
Longin Jan Latecki
86
0
0
10 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
Hao Wang
Shuo Zhang
Biao Leng
ViT
279
1
0
03 Apr 2025
Spectral-Adaptive Modulation Networks for Visual Perception
Guhnoo Yun
J. Yoo
Kijung Kim
Jeongho Lee
Paul Hongsuck Seo
Dong Hwan Kim
124
0
0
31 Mar 2025
Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Multi-Culture Sign Language Recognition
Koki Hirooka
Abu Saleh Musa Miah
Tatsuya Murakami
Yuto Akiba
Yong Seok Hwang
Jungpil Shin
SLR
61
0
0
21 Mar 2025
UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security
Binghui Wu
D. Divakaran
M. Gurusamy
93
0
0
06 Mar 2025
Partial Convolution Meets Visual Attention
Haiduo Huang
Fuwei Yang
D. Li
Ji Liu
Lu Tian
Jinzhang Peng
Pengju Ren
E. Barsoum
3DH
443
0
0
05 Mar 2025
Constrained Generative Modeling with Manually Bridged Diffusion Models
Saeid Naderiparizi
Xiaoxuan Liang
Berend Zwartsenberg
Frank Wood
DiffM
102
0
0
27 Feb 2025
Lightweight yet Efficient: An External Attentive Graph Convolutional Network with Positional Prompts for Sequential Recommendation
Jinyu Zhang
Chao Li
Zhongying Zhao
145
1
0
21 Feb 2025
Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers
Mingliang Xu
Yuyao Zhou
Yuxin Zhang
Shen Li
Yong Li
Chia-Wen Lin
Zhanpeng Zeng
Rongrong Ji
MQ
331
0
0
21 Dec 2024
Harmformer: Harmonic Networks Meet Transformers for Continuous Roto-Translation Equivariance
Tomáš Karella
Adam Harmanec
J. Kotera
Jan Blažek
F. Šroubek
72
1
0
06 Nov 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
97
5
0
18 Oct 2024
Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap
Georgia Channing
Juil Sock
Ronald Clark
Philip Torr
Christian Schroeder de Witt
68
4
0
09 Oct 2024
Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects
Wenhao Li
Yudong Xu
Scott Sanner
Elias Boutros Khalil
ViT
100
5
0
08 Oct 2024
3D-LSPTM: An Automatic Framework with 3D-Large-Scale Pretrained Model for Laryngeal Cancer Detection Using Laryngoscopic Videos
Meiyu Qiu
Yongqian Li
Wenjun Huang
Haoyun Zhang
Weiping Zheng
Wenbin Lei
Xiaomao Fan
36
0
0
02 Sep 2024
RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning
Kunming Su
Qiuxia Wu
Panpan Cai
Xiaogang Zhu
Xuequan Lu
Zhiyong Wang
Kun Hu
3DPC
80
4
0
31 Aug 2024
Hierarchical Network Fusion for Multi-Modal Electron Micrograph Representation Learning with Foundational Large Language Models
Sakhinana Sagar Srinivas
Geethan Sannidhi
Venkataramana Runkana
104
0
0
24 Aug 2024
Positional Prompt Tuning for Efficient 3D Representation Learning
Shaochen Zhang
Zekun Qi
Runpei Dong
Xiuxiu Bai
Xing Wei
101
6
0
21 Aug 2024
MCPDepth: Omnidirectional Depth Estimation via Stereo Matching from Multi-Cylindrical Panoramas
Feng Qiao
Zhexiao Xiong
Xinge Zhu
Yuexin Ma
Qiumeng He
Nathan Jacobs
MDE
66
1
0
03 Aug 2024
Rethinking Attention Module Design for Point Cloud Analysis
Chengzhi Wu
Kaige Wang
Zeyun Zhong
Hao Fu
Junwei Zheng
Jiaming Zhang
Julius Pfrommer
Jürgen Beyerer
3DPC
109
2
0
27 Jul 2024
Transformer-based Single-Cell Language Model: A Survey
Wei Lan
Guohang He
Mingyang Liu
Qingfeng Chen
Junyue Cao
Wei Peng
MedIm
LRM
62
7
0
18 Jul 2024
Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation
Zhibin Lan
Liqiang Niu
Fandong Meng
Jie Zhou
Min Zhang
Jinsong Su
VLM
69
8
0
03 Jul 2024
PNeRV: A Polynomial Neural Representation for Videos
Sonam Gupta
S. Tomar
Grigorios G. Chrysos
Sukhendu Das
A. N. Rajagopalan
61
0
0
27 Jun 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
106
3
0
22 May 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
104
10
0
22 May 2024
Pseudo Channel: Time Embedding for Motor Imagery Decoding
Zhengqing Miao
Meirong Zhao
77
2
0
21 May 2024
Semantically Consistent Video Inpainting with Conditional Diffusion Models
Dylan Green
William Harvey
Saeid Naderiparizi
Matthew Niedoba
Yunpeng Liu
...
Vasileios Lioutas
Setareh Dabiri
Adam Scibior
Berend Zwartsenberg
Frank Wood
DiffM
104
1
0
30 Apr 2024
Utilizing Large Language Models for Information Extraction from Real Estate Transactions
Yu Zhao
Haoxiang Gao
AILaw
63
10
0
28 Apr 2024
NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT
Xinzhe Zheng
Sijie Ji
Yipeng Pan
Kaiwen Zhang
Chenshu Wu
133
1
0
13 Apr 2024
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc
Nicolas Gonthier
Clement Mallet
Loic Landrieu
122
29
0
12 Apr 2024
HSViT: Horizontally Scalable Vision Transformer
Chenhao Xu
Chang-Tsun Li
Chee Peng Lim
Douglas Creighton
ViT
65
2
0
08 Apr 2024
Equipping Sketch Patches with Context-Aware Positional Encoding for Graphic Sketch Representation
Sicong Zang
Zhijun Fang
113
0
0
26 Mar 2024
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim
Yiyang Su
Feng Liu
Anil Jain
Xiaoming Liu
CVBM
88
10
0
21 Mar 2024
Rotary Position Embedding for Vision Transformer
Byeongho Heo
Song Park
Dongyoon Han
Sangdoo Yun
134
51
0
20 Mar 2024
Quantum Mixed-State Self-Attention Network
Fu Chen
Qinglin Zhao
Li Feng
Chuangtao Chen
Yangbin Lin
Jianhong Lin
93
6
0
05 Mar 2024
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
Wenhao Tang
Fengtao Zhou
Shengyue Huang
Xiang Zhu
Yi Zhang
Bo Liu
137
25
0
27 Feb 2024
Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding
Yu-Qi Yang
Yufeng Guo
Yang Liu
3DPC
99
2
0
22 Feb 2024
Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics
Siqi Miao
Zhiyuan Lu
Mia Liu
Javier Duarte
Pan Li
121
6
0
19 Feb 2024
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey
Haruna Yunusa
Shiyin Qin
Abdulrahman Hamman Adama Chukkol
Abdulganiyu Abdu Yusuf
Isah Bello
A. Lawan
ViT
112
14
0
05 Feb 2024
Towards Visual Syntactical Understanding
Sayeed Shafayet Chowdhury
Soumyadeep Chandra
Kaushik Roy
NAI
155
0
0
30 Jan 2024
MsSVT++: Mixed-scale Sparse Voxel Transformer with Center Voting for 3D Object Detection
Jianan Li
Shaocong Dong
Lihe Ding
Tingfa Xu
3DPC
75
8
0
22 Jan 2024
SymTC: A Symbiotic Transformer-CNN Net for Instance Segmentation of Lumbar Spine MRI
Jiasong Chen
Linchen Qian
Linhai Ma
Timur Urakov
Weiyong Gu
Liang Liang
MedIm
70
7
0
17 Jan 2024