2211.05778
Cited By
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
10 November 2022
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
Xizhou Zhu
Xiao-hua Hu
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
Papers citing
"InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions"
50 / 320 papers shown
Title
TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series
Xiaolei Qin
Di Wang
Jingyang Zhang
Fengxiang Wang
Xin Su
Bo Du
Liangpei Zhang
AI4TS
24
0
0
13 May 2025
Griffin: Towards a Graph-Centric Relational Database Foundation Model
Yanbo Wang
Xiyuan Wang
Quan Gan
Minjie Wang
Qibin Yang
David Wipf
Muhan Zhang
141
0
0
08 May 2025
Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions?
Shashank Agnihotri
David Schader
Nico Sharei
Mehmet Ege Kaçar
M. Keuper
41
2
0
07 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
1
0
17 Apr 2025
RCCFormer: A Robust Crowd Counting Network Based on Transformer
Peng Liu
Heng-Chao Li
Sen Lei
Nanqing Liu
Bin Feng
Xiao Wu
34
0
0
07 Apr 2025
LPA3D: 3D Room-Level Scene Generation from In-the-Wild Images
M. Yang
Yu-Xiao Guo
Yang Liu
Bin Zhou
Xin Tong
3DV
43
0
0
03 Apr 2025
Spectral-Adaptive Modulation Networks for Visual Perception
Guhnoo Yun
J. Yoo
Kijung Kim
Jeongho Lee
Paul Hongsuck Seo
Dong Hwan Kim
42
0
0
31 Mar 2025
Frequency Dynamic Convolution for Dense Image Prediction
Linwei Chen
Lin Gu
Liang Li
C. Yan
Ying Fu
47
0
0
24 Mar 2025
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
Jiawei Wang
Kai Hu
Qiang Huo
58
0
0
20 Mar 2025
GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation
Zinqin Huang
Gu Wang
Chenyangguang Zhang
Ruida Zhang
Xiu Li
Xiangyang Ji
53
0
0
19 Mar 2025
Panoramic Distortion-Aware Tokenization for Person Detection and Localization Using Transformers in Overhead Fisheye Images
Nobuhiko Wakai
Satoshi Sato
Yasunori Ishii
Takayoshi Yamashita
66
0
0
18 Mar 2025
ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation
Tobias Christian Nauen
Brian B. Moser
Federico Raue
Stanislav Frolov
Andreas Dengel
ViT
60
0
0
12 Mar 2025
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework
Zhuo Zhi
Chen Feng
Adam Daneshmend
Mine Orlu
Andreas Demosthenous
L. Yin
Da Li
Ziquan Liu
Miguel R. D. Rodrigues
LRM
72
1
0
11 Mar 2025
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow
Evelien Riddell
Yimu Wang
Sean Sedwards
Krzysztof Czarnecki
3DPC
46
0
0
09 Mar 2025
Viewport-Unaware Blind Omnidirectional Image Quality Assessment: A Flexible and Effective Paradigm
Jiebin Yan
Kangcheng Wu
Junjie Chen
Ziwen Tan
Yuming Fang
62
0
0
08 Mar 2025
CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data
Disheng Liu
Yiran Qiao
Wuche Liu
Yiren Lu
Yunlai Zhou
Tuo Liang
Yu Yin
Jing Ma
CML
3DV
61
0
0
06 Mar 2025
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou
Yizhou Yu
118
1
0
27 Feb 2025
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments
Luca Barsellotti
Roberto Bigazzi
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
98
1
0
20 Feb 2025
MoFM: A Large-Scale Human Motion Foundation Model
Mohammadreza Baharani
Ghazal Alinezhad Noghre
Armin Danesh Pazho
Gabriel Maldonado
Hamed Tabkhi
AI4CE
200
1
0
08 Feb 2025
3rd Workshop on Maritime Computer Vision (MaCVi) 2025: Challenge Results
Benjamin Kiefer
Lojze Žust
Jon Muhovič
Matej Kristan
J. Pers
...
Ashraf Saleem
Ching-Heng Cheng
Yu-Fan Lin
Tzu-Yu Lin
Chih-Chung Hsu
48
0
0
20 Jan 2025
SVIA: A Street View Image Anonymization Framework for Self-Driving Applications
Dongyu Liu
Xuhong Wang
Cen Chen
Yanhao Wang
Shengyue Yao
Yilun Lin
51
0
0
17 Jan 2025
RadarNeXt: Real-Time and Reliable 3D Object Detector Based On 4D mmWave Imaging Radar
Liye Jia
Runwei Guan
Haocheng Zhao
Qiuchi Zhao
Ka Lok Man
Jeremy S. Smith
Limin Yu
Yutao Yue
47
2
0
04 Jan 2025
Merging Context Clustering with Visual State Space Models for Medical Image Segmentation
Yun Zhu
Dong Zhang
Yi-Mou Lin
Yifei Feng
Jinhui Tang
Mamba
36
1
0
03 Jan 2025
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian
Jing Han
Chengcheng Wang
Yuchen Liang
Chao Xu
Hanting Chen
DiffM
29
2
0
03 Jan 2025
Conformable Convolution for Topologically Aware Learning of Complex Anatomical Structures
Yousef Yeganeh
Rui Xiao
Goktug Guvercin
Nassir Navab
Azade Farshad
MedIm
45
0
0
31 Dec 2024
Open-World Panoptic Segmentation
Matteo Sodano
Federico Magistri
Jens Behley
Cyrill Stachniss
VLM
81
0
0
17 Dec 2024
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu
Meng Lou
Yizhou Yu
115
1
0
16 Dec 2024
ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction
Yi Feng
Yu Han
Xijing Zhang
Tanghui Li
Yanting Zhang
Rui Fan
117
3
0
15 Dec 2024
Self-Supervised Learning with Probabilistic Density Labeling for Rainfall Probability Estimation
Junha Lee
Sojung An
Sujeong You
Namik Cho
78
0
0
08 Dec 2024
Reflective Teacher: Semi-Supervised Multimodal 3D Object Detection in Bird's-Eye-View via Uncertainty Measure
Saheli Hazra
Sudip Das
Rohit Choudhary
Arindam Das
Ganesh Sistu
Ciarán Eising
Ujjwal Bhattacharya
80
0
0
05 Dec 2024
Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization
Maxime Fontana
Michael W. Spratling
Miaojing Shi
MoE
VLM
71
0
0
04 Dec 2024
FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation
Chang Won Lee
Selina Leveugle
Svetlana Stolpner
Chris Langley
Paul Grouchy
Jonathan Kelly
Steven Waslander
80
0
0
29 Nov 2024
D²-World: An Efficient World Model through Decoupled Dynamic Flow
Haiming Zhang
Xu Yan
Ying Xue
Zixuan Guo
Shuguang Cui
Zehan Li
Bingbing Liu
71
0
0
26 Nov 2024
Edge Weight Prediction For Category-Agnostic Pose Estimation
Or Hirschorn
S. Avidan
96
0
0
25 Nov 2024
Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training
Man Yao
Xuerui Qiu
Tianxiang Hu
J. Hu
Yuhong Chou
Keyu Tian
Jianxing Liao
Luziwei Leng
Bo Xu
Guoqi Li
76
8
0
25 Nov 2024
Chanel-Orderer: A Channel-Ordering Predictor for Tri-Channel Natural Images
Shen Li
Lei Jiang
Wei Wang
Hongwei Hu
Liang Li
76
0
0
20 Nov 2024
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
50
1
0
12 Nov 2024
COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes
Muhammad Ali
Mamoona Javaid
Mubashir Noman
M. Fiaz
Salman Khan
39
0
0
31 Oct 2024
FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution
Shuai Wang
Zexian Li
Tianhui Song
Xubin Li
Tiezheng Ge
Bo Zheng
Liwen Wang
32
1
0
30 Oct 2024
HRGR: Enhancing Image Manipulation Detection via Hierarchical Region-aware Graph Reasoning
Xudong Wang
Yuan Li
Huiyu Zhou
Jiaran Zhou
Junyu Dong
44
1
0
29 Oct 2024
Scale Propagation Network for Generalizable Depth Completion
Haotian Wang
Meng Yang
Xinhu Zheng
Gang Hua
31
2
0
24 Oct 2024
Data-Efficient CLIP-Powered Dual-Branch Networks for Source-Free Unsupervised Domain Adaptation
Yongguang Li
Yueqi Cao
Jindong Li
Qi Wang
Shengsheng Wang
39
1
0
21 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
52
0
0
14 Oct 2024
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention
Nguyen Huu Bao Long
Chenyu Zhang
Yuzhi Shi
Tsubasa Hirakawa
Takayoshi Yamashita
Tohgoroh Matsui
H. Fujiyoshi
41
2
0
11 Oct 2024
From Logits to Hierarchies: Hierarchical Clustering made Simple
Emanuele Palumbo
Moritz Vandenhirtz
Alain Ryser
Imant Daunhawer
Julia E. Vogt
29
1
0
10 Oct 2024
Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach
Sha Guo
Zhuo Chen
Yang Zhao
Ning Zhang
X. Li
Lingyu Duan
DiffM
59
2
0
08 Oct 2024
Segmenting Wood Rot using Computer Vision Models
Roland Kammerbauer
Thomas H. Schmitt
Tobias Bocklet
27
1
0
30 Sep 2024
Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation
Youngwan Jin
Incheol Park
Hanbin Song
Hyeongjin Ju
Yagiz Nalcakan
Shiho Kim
ViT
35
2
0
25 Sep 2024
The BRAVO Semantic Segmentation Challenge Results in UNCV2024
Tuan-Hung Vu
Eduardo Valle
Andrei Bursuc
Tommie Kerssies
Daan de Geus
...
Michael J. Smith
F. Ferrie
Shamik Basu
Daniel Gehrig
Luc Van Gool
UQCV
VLM
41
3
0
23 Sep 2024
Frequency-Guided Spatial Adaptation for Camouflaged Object Detection
Shizhou Zhang
Dexuan Kong
Yinghui Xing
Yue Lu
Lingyan Ran
Guoqiang Liang
Hexu Wang
Yanning Zhang
38
5
0
19 Sep 2024