ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

10 November 2022
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
Xizhou Zhu
Xiao-hua Hu
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
    VLM

Papers citing "InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions"

50 / 321 papers shown
Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation
Wooseok Shin
Hyun Joon Park
Jin Sob Kim
Sung Won Han
VLM
48
7
0
31 May 2024
YotoR-You Only Transform One Representation
José Ignacio Díaz Villa
P. Loncomilla
Javier Ruiz-del-Solar
ViT
46
0
0
30 May 2024
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang
Zongyu Lan
Liujuan Cao
Xianming Lin
Shengchuan Zhang
Guannan Jiang
Rongrong Ji
VLM
34
2
0
29 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
59
4
0
28 May 2024
Color Shift Estimation-and-Correction for Image Enhancement
Yiyu Li
Ke Xu
Gerhard Hancke
Rynson W. H. Lau
49
7
0
28 May 2024
Memorize What Matters: Emergent Scene Decomposition from Multitraverse
Yiming Li
Zehong Wang
Yue Wang
Zhiding Yu
Zan Gojcic
Marco Pavone
Chen Feng
Jose M. Alvarez
3DGS
57
1
0
27 May 2024
Building Vision Models upon Heat Conduction
Zhaozhi Wang
Yue Liu
Yunfan Liu
Hongtian Yu
Yaowei Wang
QiXiang Ye
ViT
VLM
58
0
0
26 May 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
49
8
0
25 May 2024
DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer
Wei Dong
Han Zhou
Ruiyi Wang
Xiaohong Liu
Guangtao Zhai
Jun Chen
ViT
61
9
0
24 May 2024
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model
Yuheng Shi
Minjing Dong
Chang Xu
Mamba
48
34
0
23 May 2024
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
Chongjie Si
Xuehui Wang
Xue Yang
Zhengqin Xu
Qingyun Li
Jifeng Dai
Yu Qiao
Xiaokang Yang
Wei Shen
33
8
0
23 May 2024
Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
51
0
0
22 May 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
48
5
0
22 May 2024
Influence of Water Droplet Contamination for Transparency Segmentation
Volker Knauthe
Paul Weitz
Thomas Pollabauer
Tristan Wirth
Arne Rak
Arjan Kuijper
Dieter W. Fellner
51
1
0
21 May 2024
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension
Runwei Guan
Ruixiao Zhang
Ningwei Ouyang
Jianan Liu
Ka Lok Man
...
Ming Xu
Jeremy S. Smith
Eng Gee Lim
Yutao Yue
Hui Xiong
58
9
0
21 May 2024
RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception
Xiaosu Zhu
Hualian Sheng
Sijia Cai
Bing Deng
Shaopeng Yang
Qiao Liang
Ken Chen
Lianli Gao
Jingkuan Song
Jieping Ye
48
4
0
16 May 2024
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Oncel Tuzel
VLM
CLIP
38
6
0
14 May 2024
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu
Xinchao Wang
Mamba
52
50
0
13 May 2024
LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing
Ana Ezquerro
David Vilares
43
1
0
10 May 2024
ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers
Jinke Li
Xiao He
Chonghua Zhou
Xiaoqiang Cheng
Yang Wen
Dan Zhang
ViT
46
11
0
07 May 2024
Spider: A Unified Framework for Context-dependent Concept Segmentation
Xiaoqi Zhao
Youwei Pang
Wei Ji
Baicheng Sheng
Jiaming Zuo
Lihe Zhang
Huchuan Lu
39
6
0
02 May 2024
HIPer: A Human-Inspired Scene Perception Model for Multifunctional Mobile Robots
Florenz Graf
Jochen Lindermayr
Birgit Graf
Werner Kraus
Marco F. Huber
46
3
0
27 Apr 2024
The Third Monocular Depth Estimation Challenge
Jaime Spencer
Fabio Tosi
Matteo Poggi
Ripudaman Singh Arora
Chris Russell
...
Albert Luginov
Muhammad Shahzad
Seyed Hosseini
Aleksander Trajcevski
James H. Elder
MDE
41
7
0
25 Apr 2024
Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, GPU and CSD
Jia Wei
Xingjun Zhang
Witold Pedrycz
Longxiang Wang
Jie Zhao
36
0
0
17 Apr 2024
Contextrast: Contextual Contrastive Learning for Semantic Segmentation
Chan-Yong Sung
Wanhee Kim
Jungho An
Wooju Lee
Hyungtae Lim
Hyun Myung
52
12
0
16 Apr 2024
Overcoming Scene Context Constraints for Object Detection in wild using Defilters
Vamshi Krishna Kancharla
Neelam Sinha
40
0
0
12 Apr 2024
Implicit and Explicit Language Guidance for Diffusion-based Visual Perception
Hefeng Wang
Jiale Cao
Jin Xie
Aiping Yang
Yanwei Pang
VLM
DiffM
50
2
0
11 Apr 2024
ConsistencyDet: A Few-step Denoising Framework for Object Detection Using the Consistency Model
Lifan Jiang
Zhihui Wang
Changmiao Wang
Ming Li
Jiaxu Leng
DiffM
33
0
0
11 Apr 2024
Monocular 3D lane detection for Autonomous Driving: Recent Achievements, Challenges, and Outlooks
Fulong Ma
Weiqing Qi
Guoyang Zhao
Linwei Zheng
Sheng Wang
Yuxuan Liu
Ming Liu
82
9
0
10 Apr 2024
Automatic Defect Detection in Sewer Network Using Deep Learning Based Object Detector
Bach Ha
Birgit Schalter
Laura White
J. Köhler
ObjD
AI4CE
32
2
0
09 Apr 2024
3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization
Seung-bum Chung
Joohyun Park
Hyewon Kan
Hyeongyeop Kang
CLIP
44
1
0
03 Apr 2024
TSNet:A Two-stage Network for Image Dehazing with Multi-scale Fusion and Adaptive Learning
Xiaolin Gong
Zehan Zheng
Heyuan Du
35
3
0
03 Apr 2024
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
Zihua Liu
Hiroki Sakuma
Masatoshi Okutomi
51
3
0
29 Mar 2024
Benchmarking Object Detectors with COCO: A New Path Forward
Shweta Singh
Aayan Yadav
Jitesh Jain
Humphrey Shi
Justin Johnson
Karan Desai
36
7
0
27 Mar 2024
Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System
Jing Li
Lu Bai
Bi-Hong Yang
Chang Li
Lingfei Ma
Lixin Cui
Edwin R. Hancock
56
1
0
24 Mar 2024
Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion
S. Casarin
C. Ugwu
Sergio Escalera
Oswald Lanz
36
0
0
22 Mar 2024
WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather
Blake Gella
Howard Zhang
Rishi Upadhyay
Tiffany Chang
Nathan Wei
Matthew Waliman
Yunhao Bao
C. Melo
Alex Wong
A. Kadambi
41
0
0
21 Mar 2024
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen
Jiaming Zhang
Kunyu Peng
Junwei Zheng
Ruiping Liu
Philip Torr
Rainer Stiefelhagen
OOD
29
5
0
21 Mar 2024
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Di Wang
Jing Zhang
Minqiang Xu
Lin Liu
Dongsheng Wang
...
Chengxi Han
Haonan Guo
Bo Du
Dacheng Tao
Lefei Zhang
50
45
0
20 Mar 2024
GaussNav: Gaussian Splatting for Visual Navigation
Xiaohan Lei
Min Wang
Wen-gang Zhou
Houqiang Li
3DGS
32
13
0
18 Mar 2024
PARMESAN: Parameter-Free Memory Search and Transduction for Dense Prediction Tasks
Philip Matthias Winter
M. Wimmer
David Major
Dimitrios Lenis
Astrid Berg
Theresa Neubauer
Gaia Romana De Paolis
Johannes Novotny
Sophia Ulonska
Katja Bühler
43
0
0
18 Mar 2024
When Semantic Segmentation Meets Frequency Aliasing
Linwei Chen
Lin Gu
Ying Fu
51
5
0
14 Mar 2024
VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework
Chris Kelly
Luhui Hu
Bang Yang
Yu Tian
Deshun Yang
Cindy Yang
Zaoshan Huang
Zihao Li
Jiayin Hu
Yuexian Zou
47
9
0
14 Mar 2024
MonoOcc: Digging into Monocular Semantic Occupancy Prediction
Yupeng Zheng
Xiang Li
Pengfei Li
Yuhang Zheng
Bu Jin
Chengliang Zhong
Xiaoxiao Long
Hao Zhao
Qichao Zhang
31
25
0
13 Mar 2024
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano
Federico Magistri
Lucas Nunes
Jens Behley
C. Stachniss
VLM
42
8
0
12 Mar 2024
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia
Xinliang Wang
Feng Lv
Xin Hao
Yifeng Shi
ViT
34
47
0
12 Mar 2024
Towards In-Vehicle Multi-Task Facial Attribute Recognition: Investigating Synthetic Data and Vision Foundation Models
Esmaeil Seraj
Walter Talamonti
30
0
0
10 Mar 2024
Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Linwei Chen
Lin Gu
Ying Fu
42
24
0
08 Mar 2024
Spatiotemporal Pooling on Appropriate Topological Maps Represented as Two-Dimensional Images for EEG Classification
Takuto Fukushima
Ryusuke Miyamoto
36
1
0
07 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
ViT
46
44
0
04 Mar 2024