InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

10 November 2022
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
Xizhou Zhu
Xiao-hua Hu
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
    VLM

Papers citing "InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions"

50 / 322 papers shown
detrex: Benchmarking Detection Transformers
Tianhe Ren
Siyi Liu
Feng Li
Hao Zhang
Ailing Zeng
...
Zhaoyang Zeng
Xianbiao Qi
Yuhui Yuan
Jianwei Yang
Lei Zhang
45
13
0
12 Jun 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhen-fei Yin
Jiong Wang
Jianjian Cao
Zhelun Shi
Dingning Liu
...
Lei Bai
Xiaoshui Huang
Zhiyong Wang
Jing Shao
Wanli Ouyang
MLLM
32
153
0
11 Jun 2023
On the Challenges and Perspectives of Foundation Models for Medical Image Analysis
Shaoting Zhang
Dimitris N. Metaxas
LM&MA
VLM
MedIm
AI4CE
49
128
0
09 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report Generation
Bang-ju Yang
Asif Raza
Yuexian Zou
Tong Zhang
MedIm
30
11
0
09 Jun 2023
Towards Label-free Scene Understanding by Vision Foundation Models
Runnan Chen
You-Chen Liu
Lingdong Kong
Nenglun Chen
Xinge Zhu
Yuexin Ma
Tongliang Liu
Wenping Wang
VLM
35
42
0
06 Jun 2023
Recognize Anything: A Strong Image Tagging Model
Youcai Zhang
Xinyu Huang
Jinyu Ma
Zhaoyang Li
Zhaochuan Luo
...
Tong Luo
Yaqian Li
Siyi Liu
Yandong Guo
Lei Zhang
VLM
47
225
0
06 Jun 2023
Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach
Min Yan
Qianxiong Ning
Qian Wang
33
1
0
06 Jun 2023
Industrial Anomaly Detection and Localization Using Weakly-Supervised Residual Transformers
Hanxi Li
Jing Wu
Lin Yuanbo Wu
Hao Chen
Deyin Liu
Mingwen Wang
Peng Wang
ViT
45
4
0
06 Jun 2023
Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
Junyuan Hong
Yi Zeng
Shuyang Yu
Lingjuan Lyu
R. Jia
Jiayu Zhou
AAML
19
8
0
04 Jun 2023
3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW
Shijie Chang
Zeqi Hao
Ben Kang
Xiaoqi Zhao
Jiawen Zhu
Zhe Chen
Lihe Zhang
Lu Zhang
Huchuan Lu
21
1
0
04 Jun 2023
Dilated Convolution with Learnable Spacings: beyond bilinear interpolation
Ismail Khalfaoui-Hassani
Thomas Pellegrini
T. Masquelier
24
3
0
01 Jun 2023
Contextual Object Detection with Multimodal Large Language Models
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
43
78
0
29 May 2023
Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation
J. Sun
Xiaoshuang Shi
Zhiyuan Weng
Kaidi Xu
H. Shen
Xiao-lan Zhu
MLLM
33
2
0
28 May 2023
Image Quality Is Not All You Want: Task-Driven Lens Design for Image Classification
Xinge Yang
Qiang Fu
Yunfeng Nie
Wolfgang Heidrich
VLM
29
7
0
26 May 2023
UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation
Yunfan LU
Guoqiang Liang
Yusheng Wang
Lin Wang
Hui Xiong
24
6
0
24 May 2023
Sparse4D v2: Recurrent Temporal Fusion with Sparse Model
Xuewu Lin
Tianwei Lin
Zi-Hui Pei
Lichao Huang
Zhizhong Su
3DGS
43
56
0
23 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
103
77
0
22 May 2023
Graph Propagation Transformer for Graph Representation Learning
Zhe Chen
Hao Hao Tan
Tao Wang
Tianrun Shen
Tong Lu
Qiuying Peng
Cheng Cheng
Yue Qi
38
11
0
19 May 2023
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Siyuan Huang
Zhengkai Jiang
Hao Dong
Yu Qiao
Peng Gao
Hongsheng Li
LM&Ro
33
93
0
18 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
38
464
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
53
116
0
18 May 2023
Hausdorff Distance Matching with Adaptive Query Denoising for Rotated Detection Transformer
Hakjin Lee
Minki Song
Jamyoung Koo
Junghoon Seo
42
7
0
12 May 2023
VideoChat: Chat-Centric Video Understanding
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
69
534
0
10 May 2023
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
Zhaoyang Liu
Yinan He
Wenhai Wang
Weiyun Wang
Yi Wang
...
Yali Wang
Limin Wang
Ping Luo
Jifeng Dai
Yu Qiao
LRM
MLLM
47
79
0
09 May 2023
CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning
Sai Qian Zhang
Thierry Tambe
Nestor Cuevas
Gu-Yeon Wei
David Brooks
26
4
0
04 May 2023
SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model
Di Wang
Jing Zhang
Bo Du
Minqiang Xu
Lin Liu
Dacheng Tao
Lefei Zhang
128
70
0
03 May 2023
Local and Global Contextual Features Fusion for Pedestrian Intention Prediction
Mohsen Azarmi
Mahdi Rezaei
Tanveer Hussain
Chenghao Qian
46
8
0
01 May 2023
UniNeXt: Exploring A Unified Architecture for Vision Recognition
Fangjian Lin
Jianlong Yuan
Sitong Wu
Fan Wang
Zhibin Wang
ViT
32
14
0
26 Apr 2023
A Strong and Reproducible Object Detector with Only Public Datasets
Tianhe Ren
Jianwei Yang
Siyi Liu
Ailing Zeng
Feng Li
Hao Zhang
Hongyang Li
Zhaoyang Zeng
Lei Zhang
ObjD
43
11
0
25 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
169
3,070
0
14 Apr 2023
ICDAR 2023 Video Text Reading Competition for Dense and Small Text
Weijia Wu
Yuzhong Zhao
Zhuangzi Li
Jiahong Li
Mike Zheng Shou
Umapada Pal
Dimosthenis Karatzas
Xiang Bai
47
6
0
10 Apr 2023
ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation
Xiaoming Zhao
Xingming Wu
Weihai Chen
Peter C. Y. Chen
Qingsong Xu
Zhengguo Li
33
73
0
07 Apr 2023
RFAConv: Innovating Spatial Attention and Standard Convolutional Operation
X. Zhang
Chen Liu
Degang Yang
Tingting Song
Yichen Ye
Ke Li
Ying Song
29
110
0
06 Apr 2023
Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
Zhuofan Zong
Dong Jiang
Guanglu Song
Zeyue Xue
Jingyong Su
Hongsheng Li
Yu Liu
60
35
0
03 Apr 2023
DDP: Diffusion Model for Dense Visual Prediction
Yuanfeng Ji
Zhe Chen
Enze Xie
Lanqing Hong
Xihui Liu
Zhaoqiang Liu
Tong Lu
Zhenguo Li
Ping Luo
DiffM
VLM
50
130
0
30 Mar 2023
InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu
Pan Zhou
Shuicheng Yan
Xinchao Wang
48
119
0
29 Mar 2023
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny Lo
AI4MH
LM&MA
42
128
0
21 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
47
263
0
20 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
34
3
0
18 Mar 2023
High-level Feature Guided Decoding for Semantic Segmentation
Ye Huang
Di Kang
Shenghua Gao
Wen Li
Lixin Duan
23
0
0
15 Mar 2023
Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception
Chunrui Han
Jinrong Yang
Jian‐Yuan Sun
Zheng Ge
Runpei Dong
Hongyu Zhou
Weixin Mao
Yuang Peng
Xiangyu Zhang
58
58
0
10 Mar 2023
Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models
Naman D. Singh
Francesco Croce
Matthias Hein
OOD
50
62
0
03 Mar 2023
DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
Shubhankar Borse
Debasmit Das
Hyojin Park
H. Cai
Risheek Garrepalli
Fatih Porikli
48
9
0
02 Mar 2023
Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review
Yining Shi
Kun Jiang
Jiusi Li
Zelin Qian
Jun Wen
Mengmeng Yang
Ke Wang
Diange Yang
91
25
0
02 Mar 2023
Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels
Zifu Wang
Xuefei Ning
Matthew B. Blaschko
VLM
39
12
0
11 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLM
VLM
MoE
46
161
0
01 Feb 2023
The Power of Linear Combinations: Learning with Random Convolutions
Paul Gavrikov
J. Keuper
37
2
0
26 Jan 2023
Champion Solution for the WSDM2023 Toloka VQA Challenge
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
52
2
0
22 Jan 2023
CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder
Ye Huang
Di Kang
Liang Chen
W. Jia
Xiangjian He
Lixin Duan
Xuefei Zhe
Linchao Bao
40
2
0
11 Jan 2023
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
Keyu Tian
Yi Jiang
Qishuai Diao
Chen Lin
Liwei Wang
Zehuan Yuan
36
101
0
09 Jan 2023