ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

2 January 2023
Sanghyun Woo
Shoubhik Debnath
Ronghang Hu
Xinlei Chen
Zhuang Liu
In So Kweon
Saining Xie
    SyDa

Papers citing "ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders"

50 / 328 papers shown
Exploring The Visual Feature Space for Multimodal Neural Decoding
Weihao Xia
Cengiz Öztireli
9
0
0
21 May 2025
UWSAM: Segment Anything Model Guided Underwater Instance Segmentation and A Large-scale Benchmark Dataset
Hua Li
Shijie Lian
Zhiyuan Li
Runmin Cong
Sam Kwong
VLM
12
0
0
21 May 2025
Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs
Abhishek Dey
Aabha Bothera
Samhita Sarikonda
Rishav Aryan
Sanjay Kumar Podishetty
Akshay Havalgi
Gaurav Singh
Saurabh Srivastava
12
0
0
16 May 2025
A Simple Detector with Frame Dynamics is a Strong Tracker
Chenxu Peng
Changbo Wang
Minrui Zou
Danyang Li
Zheng Yang
Yimian Dai
Ming-Ming Cheng
Xiang Li
62
0
0
08 May 2025
Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
Sainath Dey
Mitul Goswami
Jashika Sethi
Prasant Kumar Pattnaik
ViT
33
0
0
07 May 2025
ORXE: Orchestrating Experts for Dynamically Configurable Efficiency
Qingyuan Wang
Guoxin Wang
B. Cardiff
Deepu John
38
0
0
07 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I Roumeliotis
Manoj Karkee
LM&Ro
203
1
0
07 May 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
49
0
0
29 Apr 2025
A BERT-Style Self-Supervised Learning CNN for Disease Identification from Retinal Images
Xin Li
Wenhui Zhu
Peijie Qiu
Oana Dumitrascu
Amal Youssef
Yibo Wang
SSL
MedIm
92
0
0
25 Apr 2025
MSAD-Net: Multiscale and Spatial Attention-based Dense Network for Lung Cancer Classification
Santanu Roy
Shweta Singh
Palak Sahu
Ashvath Suresh
Debashish Das
32
0
0
20 Apr 2025
DAM-Net: Domain Adaptation Network with Micro-Labeled Fine-Tuning for Change Detection
H. Chen
Xin Xu
Fangling Pu
35
0
0
18 Apr 2025
Learning from Noisy Pseudo-labels for All-Weather Land Cover Mapping
Wang Liu
Zhiyu Wang
Xin Guo
Puhong Duan
Xudong Kang
Shutao Li
24
0
0
18 Apr 2025
Real-World Depth Recovery via Structure Uncertainty Modeling and Inaccurate GT Depth Fitting
Delong Suzhang
Meng Yang
32
0
0
16 Apr 2025
CoMotion: Concurrent Multi-person 3D Motion
Alejandro Newell
Peiyun Hu
Lahav Lipson
Stephan R. Richter
V. Koltun
3DH
VOT
74
0
0
16 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
Shixuan Liu
Jiyang Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
30
0
0
14 Apr 2025
Novel Pooling-based VGG-Lite for Pneumonia and Covid-19 Detection from Imbalanced Chest X-Ray Datasets
Santanu Roy
Ashvath Suresh
Palak Sahu
Tulika Rudra Gupta
32
0
0
10 Apr 2025
Attributes-aware Visual Emotion Representation Learning
R. S. Maharjan
Marta Romeo
Angelo Cangelosi
30
0
0
09 Apr 2025
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification
Jiahang Li
Shibo Xue
Yong Su
30
0
0
08 Apr 2025
Reinforced Multi-teacher Knowledge Distillation for Efficient General Image Forgery Detection and Localization
Zeqin Yu
Jiangqun Ni
Jian Zhang
Haoyi Deng
Yuzhen Lin
31
0
0
07 Apr 2025
OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication
Zhongjian Wang
Peng Zhang
Jinwei Qi
Guangyuan Wang
Sheng Xu
Bang Zhang
Liefeng Bo
DiffM
VGen
40
0
0
03 Apr 2025
APSeg: Auto-Prompt Model with Acquired and Injected Knowledge for Nuclear Instance Segmentation and Classification
Liying Xu
Hongliang He
Wei Han
Hanbin Huang
Siwei Feng
Guohong Fu
VLM
67
0
0
03 Apr 2025
Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations
Chongjie Si
Zhiyi Shi
Xuehui Wang
Yichen Xiao
Xiaokang Yang
Wei-Ming Shen
AI4CE
70
0
0
01 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
75
2
0
01 Apr 2025
Self-Supervised Pretraining for Aerial Road Extraction
Rupert Polley
Sai Vignesh Abishek Deenadayalan
Johann Marius Zöllner
SSL
74
0
0
31 Mar 2025
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
VGen
75
1
0
31 Mar 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Yiming Lei
Chenkai Zhang
Zeming Liu
Qingjie Liu
Yansen Wang
54
0
0
28 Mar 2025
DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
Haomin Zhang
Chang Liu
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
88
0
0
28 Mar 2025
An improved EfficientNetV2 for garbage classification
Wenxuan Qiu
Chengxin Xie
Jingui Huang
53
0
0
27 Mar 2025
A Spatial-temporal Deep Probabilistic Diffusion Model for Reliable Hail Nowcasting with Radar Echo Extrapolation
Haonan Shi
Long Tian
Jie Tao
Yufei Li
Liming Wang
Xiyang Liu
AI4Cl
35
0
0
26 Mar 2025
STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation
Tao Feng
Zhiyuan Zhao
Yifan Xie
Yuqi Ye
Xiangyang Luo
Xun Guan
Yongqian Li
57
0
0
21 Mar 2025
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Robin Hesse
Doğukan Bağcı
Bernt Schiele
Simone Schaub-Meyer
Stefan Roth
VLM
62
0
0
21 Mar 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
58
1
0
21 Mar 2025
Depth-Aware Range Image-Based Model for Point Cloud Segmentation
Bike Chen
Antti Tikänmaki
Juha Roning
3DPC
3DV
57
0
0
19 Mar 2025
Fibonacci-Net: A Lightweight CNN model for Automatic Brain Tumor Classification
Santanu Roy
Ashvath Suresh
Archit Gupta
Shubhi Tiwari
Palak Sahu
Prashant Adhikari
Yuvraj S. Shekhawat
53
0
0
18 Mar 2025
8-Calves Image dataset
Xuyang Fang
S. Hannuna
Neill D. F. Campbell
192
0
0
17 Mar 2025
AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis
Hadam Baek
Hannie Shin
Jiyoung Seo
Chanwoo Kim
Saerom Kim
Hyeongbok Kim
Sangpil Kim
46
0
0
17 Mar 2025
Solution for 8th Competition on Affective & Behavior Analysis in-the-wild
Jun-chen Yu
Yunxiang Zhang
Xilong Lu
Yang Zheng
Yongqi Wang
Lingsi Zhu
49
0
0
14 Mar 2025
Unlocking Open-Set Language Accessibility in Vision Models
Fawaz Sammani
Jonas Fischer
Nikos Deligiannis
VLM
55
0
0
14 Mar 2025
CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition
Kaixiang Yang
Xin Li
Qiang Li
Zhiwei Wang
48
0
0
13 Mar 2025
Context-guided Responsible Data Augmentation with Diffusion Models
Khawar Islam
Naveed Akhtar
59
1
0
12 Mar 2025
A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization
Md Yousuf Harun
Christopher Kanan
AI4CE
55
0
0
09 Mar 2025
Segment Anything, Even Occluded
Wei-En Tai
Yu-Lin Shih
Cheng Sun
Y. Wang
Hwann-Tzong Chen
VLM
69
0
0
08 Mar 2025
Partial Convolution Meets Visual Attention
Haiduo Huang
Fuwei Yang
D. Li
Ji Liu
Lu Tian
Jinzhang Peng
Pengju Ren
E. Barsoum
3DH
243
0
0
05 Mar 2025
Is Pre-training Applicable to the Decoder for Dense Prediction?
Chao Ning
Wanshui Gan
Weihao Xuan
Naoto Yokoya
48
0
0
05 Mar 2025
Automatic Drywall Analysis for Progress Tracking and Quality Control in Construction
Mariusz Trzeciakiewicz
Aleixo Cambeiro Barreiro
Niklas Gard
Anna Hilsmann
Peter Eisert
55
0
0
05 Mar 2025
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
Xiaoyong Lu
Songlin Du
Mamba
75
0
0
05 Mar 2025
Unsupervised Waste Classification By Dual-Encoder Contrastive Learning and Multi-Clustering Voting (DECMCV)
Kui Huang
Mengke Song
Shuo Ba
Ling An
Huajie Liang
Huanxi Deng
Yang Liu
Zhenyu Zhang
Chichun Zhou
55
0
0
04 Mar 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu
Huan Zhang
AAML
86
0
0
25 Feb 2025
MaxGlaViT: A novel lightweight vision transformer-based approach for early diagnosis of glaucoma stages from fundus images
Mustafa Yurdakul
Kubra Uyar
Şakir Tasdemir
63
1
0
24 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
76
0
0
24 Feb 2025