Recognize Anything: A Strong Image Tagging Model

6 June 2023
Youcai Zhang
Xinyu Huang
Jinyu Ma
Zhaoyang Li
Zhaochuan Luo
Yanchun Xie
Yuzhuo Qin
Tong Luo
Yaqian Li
Siyi Liu
Yandong Guo
Lei Zhang
    VLM
arXiv: 2306.03514

Papers citing "Recognize Anything: A Strong Image Tagging Model"

50 / 170 papers shown
Leveraging Multi-Modal Information to Enhance Dataset Distillation
Zhe Li
Hadrien Reynaud
Bernhard Kainz
DD
50
0
0
13 May 2025
Controllable Image Colorization with Instance-aware Texts and Masks
Yanru An
Ling Gui
Qiang Hu
Chunlei Cai
Tianxiao Ye
Xiaoyun Zhang
Yanfeng Wang
DiffM
39
0
0
13 May 2025
Split Matching for Inductive Zero-shot Semantic Segmentation
Jialei Chen
Xu Zheng
Dongyue Li
Chong Yi
Seigo Ito
D. Paudel
Luc Van Gool
Hiroshi Murase
Daisuke Deguchi
VLM
58
0
0
08 May 2025
TSTMotion: Training-free Scene-aware Text-to-motion Generation
Ziyan Guo
Haoxuan Qu
Hossein Rahmani
Dewen Soh
Ping Hu
Qiuhong Ke
Jun Liu
VGen
71
1
0
02 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
65
1
0
01 May 2025
GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution
Aditya Arora
Zhuowen Tu
Yucheng Wang
Ruizheng Bai
Jian Wang
Sizhuo Ma
DiffM
72
0
0
01 May 2025
Step1X-Edit: A Practical Framework for General Image Editing
Shixuan Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
Jiahui Geng
Gang Yu
Daxin Jiang
DiffM
111
4
0
24 Apr 2025
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
Jinda Lu
Jinghan Li
Yuan Gao
Junkang Wu
Jiancan Wu
Xuben Wang
Xiangnan He
180
0
0
22 Apr 2025
Emergence and Evolution of Interpretable Concepts in Diffusion Models
Berk Tinaz
Zalan Fabian
Mahdi Soltanolkotabi
DiffM
26
0
0
21 Apr 2025
SG-Reg: Generalizable and Efficient Scene Graph Registration
Chuhao Liu
Zhijian Qiao
Jieqi Shi
Ke Wang
Peize Liu
Shaojie Shen
33
0
0
20 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
46
0
0
19 Apr 2025
Zero-Shot Industrial Anomaly Segmentation with Image-Aware Prompt Generation
SoYoung Park
Hyewon Lee
M. Choi
Seunghoon Han
Jong-Ryul Lee
Sungsu Lim
Tae-Ho Kim
VLM
57
0
0
18 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
Xuelong Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
30
2
0
14 Apr 2025
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
Jialu Li
Shoubin Yu
Han Lin
Jaemin Cho
Jaehong Yoon
Joey Tianyi Zhou
DiffM
VGen
55
0
0
11 Apr 2025
F-ViTA: Foundation Model Guided Visible to Thermal Translation
Jay N. Paranjape
C. D. Melo
Vishal M. Patel
VGen
49
0
0
03 Apr 2025
RBT4DNN: Requirements-based Testing of Neural Networks
Nusrat Jahan Mozumder
Felipe Toledo
Swaroopa Dola
Matthew B. Dwyer
AAML
49
1
0
03 Apr 2025
Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing
Fan Qi
Yu Duan
Changsheng Xu
DiffM
60
0
0
27 Mar 2025
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao
Albert Zhai
Shenlong Wang
VGen
60
1
0
27 Mar 2025
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang
Alexandros Delitzas
Fangjinhua Wang
Ruida Zhang
Xiangyang Ji
Marc Pollefeys
Francis Engelmann
3DV
3DPC
49
4
0
24 Mar 2025
MagicColor: Multi-Instance Sketch Colorization
Yujie Zhang
Yue Ma
Bingyuan Wang
Qifeng Chen
Zeyu Wang
DiffM
73
0
0
21 Mar 2025
Binarized Mamba-Transformer for Lightweight Quad Bayer HybridEVS Demosaicing
Shiyang Zhou
Haijin Zeng
Yunfan Lu
Tong Shao
Ke Tang
Yongyong Chen
Jie Liu
Jingyong Su
Mamba
65
0
0
20 Mar 2025
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space
Weichen Zhan
Zile Zhou
Zhiheng Zheng
Chen Gao
Jinqiang Cui
Yong Li
Xinlei Chen
Xiao-Ping Zhang
LRM
63
1
0
14 Mar 2025
SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
Xiangyu Shi
Zerui Li
Wenqi Lyu
Jiatong Xia
Feras Dayoub
Yanyuan Qiao
Qi Wu
57
1
0
13 Mar 2025
Hoi2Anomaly: An Explainable Anomaly Detection Approach Guided by Human-Object Interaction
Yuhan Wang
Cheng Liu
Daou Zhang
Weichao Wu
41
0
0
13 Mar 2025
MegaSR: Mining Customized Semantics and Expressive Guidance for Image Super-Resolution
X. Li
Jianlong Wu
Xinchuan Huang
C. L. Philip Chen
Weili Guan
Xian-Sheng Hua
Liqiang Nie
DiffM
56
0
0
11 Mar 2025
Collaborative Dynamic 3D Scene Graphs for Open-Vocabulary Urban Scene Understanding
Tim Steinke
Martin Büchner
Niclas Vödisch
Abhinav Valada
60
0
0
11 Mar 2025
Multi-Modal 3D Mesh Reconstruction from Images and Text
Melvin Reka
Tessa Pulli
Markus Vincze
47
0
0
10 Mar 2025
VACE: All-in-One Video Creation and Editing
Zeyinzi Jiang
Zhen Han
Chaojie Mao
J. Zhang
Yulin Pan
Yu Liu
DiffM
VGen
56
6
0
10 Mar 2025
MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification
Xiangyan Qu
Jing Yu
Jiamin Zhuang
Gaopeng Gou
Gang Xiong
Qi Wu
VLM
51
0
0
10 Mar 2025
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
Hao Wang
Zhaoyang Zhang
Xuan Ju
Mingdeng Cao
Liangbin Xie
Ying Shan
Qiang Xu
VGen
DiffM
73
0
0
07 Mar 2025
GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding
Xihan Wang
Dianyi Yang
Yu Gao
Yufeng Yue
Yi Yang
M. Fu
3DGS
54
0
0
06 Mar 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
Zhenyu Liu
Yunxin Li
Baotian Hu
Wenhan Luo
Yaowei Wang
Min-Ling Zhang
65
0
0
27 Feb 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
126
7
0
25 Feb 2025
DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation
Luzhou Ge
Xiangyu Zhu
Zhuo Yang
Xuesong Li
3DGS
70
0
0
21 Feb 2025
SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection
Yun Peng
Xiao Lin
Nachuan Ma
Jiayuan Du
Chuangwei Liu
Chengju Liu
Qi Chen
46
3
0
17 Feb 2025
Consistent Video Colorization via Palette Guidance
Han Wang
Yuang Zhang
Yuhong Zhang
Lingxiao Lu
Li-Na Song
DiffM
VGen
90
0
0
31 Jan 2025
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li
Brian R Quaranto
Chenhui Xu
Ishan Mishra
Ruiyang Qin
Dancheng Liu
Peter C W Kim
Jinjun Xiong
94
0
0
25 Jan 2025
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models
Hao Li
C. Bezemer
Ahmed E. Hassan
45
2
0
08 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
88
12
0
06 Jan 2025
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song
Weixing Chen
Yong-Jin Liu
Weikai Chen
Guanbin Li
Liang Lin
123
3
0
12 Dec 2024
TASR: Timestep-Aware Diffusion Model for Image Super-Resolution
Qinwei Lin
Xiaopeng Sun
Yu Gao
Yujie Zhong
Dengjie Li
Zheng Zhao
Haoqian Wang
76
0
0
04 Dec 2024
Detailed Object Description with Controllable Dimensions
Xinran Wang
Hao Zhang
Baoteng Li
Kongming Liang
Hao Sun
Zhongjiang He
Z. Ma
Jun Guo
81
1
0
28 Nov 2024
HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior
Li-Yuan Tsao
Hao-Wei Chen
Hao-Wei Chung
Deqing Sun
Chun-Yi Lee
Kelvin Chan
Ming Yang
DiffM
78
3
0
27 Nov 2024
SentiXRL: An advanced large language Model Framework for Multilingual Fine-Grained Emotion Classification in Complex Text Environment
Jie Wang
Yichen Wang
Zhilin Zhang
Jianhao Zeng
Kaidi Wang
Zhiyang Chen
72
0
0
27 Nov 2024
Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models
Peng Cui
Guande He
Dan Zhang
Zhijie Deng
Yinpeng Dong
Jun Zhu
89
1
0
26 Nov 2024
Fine-Grained Open-Vocabulary Object Recognition via User-Guided Segmentation
Jinwoo Ahn
Hyeokjoon Kwon
Hwiyeon Yoo
ObjD
VLM
77
0
0
23 Nov 2024
VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing
Jiahao Hu
Tianxiong Zhong
Xuebo Wang
Boyuan Jiang
Xingye Tian
Fei Yang
Pengfei Wan
Di Zhang
VGen
74
2
0
22 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
50
2
0
05 Nov 2024
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
49
3
0
29 Oct 2024
VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions
Guanyan Chen
Ming Wang
Te Cui
Yao Mu
Haoyang Lu
...
Mengxiao Hu
Haizhou Li
Y. Li
Yi Yang
Yufeng Yue
VLM
31
3
0
28 Oct 2024