Recognize Anything: A Strong Image Tagging Model


6 June 2023
Youcai Zhang
Xinyu Huang
Jinyu Ma
Zhaoyang Li
Zhaochuan Luo
Yanchun Xie
Yuzhuo Qin
Tong Luo
Yaqian Li
Siyi Liu
Yandong Guo
Lei Zhang
    VLM

Papers citing "Recognize Anything: A Strong Image Tagging Model"

50 / 170 papers shown
Efficient 3D Instance Mapping and Localization with Neural Fields
George Tang
Krishna Murthy Jatavallabhula
Antonio Torralba
ISeg
39
5
0
28 Mar 2024
Locate, Assign, Refine: Taming Customized Promptable Image Inpainting
Yulin Pan
Chaojie Mao
Zeyinzi Jiang
Zhen Han
Jingfeng Zhang
Xiangteng He
DiffM
44
2
0
28 Mar 2024
GPT-Connect: Interaction between Text-Driven Human Motion Generator and 3D Scenes in a Training-free Manner
Haoxuan Qu
Ziyan Guo
Jun Liu
VGen
56
3
0
22 Mar 2024
Renovating Names in Open-Vocabulary Segmentation Benchmarks
Haiwen Huang
Songyou Peng
Dan Zhang
Andreas Geiger
VLM
37
3
0
14 Mar 2024
OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments
Yinan Deng
Jiahui Wang
Jingyu Zhao
Xinyu Tian
Guangyan Chen
Yi Yang
Yufeng Yue
3DV
40
13
0
14 Mar 2024
Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images
Giuseppe Cartella
Vittorio Cuculo
Marcella Cornia
Rita Cucchiara
DiffM
81
5
0
13 Mar 2024
Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
Oana Ignat
Longju Bai
Joan Nwatu
Rada Mihalcea
39
6
0
12 Mar 2024
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
Bingqian Lin
Yunshuang Nie
Ziming Wei
Jiaqi Chen
Shikui Ma
Jianhua Han
Hang Xu
Xiaojun Chang
Xiaodan Liang
LM&Ro
LRM
64
21
0
12 Mar 2024
Reframe Anything: LLM Agent for Open World Video Reframing
Jiawang Cao
Yongliang Wu
Weiheng Chi
Wenbo Zhu
Ziyue Su
Jay Wu
37
3
0
10 Mar 2024
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
Yunpeng Qu
Kun Yuan
Kai Zhao
Qizhi Xie
Jinhua Hao
Ming Sun
Chao Zhou
27
17
0
08 Mar 2024
FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models
Chuhao Liu
Ke Wang
Jieqi Shi
Zhijian Qiao
Shaojie Shen
VLM
41
5
0
07 Feb 2024
InstanceDiffusion: Instance-level Control for Image Generation
Xudong Wang
Trevor Darrell
Sai Saketh Rambhatla
Rohit Girdhar
Ishan Misra
VLM
DiffM
34
85
0
05 Feb 2024
A Survey on Hallucination in Large Vision-Language Models
Hanchao Liu
Wenyuan Xue
Yifei Chen
Dapeng Chen
Xiutian Zhao
Ke Wang
Liping Hou
Rong-Zhi Li
Wei Peng
LRM
MLLM
35
115
0
01 Feb 2024
ControlCap: Controllable Region-level Captioning
Yuzhong Zhao
Yue Liu
Zonghao Guo
Weijia Wu
Chen Gong
Fang Wan
QiXiang Ye
26
5
0
31 Jan 2024
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
Tianhe Ren
Shilong Liu
Ailing Zeng
Jing Lin
Kunchang Li
...
Feng Li
Jie Yang
Hongyang Li
Qing Jiang
Lei Zhang
VLM
51
385
0
25 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
56
182
0
24 Jan 2024
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation
Ci-Siang Lin
Chien-Yi Wang
Yu-Chiang Frank Wang
Min-Hung Chen
VLM
33
0
0
22 Jan 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang
Bingyi Kang
Zilong Huang
Xiaogang Xu
Jiashi Feng
Hengshuang Zhao
VLM
158
719
0
19 Jan 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibo Wang
Weifeng Ge
LRM
32
4
0
19 Jan 2024
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi
Peisen Zhao
Zichen Wang
Yuhang Zhang
Yaoming Wang
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Qi Tian
Xiaopeng Zhang
VLM
40
7
0
12 Jan 2024
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models
Xin He
Longhui Wei
Lingxi Xie
Qi Tian
43
8
0
06 Jan 2024
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Haobo Yuan
Xiangtai Li
Chong Zhou
Yining Li
Kai Chen
Chen Change Loy
VLM
31
51
0
05 Jan 2024
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
Qinying Liu
Wei Wu
Kecheng Zheng
Zhan Tong
Jiawei Liu
Yu Liu
Wei Chen
Zilei Wang
Yujun Shen
VLM
31
6
0
21 Dec 2023
Simple Image-level Classification Improves Open-vocabulary Object Detection
Ru Fang
Guansong Pang
Xiaolong Bai
ObjD
VLM
56
14
0
16 Dec 2023
UniTeam: Open Vocabulary Mobile Manipulation Challenge
Andrew Melnik
Michael Büttner
Leon Harz
Lyon Brown
G. C. Nandi
PS Arjun
Gaurav Kumar Yadav
Rahul Kala
R. Haschke
LM&Ro
38
12
0
14 Dec 2023
TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification
Kaipeng Zheng
Weiran Huang
Lichao Sun
LM&MA
MedIm
VLM
33
0
0
12 Dec 2023
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
Xu Peng
Junwei Zhu
Boyuan Jiang
Ying Tai
Donghao Luo
Jiangning Zhang
Wei Lin
Taisong Jin
Chengjie Wang
Rongrong Ji
DiffM
41
55
0
11 Dec 2023
PixLore: A Dataset-driven Approach to Rich Image Captioning
Diego Bonilla
VLM
19
3
0
08 Dec 2023
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
Junyu Lu
Ruyi Gan
Di Zhang
Xiaojun Wu
Ziwei Wu
Renliang Sun
Jiaxing Zhang
Pingjian Zhang
Yan Song
MLLM
VLM
31
15
0
08 Dec 2023
Text as Image: Learning Transferable Adapter for Multi-Label Classification
Xueling Zhu
Jiuxin Cao
Jian Liu
Dongqi Tang
Furong Xu
Weijia Liu
Jiawei Ge
Bo Liu
Qingpei Guo
Tianyi Zhang
VLM
41
2
0
07 Dec 2023
PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis
Meiyue Song
Zhihua Yu
Weiwen Zhang
Jiarui Wang
Yuting Lu
...
Nikolaos I. Kanellakis
Jiangfeng Liu
Jing Wang
Binglu Wang
Juntao Yang
LM&MA
30
0
0
06 Dec 2023
Stable Diffusion Exposed: Gender Bias from Prompt to Image
Yankun Wu
Yuta Nakashima
Noa Garcia
28
16
0
05 Dec 2023
Object Recognition as Next Token Prediction
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
40
9
0
04 Dec 2023
Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu
Cairong Zhang
Yitong Wang
Jiahao Wang
Yujiu Yang
Yansong Tang
VLM
VOS
55
15
0
04 Dec 2023
VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video Internet of Things
Yaoyao Zhong
Mengshi Qi
Rui Wang
Yuhan Qiu
Yang Zhang
Huadong Ma
24
2
0
01 Dec 2023
LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model
Siwei Chen
Anxing Xiao
David Hsu
LM&Ro
29
6
0
29 Nov 2023
SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
Rongyuan Wu
Tao Yang
Lingchen Sun
Zhengqiang Zhang
Shuai Li
Lei Zhang
DiffM
SupR
46
126
0
27 Nov 2023
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe
Rusiru Thushara
Muhammad Maaz
H. Rasheed
Salman Khan
Mubarak Shah
Fahad Khan
VLM
MLLM
35
34
0
22 Nov 2023
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen
Leyang Shen
Rui Shao
Xiang Deng
Liqiang Nie
VLM
MLLM
73
42
0
20 Nov 2023
Behavior Optimized Image Generation
Varun Khurana
Yaman Kumar Singla
J. Subramanian
R. Shah
Changyou Chen
Zhiqiang Xu
Balaji Krishnamurthy
EGVM
16
4
0
18 Nov 2023
What Do I Hear? Generating Sounds for Visuals with ChatGPT
David Chuan-En Lin
Nikolas Martelaro
21
0
0
09 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chunyuan Li
MLLM
VLM
56
106
0
09 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
34
13
0
09 Nov 2023
GLaMM: Pixel Grounding Large Multimodal Model
H. Rasheed
Muhammad Maaz
Sahal Shaji Mullappilly
Abdelrahman M. Shaker
Salman Khan
Hisham Cholakkal
Rao M. Anwer
Eric Xing
Ming-Hsuan Yang
Fahad S. Khan
MLLM
VLM
47
207
0
06 Nov 2023
A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction
Yu Hao
Fan Yang
Hao Huang
Shuaihang Yuan
Sundeep Rangan
John-Ross Rizzo
Yao Wang
Yi Fang
29
7
0
31 Oct 2023
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
34
2
0
28 Oct 2023
Open-Set Image Tagging with Multi-Grained Text Supervision
Xinyu Huang
Yi-Jie Huang
Youcai Zhang
Weiwei Tian
Rui Feng
Yuejie Zhang
Yanchun Xie
Yaqian Li
Lei Zhang
VLM
33
28
0
23 Oct 2023
Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models
Zhaozheng Chen
Qianru Sun
VLM
32
7
0
19 Oct 2023
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Junyu Lu
Di Zhang
Xiaojun Wu
Xinyu Gao
Ruyi Gan
Jiaxing Zhang
Yan Song
Pingjian Zhang
VLM
MLLM
22
7
0
12 Oct 2023
CLIP Is Also a Good Teacher: A New Learning Framework for Inductive Zero-shot Semantic Segmentation
Jialei Chen
Daisuke Deguchi
Chenkai Zhang
Xu Zheng
Hiroshi Murase
VLM
19
9
0
03 Oct 2023