2306.03514
Recognize Anything: A Strong Image Tagging Model
6 June 2023
Youcai Zhang
Xinyu Huang
Jinyu Ma
Zhaoyang Li
Zhaochuan Luo
Yanchun Xie
Yuzhuo Qin
Tong Luo
Yaqian Li
Siyi Liu
Yandong Guo
Lei Zhang
VLM
Papers citing
"Recognize Anything: A Strong Image Tagging Model"
50 / 170 papers shown
Title
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin
Oh Hyun-Bin
JungMok Lee
Arda Senocak
Joon Son Chung
Tae-Hyun Oh
MLLM
VLM
50
3
0
23 Oct 2024
Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation
Zhijie Yan
Shufei Li
Zhendong Wang
Lixiu Wu
Han Wang
Jun Zhu
Lijiang Chen
Jihong Liu
43
1
0
15 Oct 2024
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
Zhen Han
Zeyinzi Jiang
Yulin Pan
Jingfeng Zhang
Chaojie Mao
Chenwei Xie
Yu Liu
Jingren Zhou
DiffM
35
17
0
30 Sep 2024
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering
Jiacong Wang
Bohong Wu
Haiyong Jiang
Xun Zhou
Xin Xiao
Haoyuan Guo
Jun Xiao
VLM
VGen
46
4
0
30 Sep 2024
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
Yanyuan Qiao
Wenqi Lyu
Hui Wang
Zixu Wang
Zerui Li
Yuan Zhang
Mingkui Tan
Qi Wu
LRM
38
4
0
27 Sep 2024
Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement
Guanlin Li
Ke Zhang
Ting Wang
Ming Li
Bin Zhao
Xuelong Li
19
0
0
25 Sep 2024
Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models
Mike Zhang
Kaixian Qu
Vaishakh Patil
Cesar Cadena
Marco Hutter
LM&Ro
3DV
41
4
0
23 Sep 2024
Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization
Minyi Zhao
Jie Wang
Zerui Li
Jiyuan Zhang
Zhenbang Sun
Shuigeng Zhou
MLLM
VLM
47
0
0
22 Sep 2024
Towards Global Localization using Multi-Modal Object-Instance Re-Identification
Aneesh Chavan
Vaibhav Agrawal
Vineeth Bhat
Sarthak Chittawar
Siddharth Srivastava
Chetan Arora
K. M. Krishna
95
0
0
18 Sep 2024
MemoVis: A GenAI-Powered Tool for Creating Companion Reference Images for 3D Design Feedback
Chen Chen
Cuong Nguyen
Thibault Groueix
Vladimir G. Kim
Nadir Weibel
DiffM
34
3
0
09 Sep 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Jiaxin Cheng
Zixu Zhao
Tong He
Tianjun Xiao
Yicong Zhou
Zheng Zhang
DiffM
52
0
0
07 Sep 2024
LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models
Jingyi Wang
Jianzhong Ju
Jian Luan
Zhidong Deng
VLM
38
1
0
29 Aug 2024
Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail
Bianca Lamm
Janis Keuper
54
2
0
28 Aug 2024
Segment Any Mesh
George Tang
William Zhao
Logan Ford
David Benhaim
Paul Zhang
38
8
0
24 Aug 2024
Identifying Crucial Objects in Blind and Low-Vision Individuals' Navigation
Md Touhidul Islam
Imran Kabir
Elena Ariel Pearce
Md. Alimoor Reza
Syed Masum Billah
24
2
0
23 Aug 2024
Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant
Guofeng Mei
Luigi Riz
Yiming Wang
Fabio Poiesi
ISeg
VLM
69
3
0
20 Aug 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
44
91
0
16 Aug 2024
Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs
Jinming Liu
Yuntao Wei
Junyan Lin
Shengyang Zhao
Heming Sun
Zhibo Chen
Wenjun Zeng
Xin Jin
54
2
0
16 Aug 2024
SceneGPT: A Language Model for 3D Scene Understanding
Shivam Chandhok
LRM
39
4
0
13 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
36
10
0
08 Aug 2024
OpenSU3D: Open World 3D Scene Understanding using Foundation Models
Rafay Mohiuddin
Sai Manoj Prakhya
Fiona Collins
Ziyuan Liu
André Borrmann
43
2
0
19 Jul 2024
Multimodal Label Relevance Ranking via Reinforcement Learning
Taian Guo
Taolin Zhang
Haoqian Wu
Hanjun Li
Ruizhi Qiao
Xing Sun
OffRL
24
0
0
18 Jul 2024
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
Philipp Allgeuer
Kyra Ahrens
Stefan Wermter
CLIP
VLM
40
3
0
15 Jul 2024
Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
Yuchen Yang
Kwonjoon Lee
Behzad Dariush
Yinzhi Cao
Shao-Yuan Lo
LRM
44
12
0
14 Jul 2024
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li
Zhengqiang Zhang
Chenhang He
Zhiyuan Ma
Vishal M. Patel
Lei Zhang
3DV
VLM
47
6
0
13 Jul 2024
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Xiaotong Li
Fan Zhang
Haiwen Diao
Yueze Wang
Xinlong Wang
Ling-yu Duan
VLM
31
27
0
11 Jul 2024
LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction
Kanghao Chen
Hangyu Li
Jiazhou Zhou
Zeyu Wang
Lin Wang
DiffM
VGen
46
2
0
08 Jul 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Haozhe Zhao
Xiaojian Ma
Liang Chen
Shuzheng Si
Rujie Wu
Kaikai An
Peiyu Yu
Minjia Zhang
Qing Li
Baobao Chang
42
44
0
07 Jul 2024
Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models
A. Mukherjee
Ziwei Zhu
Antonios Anastasopoulos
49
2
0
02 Jul 2024
Dense Monocular Motion Segmentation Using Optical Flow and Pseudo Depth Map: A Zero-Shot Approach
Yuxiang Huang
Yuhao Chen
John S. Zelek
MDE
47
2
0
27 Jun 2024
Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps
Dicong Qiu
Wenzong Ma
Zhenfu Pan
Hui Xiong
Junwei Liang
LM&Ro
39
7
0
26 Jun 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Xiangyu Zhao
Xiangtai Li
Haodong Duan
Haian Huang
Yining Li
Kai Chen
Hua Yang
VLM
MLLM
47
10
0
25 Jun 2024
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution
Aiwen Jiang
Zhi Wei
Long Peng
Feiqiang Liu
Wenbo Li
Mingwen Wang
DiffM
54
2
0
24 Jun 2024
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Yunxin Li
Xinyu Chen
Baotian Hu
Longyue Wang
Haoyuan Shi
Min-Ling Zhang
MLLM
LRM
56
26
0
17 Jun 2024
CUPID: Contextual Understanding of Prompt-conditioned Image Distributions
Yayan Zhao
Mingwei Li
Matthew Berger
DiffM
36
2
0
11 Jun 2024
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
39
6
0
10 Jun 2024
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
An-Chieh Cheng
Hongxu Yin
Yang Fu
Qiushan Guo
Ruihan Yang
Jan Kautz
Xiaolong Wang
Sifei Liu
LRM
64
48
0
03 Jun 2024
Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations
Tiancheng Shen
Jun Hao Liew
Long Mai
Lu Qi
Jiashi Feng
Jiaya Jia
DiffM
35
1
0
31 May 2024
A Neurosymbolic Framework for Bias Correction in CNNs
Parth Padalkar
Natalia Slusarz
Ekaterina Komendantskaya
Gopal Gupta
43
0
0
24 May 2024
Open-Vocabulary SAM3D: Understand Any 3D Scene
Hanchen Tai
Qingdong He
Jiangning Zhang
Yijie Qian
Zhenyu Zhang
Xiaobin Hu
Yabiao Wang
Yong Liu
VLM
54
0
0
24 May 2024
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments
Yang Zhou
Long Quang
Carlos Nieto-Granda
Giuseppe Loianno
21
2
0
23 May 2024
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge
Yihe Tang
Lyne Tchapmi
Cem Gokmen
Chengshu Li
...
Miao Liu
Pengchuan Zhang
Ruohan Zhang
Fei-Fei Li
Jiajun Wu
VGen
50
6
0
15 May 2024
Navigating the Future of Federated Recommendation Systems with Foundation Models
Zhiwei Li
Guodong Long
Chunxu Zhang
Honglei Zhang
Jing Jiang
Chengqi Zhang
117
0
0
12 May 2024
Visual Language Model based Cross-modal Semantic Communication Systems
Feibo Jiang
Chuanguo Tang
Li Dong
Kezhi Wang
Kun Yang
Cunhua Pan
VLM
38
2
0
06 May 2024
Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM
Laksh Nanwani
Kumaraditya Gupta
Aditya Mathur
Swayam Agrawal
A. H. A. Hafez
K. M. Krishna
40
0
0
27 Apr 2024
Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models
Yuyan Shi
Jialu Ma
Jin Yang
Shasha Wang
Yichi Zhang
MedIm
VLM
21
2
0
20 Apr 2024
NTIRE 2024 Challenge on Image Super-Resolution (×4): Methods and Results
Zheng Chen
Zongwei Wu
Eduard Zamfir
Kai Zhang
Yulun Zhang
...
Yan Luo
Yanyan Wei
Asif Hussain Khan
C. Micheloni
N. Martinel
SupR
44
33
0
15 Apr 2024
Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models
Ziwei Luo
Fredrik K. Gustafsson
Zheng Zhao
Jens Sjölund
Thomas B. Schön
VLM
35
11
0
15 Apr 2024
AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation
Rui Xie
Ying Tai
Chen Zhao
Kai Zhang
Zhenyu Zhang
Jun Zhou
Xiaoqian Ye
Qian Wang
Jian Yang
42
23
0
02 Apr 2024
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee
Gabriela Ben-Melech Stan
Estelle Aflalo
Sayak Paul
Dhruba Ghosh
...
Ludwig Schmidt
Hanna Hajishirzi
Vasudev Lal
Chitta Baral
Yezhou Yang
EGVM
VLM
59
15
0
01 Apr 2024