Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.05657
Cited By
Tag2Text: Guiding Vision-Language Model via Image Tagging
10 March 2023
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Tag2Text: Guiding Vision-Language Model via Image Tagging"
18 / 18 papers shown
Title
Video-Bench: Human-Aligned Video Generation Benchmark
Hui Han
Siyuan Li
Jiaqi Chen
Yiwen Yuan
Yuling Wu
...
Y. Li
Jingyang Zhang
Chi Zhang
Li Li
Yongxin Ni
EGVM
VGen
73
0
0
07 Apr 2025
FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise
Yunlong Yuan
Yuanfan Guo
Chunwei Wang
Wei Zhang
Hang Xu
L. Zhang
DiffM
VGen
119
1
0
20 Feb 2025
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li
Brian R Quaranto
Chenhui Xu
Ishan Mishra
Ruiyang Qin
Dancheng Liu
Peter C W Kim
Jinjun Xiong
94
0
0
25 Jan 2025
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
Jiachen Li
Qian Long
Jian Zheng
Xiaofeng Gao
Robinson Piramuthu
Wenhu Chen
William Yang Wang
VGen
29
22
0
08 Oct 2024
Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing
Ekaterina Iakovleva
Fabio Pizzati
Philip H. S. Torr
Stéphane Lathuiliere
DiffM
31
0
0
29 Jul 2024
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags
Daiqing Qi
Handong Zhao
Zijun Wei
Sheng Li
46
2
0
16 Jun 2024
Object Recognition as Next Token Prediction
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
40
9
0
04 Dec 2023
Open-Set Image Tagging with Multi-Grained Text Supervision
Xinyu Huang
Yi-Jie Huang
Youcai Zhang
Weiwei Tian
Rui Feng
Yuejie Zhang
Yanchun Xie
Yaqian Li
Lei Zhang
VLM
30
28
0
23 Oct 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM
ELM
32
502
0
30 Jul 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
45
189
0
12 Jun 2023
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions
Qian Wang
Biao Zhang
Michael Birsak
Peter Wonka
DiffM
30
31
0
29 May 2023
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Zhimin Chen
Longlong Jing
Yingwei Li
Bing Li
29
31
0
15 May 2023
Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Bo Ren
Shutao Xia
VLM
75
49
0
05 Jul 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
319
11,953
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,137
0
28 Jan 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
293
1,084
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
313
3,708
0
11 Feb 2021
PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection
Yue Liao
Si Liu
Fei-Yue Wang
Yanjie Chen
Chen Qian
Jiashi Feng
71
264
0
30 Dec 2019
1