Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2205.02655
Cited By
Language Models Can See: Plugging Visual Controls in Text Generation
5 May 2022
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Language Models Can See: Plugging Visual Controls in Text Generation"
50 / 83 papers shown
Title
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning
Mingkai Tian
Guorong Li
Yuankai Qi
Amin Beheshti
Javen Qinfeng Shi
Anton van den Hengel
Qingming Huang
VGen
39
0
0
31 Mar 2025
Fine-Grained Video Captioning through Scene Graph Consolidation
Sanghyeok Chu
Seonguk Seo
Bohyung Han
61
1
0
23 Feb 2025
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
Yuqi Pang
Bowen Yang
Haoqin Tu
Yun Cao
Zeyu Zhang
LRM
MLLM
66
0
0
17 Feb 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
58
0
0
03 Jan 2025
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
72
2
0
14 Nov 2024
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
Chen Huang
Skyler Seto
Samira Abnar
David Grangier
Navdeep Jaitly
J. Susskind
VLM
51
0
0
31 Oct 2024
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
Joshua Forster Feinglass
Yezhou Yang
36
0
0
30 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP
VLM
31
0
0
26 Sep 2024
From Experts to the Public: Governing Multimodal Language Models in Politically Sensitive Video Analysis
Tanusree Sharma
Yujin Potter
Zachary Kilhoffer
Yun Huang
Dawn Song
Yang Wang
59
3
0
15 Sep 2024
Data Augmentation for Image Classification using Generative AI
Fazle Rahat
M Shifat Hossain
Md Rubel Ahmed
Sumit Kumar Jha
Rickard Ewetz
VLM
59
4
0
31 Aug 2024
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
41
1
0
16 Jul 2024
Multimodal Reasoning with Multimodal Knowledge Graph
Junlin Lee
Yequan Wang
Jing Li
Min Zhang
44
15
0
04 Jun 2024
Multi-modal Generation via Cross-Modal In-Context Learning
Amandeep Kumar
Muzammal Naseer
Sanath Narayan
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
MLLM
56
0
0
28 May 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
43
0
0
28 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
41
14
0
06 Mar 2024
DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation
Chen Xu
Tian Lan
Changlong Yu
Wei Wang
Jun Gao
...
Qunxi Dong
Kun Qian
Piji Li
Wei Bi
Bin Hu
50
0
0
04 Mar 2024
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks
Yang Liu
Xiaomin Yu
Gongyu Zhang
Christos Bergeles
Prokar Dasgupta
Alejandro Granados
Sebastien Ourselin
48
2
0
27 Feb 2024
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
Yuhui Zhang
Elaine Sui
Serena Yeung-Levy
44
9
0
16 Jan 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
51
3
0
04 Jan 2024
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
43
10
0
14 Dec 2023
Explaining CLIP's performance disparities on data from blind/low vision users
Daniela Massiceti
Camilla Longden
Agnieszka Slowik
Samuel Wills
Martin Grayson
C. Morrison
VLM
29
9
0
29 Nov 2023
Boosting Audio-visual Zero-shot Learning with Large Language Models
Haoxing Chen
Yaohui Li
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
Jun Lan
Huijia Zhu
Weiqiang Wang
VLM
51
1
0
21 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
32
10
0
14 Nov 2023
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
47
2
0
08 Nov 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
39
0
0
25 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
35
1
0
15 Oct 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
46
461
0
11 Sep 2023
Zero-Shot Audio Captioning via Audibility Guidance
Tal Shaharabany
Ariel Shaulov
Lior Wolf
28
4
0
07 Sep 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
46
13
0
25 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM
CLIP
34
13
0
23 Aug 2023
Text-Only Training for Visual Storytelling
Yuechen Wang
Wen-gang Zhou
Zhenbo Lu
Houqiang Li
DiffM
28
2
0
17 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP
VLM
32
10
0
02 Aug 2023
Guiding Image Captioning Models Toward More Specific Captions
Simon Kornblith
Lala Li
Zirui Wang
Thao Nguyen
32
15
0
31 Jul 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
36
34
0
31 Jul 2023
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles
Haoqin Tu
Bowen Yang
Xianfeng Zhao
29
6
0
29 Jun 2023
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu
Minghao Wu
Longyue Wang
Xinting Huang
Bingshuai Liu
Zefeng Du
Shuming Shi
Zhaopeng Tu
MLLM
AuLLM
31
161
0
15 Jun 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
35
7
0
30 May 2023
Image Captioning with Multi-Context Synthetic Data
Feipeng Ma
Y. Zhou
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
DiffM
38
7
0
29 May 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
VLM
34
22
0
29 May 2023
Learning to Imagine: Visually-Augmented Natural Language Generation
Tianyi Tang
Yushuo Chen
Yifan Du
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
DiffM
29
9
0
26 May 2023
PandaGPT: One Model To Instruction-Follow Them All
Yixuan Su
Tian Lan
Huayang Li
Jialu Xu
Yan Wang
Deng Cai
MLLM
34
277
0
25 May 2023
Parts of Speech-Grounded Subspaces in Vision-Language Models
James Oldfield
Christos Tzelepis
Yannis Panagakis
M. Nicolaou
Ioannis Patras
26
9
0
23 May 2023
Preconditioned Visual Language Inference with Weak Supervision
Ehsan Qasemi
Amani Maina-Kilaas
Devadutta Dash
Khalid Alsaggaf
Muhao Chen
38
0
0
22 May 2023
Multimodal Procedural Planning via Dual Text-Image Prompting
Yujie Lu
Pan Lu
Zhiyu Zoey Chen
Wanrong Zhu
Xinze Wang
William Yang Wang
LM&Ro
64
43
0
02 May 2023
TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
Zhaoyan Liu
Noël Vouitsis
S. Gorti
Jimmy Ba
G. Loaiza-Ganem
ViT
33
1
0
26 Apr 2023
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
Junyan Wang
Ming Yan
Yi Zhang
Jitao Sang
CLIP
VLM
24
8
0
26 Apr 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
47
12
0
21 Mar 2023
Architext: Language-Driven Generative Architecture Design
Theodoros Galanos
Antonios Liapis
Georgios N. Yannakakis
VLM
AI4CE
26
6
0
13 Mar 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
Bang-ju Yang
Fenglin Liu
Yuexian Zou
Xian Wu
Yaowei Wang
David Clifton
36
9
0
11 Mar 2023
Learning Combinatorial Prompts for Universal Controllable Image Captioning
Zhen Wang
Jun Xiao
Yueting Zhuang
Fei Gao
Jian Shao
Long Chen
60
5
0
11 Mar 2023
1
2
Next