Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.03099
Cited By
Semantic-Conditional Diffusion Networks for Image Captioning
6 December 2022
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Semantic-Conditional Diffusion Networks for Image Captioning"
25 / 25 papers shown
Title
Generalized Visual Relation Detection with Diffusion Models
Kaifeng Gao
Siqi Chen
Hanwang Zhang
Jun Xiao
Yueting Zhuang
Qianru Sun
40
0
0
16 Apr 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
71
0
0
13 Mar 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
53
0
0
03 Jan 2025
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
Hao Wu
Zhihang Zhong
Xiao Sun
DiffM
75
0
0
02 Dec 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
35
0
0
09 Aug 2024
MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
Youngmin Oh
Hyung-Il Kim
Seong Tae Kim
Jung Uk Kim
DiffM
34
2
0
23 Jul 2024
A Comprehensive Survey on Diffusion Models and Their Applications
M. Ahsan
S. Raman
Yingtao Liu
Zahed Siddique
MedIm
DiffM
41
1
0
01 Jul 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
34
0
0
01 Jun 2024
Diffusion-RSCC: Diffusion Probabilistic Model for Change Captioning in Remote Sensing Images
Xiaofei Yu
Yitong Li
Jie Ma
DiffM
52
0
0
21 May 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Yuchi Wang
Shuhuai Ren
Rundong Gao
Linli Yao
Qingyan Guo
Kaikai An
Jianhong Bai
Xu Sun
DiffM
VLM
49
6
0
16 Apr 2024
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian
Qi Cai
Yingwei Pan
Yehao Li
Ting Yao
Qibin Sun
Tao Mei
DiffM
37
19
0
26 Mar 2024
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu
Yingwei Pan
Yehao Li
Ting Yao
Zhenglong Sun
Tao Mei
C. Chen
50
24
0
25 Mar 2024
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen
Yingwei Pan
Haibo Yang
Ting Yao
Tao Mei
DiffM
42
18
0
25 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
54
10
0
12 Mar 2024
Recurrent Aligned Network for Generalized Pedestrian Trajectory Prediction
Yonghao Dong
Le Wang
Sanpin Zhou
Gang Hua
Changyin Sun
37
5
0
09 Mar 2024
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
Fuwen Luo
Chi Chen
Zihao Wan
Zhaolu Kang
Qidong Yan
...
Xiaoyue Mi
Peng Li
Ning Ma
Maosong Sun
Yang Liu
43
5
0
21 Feb 2024
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
32
10
0
14 Dec 2023
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
Zhen Wang
Xinyun Jiang
Jun Xiao
Tao Chen
Long Chen
DiffM
27
1
0
25 Nov 2023
A Systematic Review of Deep Learning-based Research on Radiology Report Generation
Chang Liu
Yuanhe Tian
Yan Song
MedIm
34
15
0
23 Nov 2023
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen
Hongyuan Zhu
Mingsheng Li
Xin Chen
Peng Guo
Yinjie Lei
Gang Yu
Taihao Li
Tao Chen
19
18
0
06 Sep 2023
Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language
Francesco Taioli
Federico Cunico
Federico Girella
Riccardo Bologna
Alessandro Farinelli
Marco Cristani
23
7
0
17 Aug 2023
Any-to-Any Generation via Composable Diffusion
Zineng Tang
Ziyi Yang
Chenguang Zhu
Michael Zeng
Joey Tianyi Zhou
VGen
DiffM
33
171
0
19 May 2023
Diffusion Models for Non-autoregressive Text Generation: A Survey
Yifan Li
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
MedIm
DiffM
45
32
0
12 Mar 2023
OSIC: A New One-Stage Image Captioner Coined
Bo Wang
Zhao Zhang
Ming Zhao
Xiaojie Jin
Mingliang Xu
Meng Wang
VLM
25
3
0
04 Nov 2022
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
244
344
0
22 Sep 2021
1