1612.00563
Self-critical Sequence Training for Image Captioning
2 December 2016
Steven J. Rennie
E. Marcheret
Youssef Mroueh
Jerret Ross
Vaibhava Goel
Papers citing
"Self-critical Sequence Training for Image Captioning"
50 / 858 papers shown
Title
ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search
Hyunseok Lee
Jeonghoon Kim
Beomjun Kim
Jihoon Tack
Chansong Jo
Jaehong Lee
Cheonbok Park
Sookyo In
Jinwoo Shin
Kang Min Yoo
16
0
0
21 May 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
22
0
0
16 May 2025
Anatomical Attention Alignment representation for Radiology Report Generation
Quang Vinh Nguyen
Minh Duc Nguyen
Thanh Hoang Son Vo
Hyung-Jeong Yang
Soo-Hyung Kim
MedIm
28
0
0
12 May 2025
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
Lakshita Agarwal
Bindu Verma
ViT
34
0
0
23 Apr 2025
Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning
Yassir Benhammou
Alessandro Tiberio
Gabriel Trautmann
Suman Kalyan
MLLM
VLM
48
0
0
21 Apr 2025
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation
Sang-Jun Park
Keun-Soo Heo
Dong-Hee Shin
Young-Han Son
Ji-Hye Oh
Tae-Eui Kam
MedIm
39
0
0
16 Apr 2025
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
Shuai Zhao
Linchao Zhu
Yi Yang
39
2
0
14 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
45
0
0
03 Apr 2025
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning
Maofu Liu
Jiahui Liu
Xiaokang Zhang
42
0
0
30 Mar 2025
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
Zhaochong An
Guolei Sun
Yun Liu
Runjia Li
Junlin Han
Ender Konukoglu
Serge Belongie
VLM
65
0
0
20 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Sara Sarto
Marcella Cornia
Rita Cucchiara
48
0
0
18 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
70
0
0
11 Mar 2025
Measuring directional bias amplification in image captions using predictability
Rahul Nair
Bhanu Tokas
Neel Shah
Hannah Kerner
51
0
0
10 Mar 2025
SED2AM: Solving Multi-Trip Time-Dependent Vehicle Routing Problem using Deep Reinforcement Learning
Arash Mozhdehi
Yansen Wang
Sun Sun
Xin Eric Wang
AI4TS
68
0
0
06 Mar 2025
Q&C: When Quantization Meets Cache in Efficient Image Generation
Xin Ding
Xiaochen Li
Haotong Qin
Zhibo Chen
DiffM
MQ
75
0
0
04 Mar 2025
Group Relative Policy Optimization for Image Captioning
Xu Liang
48
1
0
03 Mar 2025
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
82
0
0
03 Mar 2025
Improved Diffusion-based Generative Model with Better Adversarial Robustness
Zekun Wang
Mingyang Yi
Shuchen Xue
Zhiyu Li
Ming Liu
Bing Qin
Zhi-Ming Ma
DiffM
42
0
0
24 Feb 2025
Pretrained Image-Text Models are Secretly Video Captioners
Chunhui Zhang
Yiren Jian
Z. Ouyang
Soroush Vosoughi
VLM
93
4
0
20 Feb 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
49
0
0
09 Feb 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
70
0
0
28 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
52
1
0
07 Jan 2025
Classifier-Guided Captioning Across Modalities
Ariel Shaulov
Tal Shaharabany
E. Shaar
Gal Chechik
Lior Wolf
38
0
0
03 Jan 2025
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
77
2
0
01 Nov 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
64
1
0
29 Oct 2024
Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration
Yun-Yen Chuang
Hung-Min Hsu
Kevin Lin
Chen-Sheng Gu
Ling Zhen Li
Ray-I Chang
Hung-yi Lee
DiffM
VLM
36
0
0
17 Oct 2024
Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference
William Thorne
Ambrose Robinson
Bohua Peng
Chenghua Lin
Diana Maynard
16
2
0
10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
45
3
0
09 Oct 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
186
1
0
04 Sep 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
31
3
0
26 Aug 2024
Shifted Window Fourier Transform And Retention For Image Captioning
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
VLM
42
0
0
25 Aug 2024
DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation
Xiaowei Mao
Yan Lin
Shengnan Guo
Yubin Chen
Xingyu Xian
Haomin Wen
Qisen Xu
Youfang Lin
Huaiyu Wan
47
1
0
23 Aug 2024
TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model
Yuhao Wang
Chao Hao
Yawen Cui
Xinqi Su
Weicheng Xie
Tao Tan
Zitong Yu
LM&MA
MedIm
41
0
0
22 Aug 2024
See It All: Contextualized Late Aggregation for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Seung Hwan Kim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
55
4
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
43
3
0
13 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
45
0
0
09 Aug 2024
e-Health CSIRO at RRG24: Entropy-Augmented Self-Critical Sequence Training for Radiology Report Generation
Aaron Nicolson
Jinghui Liu
Jason Dowling
Anthony N. Nguyen
Bevan Koopman
52
3
0
07 Aug 2024
Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation
Hui Ma
Bo Zhang
Bo Xu
Jian Wang
Hongfei Lin
Xiao Sun
57
1
0
06 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen
Ming Jiang
Qi Zhao
24
2
0
05 Aug 2024
Positive Text Reframing under Multi-strategy Optimization
Shutong Jia
Biwei Cao
Qingqing Gao
Jiuxin Cao
Bo Liu
31
1
0
25 Jul 2024
Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization
Jonathan Pirnay
D. G. Grimm
BDL
53
3
0
24 Jul 2024
Conversational Query Reformulation with the Guidance of Retrieved Documents
Jeonghyun Park
Hwanhee Lee
30
0
0
17 Jul 2024
Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning
Fanyue Wei
Wei Zeng
Zhenyang Li
Dawei Yin
Lixin Duan
Wen Li
EGVM
39
2
0
09 Jul 2024
Edge-DIRECT: A Deep Reinforcement Learning-based Method for Solving Heterogeneous Electric Vehicle Routing Problem with Time Window Constraints
Arash Mozhdehi
Mahdi Mohammadizadeh
Xin Eric Wang
35
0
0
28 Jun 2024
The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It
Aaron Nicolson
Shengyao Zhuang
Jason Dowling
Bevan Koopman
34
1
0
19 Jun 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
Miaosen Zhang
Yixuan Wei
Zhen Xing
Yifei Ma
Zuxuan Wu
...
Zheng-Wei Zhang
Qi Dai
Chong Luo
Xin Geng
Baining Guo
VLM
51
1
0
13 Jun 2024
Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment
Wenliang Zhong
Wenyi Wu
Qi Li
Rob Barton
Boxin Du
Shioulin Sam
Karim Bouyarmane
Ismail B. Tutar
Junzhou Huang
35
3
0
05 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
40
0
0
01 Jun 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
39
0
0
27 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
32
10
0
21 May 2024