Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.00563
Cited By
Self-critical Sequence Training for Image Captioning
2 December 2016
Steven J. Rennie
E. Marcheret
Youssef Mroueh
Jerret Ross
Vaibhava Goel
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Self-critical Sequence Training for Image Captioning"
50 / 858 papers shown
Title
Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning
Peggy Tang
Junbin Gao
Lei Zhang
Zhiyong Wang
27
1
0
06 Jun 2023
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mi Zhou
29
23
0
01 Jun 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
40
97
0
29 May 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
VLM
34
22
0
29 May 2023
S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts
Qi Chen
Yutong Xie
Biao Wu
Minh-Son To
James Ang
Qi Wu
18
1
0
26 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
44
21
0
25 May 2023
Gender Biases in Automatic Evaluation Metrics for Image Captioning
Haoyi Qiu
Zi-Yi Dou
Tianlu Wang
Asli Celikyilmaz
Nanyun Peng
EGVM
26
14
0
24 May 2023
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
37
6
0
20 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
34
5
0
20 May 2023
BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases
Xin Liu
Muhammad Khalifa
Lu Wang
36
18
0
19 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
20
1
0
19 May 2023
Recent Trends in Unsupervised Summarization
Mohammad Khosravani
Amine Trabelsi
37
0
0
18 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
38
464
0
18 May 2023
Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training
Kecheng Zhang
Jing Zhang
Jun Yu
Han Jiang
Jianping Fan
Qing-An Huang
Weidong Han
MedIm
38
29
0
13 May 2023
Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives
Bhanu Prakash Voutharoja
Lei Wang
Luping Zhou
MedIm
33
8
0
11 May 2023
Simple Token-Level Confidence Improves Caption Correctness
Suzanne Petryk
Spencer Whitehead
Joseph E. Gonzalez
Trevor Darrell
Anna Rohrbach
Marcus Rohrbach
31
7
0
11 May 2023
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese
Doanh C. Bui
Nghia Hieu Nguyen
Khang Phuoc-Quy Nguyen
VLM
27
3
0
07 May 2023
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
Xilun Chen
L. Yu
Wenhan Xiong
Barlas Ouguz
Yashar Mehdad
Wen-tau Yih
VGen
26
3
0
04 May 2023
Transforming Visual Scene Graphs to Image Captions
Xu Yang
Jiawei Peng
Zihua Wang
Haiyang Xu
Qinghao Ye
Chenliang Li
Mingshi Yan
Feisi Huang
Zhangzikang Li
Yu Zhang
54
19
0
03 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
34
10
0
03 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
48
5
0
02 May 2023
A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering
Alireza Salemi
Juan Altmayer Pizzorno
Hamed Zamani
15
14
0
26 Apr 2023
Bridging Discrete and Backpropagation: Straight-Through and Beyond
Liyuan Liu
Chengyu Dong
Xiaodong Liu
Bin-Xia Yu
Jianfeng Gao
BDL
26
20
0
17 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jinhui Tang
VLM
34
104
0
17 Apr 2023
ImageCaptioner
2
^2
2
: Image Captioner for Image Captioning Bias Amplification Assessment
Eslam Mohamed Bakr
Pengzhan Sun
Erran L. Li
Mohamed Elhoseiny
22
6
0
10 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
35
18
0
07 Apr 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
22
8
0
07 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
27
19
0
04 Apr 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
27
31
0
28 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
44
2
0
19 Mar 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
69
74
0
10 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
50
86
0
06 Mar 2023
Models See Hallucinations: Evaluating the Factuality in Video Captioning
Hui Liu
Xiaojun Wan
HILM
37
10
0
06 Mar 2023
Understanding Social Media Cross-Modality Discourse in Linguistic Space
Chunpu Xu
Hanzhuo Tan
Jing Li
Piji Li
24
7
0
26 Feb 2023
Metric-oriented Speech Enhancement using Diffusion Probabilistic Model
Chen Chen
Yuchen Hu
Weiwei Weng
Chng Eng Siong
DiffM
43
19
0
23 Feb 2023
Guiding Large Language Models via Directional Stimulus Prompting
Zekun Li
Baolin Peng
Pengcheng He
Michel Galley
Jianfeng Gao
Xi Yan
LLMAG
LRM
LM&Ro
40
95
0
22 Feb 2023
Designing a Wayfinding Robot for People with Visual Impairments
Shuijing Liu
Aamir Hasan
Kaiwen Hong
Chunpeng Yao
Justin Lin
Weihang Liang
M. Bayles
W. Rogers
Katherine Driggs-Campbell
138
1
0
17 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
34
29
0
16 Feb 2023
Tuning computer vision models with task rewards
André Susano Pinto
Alexander Kolesnikov
Yuge Shi
Lucas Beyer
Xiaohua Zhai
VLM
32
40
0
16 Feb 2023
Towards Local Visual Modeling for Image Captioning
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Rongrong Ji
ViT
21
71
0
13 Feb 2023
See Your Heart: Psychological states Interpretation through Visual Creations
Likun Yang
Xiaokun Feng
Xiaotang Chen
Shiyu Zhang
Kaiqi Huang
16
0
0
11 Feb 2023
Ordered Memory Baselines
Daniel Borisov
Matthew D’Iorio
Jeffrey Hyacinthe
13
0
0
08 Feb 2023
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Mozhgan Pourkeshavarz
Shahabedin Nabavi
Mohsen Moghaddam
M. Shamsfard
31
4
0
08 Feb 2023
An entity-guided text summarization framework with relational heterogeneous graph neural network
Jingqiang Chen
21
4
0
07 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
42
26
0
01 Feb 2023
Do Multi-Document Summarization Models Synthesize?
Jay DeYoung
Stephanie C. Martinez
Iain J. Marshall
Byron C. Wallace
24
8
0
31 Jan 2023
Input Perturbation Reduces Exposure Bias in Diffusion Models
Mang Ning
E. Sangineto
Angelo Porrello
Simone Calderara
Rita Cucchiara
DiffM
36
64
0
27 Jan 2023
Style-Aware Contrastive Learning for Multi-Style Image Captioning
Yucheng Zhou
Guodong Long
25
22
0
26 Jan 2023
Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
SSL
VLM
27
4
0
26 Jan 2023
Embodied Agents for Efficient Exploration and Smart Scene Description
Roberto Bigazzi
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
22
7
0
17 Jan 2023
Previous
1
2
3
4
5
...
16
17
18
Next