ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Self-critical Sequence Training for Image Captioning

2 December 2016
Steven J. Rennie
E. Marcheret
Youssef Mroueh
Jerret Ross
Vaibhava Goel
arXiv:1612.00563

Papers citing "Self-critical Sequence Training for Image Captioning"

50 / 862 papers shown
Embodied Executable Policy Learning with Language-based Scene Summarization
Jielin Qiu
Mengdi Xu
William Jongwon Han
Seungwhan Moon
Ding Zhao
LM&Ro
86
8
0
09 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report Generation
Bang-ju Yang
Asif Raza
Yuexian Zou
Tong Zhang
MedIm
97
11
0
09 Jun 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
125
158
0
07 Jun 2023
Policy-Based Self-Competition for Planning Problems
Jonathan Pirnay
Q. Göttl
Jakob Burger
D. G. Grimm
95
3
0
07 Jun 2023
Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning
Peggy Tang
Junbin Gao
Lei Zhang
Zhiyong Wang
62
2
0
06 Jun 2023
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mi Zhou
146
28
0
01 Jun 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
244
112
0
29 May 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
VLM
113
23
0
29 May 2023
S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts
Qi Chen
Yutong Xie
Biao Wu
Minh-Son To
James Ang
Qi Wu
56
3
0
26 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
87
23
0
25 May 2023
Gender Biases in Automatic Evaluation Metrics for Image Captioning
Haoyi Qiu
Zi-Yi Dou
Tianlu Wang
Asli Celikyilmaz
Nanyun Peng
EGVM
129
16
0
24 May 2023
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
103
7
0
20 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
84
7
0
20 May 2023
BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases
Xin Liu
Muhammad Khalifa
Lu Wang
121
20
0
19 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
87
1
0
19 May 2023
Recent Trends in Unsupervised Summarization
Mohammad Khosravani
Amine Trabelsi
92
0
0
18 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM, VLM
123
494
0
18 May 2023
Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training
Kecheng Zhang
Shuai Liu
Jun Yu
Han Jiang
Jianping Fan
Qing-An Huang
Weidong Han
MedIm
105
33
0
13 May 2023
Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives
Bhanu Prakash Voutharoja
Lei Wang
Luping Zhou
MedIm
63
8
0
11 May 2023
Simple Token-Level Confidence Improves Caption Correctness
Suzanne Petryk
Spencer Whitehead
Joseph E. Gonzalez
Trevor Darrell
Anna Rohrbach
Marcus Rohrbach
94
7
0
11 May 2023
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese
Doanh C. Bui
Nghia Hieu Nguyen
Khang Phuoc-Quy Nguyen
VLM
72
3
0
07 May 2023
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
Xilun Chen
L. Yu
Wenhan Xiong
Barlas Oğuz
Yashar Mehdad
Wen-tau Yih
VGen
58
3
0
04 May 2023
Transforming Visual Scene Graphs to Image Captions
Xu Yang
Jiawei Peng
Zihua Wang
Haiyang Xu
Qinghao Ye
Chenliang Li
Mingshi Yan
Feisi Huang
Zhangzikang Li
Yu Zhang
103
21
0
03 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
83
10
0
03 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
92
5
0
02 May 2023
A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering
Alireza Salemi
Juan Altmayer Pizzorno
Hamed Zamani
38
15
0
26 Apr 2023
Bridging Discrete and Backpropagation: Straight-Through and Beyond
Liyuan Liu
Chengyu Dong
Xiaodong Liu
Bin Yu
Jianfeng Gao
BDL
92
23
0
17 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
144
112
0
17 Apr 2023
ImageCaptioner²: Image Captioner for Image Captioning Bias Amplification Assessment
Eslam Mohamed Bakr
Pengzhan Sun
Erran L. Li
Mohamed Elhoseiny
58
6
0
10 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
127
18
0
07 Apr 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
77
8
0
07 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
101
19
0
04 Apr 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
98
33
0
28 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
137
2
0
19 Mar 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP, MLLM, VLM, 3DV
162
77
0
10 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
115
89
0
06 Mar 2023
Models See Hallucinations: Evaluating the Factuality in Video Captioning
Hui Liu
Xiaojun Wan
HILM
71
11
0
06 Mar 2023
Understanding Social Media Cross-Modality Discourse in Linguistic Space
Chunpu Xu
Hanzhuo Tan
Jing Li
Piji Li
95
8
0
26 Feb 2023
Metric-oriented Speech Enhancement using Diffusion Probabilistic Model
Chen Chen
Yuchen Hu
Weiwei Weng
Chng Eng Siong
DiffM
97
21
0
23 Feb 2023
Guiding Large Language Models via Directional Stimulus Prompting
Zekun Li
Baolin Peng
Pengcheng He
Michel Galley
Jianfeng Gao
Xi Yan
LLMAG, LRM, LM&Ro
147
101
0
22 Feb 2023
Designing a Wayfinding Robot for People with Visual Impairments
Shuijing Liu
Aamir Hasan
Kaiwen Hong
Chunpeng Yao
Justin Lin
Weihang Liang
M. Bayles
W. Rogers
Katherine Driggs-Campbell
336
1
0
17 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
82
29
0
16 Feb 2023
Tuning computer vision models with task rewards
André Susano Pinto
Alexander Kolesnikov
Yuge Shi
Lucas Beyer
Xiaohua Zhai
VLM
87
41
0
16 Feb 2023
Towards Local Visual Modeling for Image Captioning
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Rongrong Ji
ViT
107
79
0
13 Feb 2023
See Your Heart: Psychological states Interpretation through Visual Creations
Likun Yang
Xiaokun Feng
Xiaotang Chen
Shiyu Zhang
Kaiqi Huang
27
0
0
11 Feb 2023
Ordered Memory Baselines
Daniel Borisov
Matthew D’Iorio
Jeffrey Hyacinthe
62
0
0
08 Feb 2023
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Mozhgan Pourkeshavarz
Shahabedin Nabavi
Mohsen Moghaddam
M. Shamsfard
88
4
0
08 Feb 2023
An entity-guided text summarization framework with relational heterogeneous graph neural network
Jingqiang Chen
75
6
0
07 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
109
32
0
01 Feb 2023
Do Multi-Document Summarization Models Synthesize?
Jay DeYoung
Stephanie C. Martinez
Iain J. Marshall
Byron C. Wallace
111
8
0
31 Jan 2023