ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1612.00563
  4. Cited By
Self-critical Sequence Training for Image Captioning
v1v2 (latest)

Self-critical Sequence Training for Image Captioning

2 December 2016
Steven J. Rennie
E. Marcheret
Youssef Mroueh
Jerret Ross
Vaibhava Goel
ArXiv (abs)PDFHTML

Papers citing "Self-critical Sequence Training for Image Captioning"

50 / 862 papers shown
Title
GSRFormer: Grounded Situation Recognition Transformer with Alternate
  Semantic Attention Refinement
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Teruko Mitamura
Alexander G. Hauptmann
77
37
0
18 Aug 2022
Exploiting Multiple Sequence Lengths in Fast End to End Training for
  Image Captioning
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
128
22
0
13 Aug 2022
Towards Sequence-Level Training for Visual Tracking
Towards Sequence-Level Training for Visual Tracking
Minji Kim
Seungkwang Lee
Jungseul Ok
Bohyung Han
Minsu Cho
78
34
0
11 Aug 2022
Attribute Controllable Beautiful Caucasian Face Generation by Aesthetics
  Driven Reinforcement Learning
Attribute Controllable Beautiful Caucasian Face Generation by Aesthetics Driven Reinforcement Learning
Xin Jin
Shu Zhao
Le Zhang
Xin Zhao
Qiang Deng
Chaoen Xiao
EGVMCVBM
53
2
0
09 Aug 2022
Distinctive Image Captioning via CLIP Guided Group Optimization
Distinctive Image Captioning via CLIP Guided Group Optimization
Youyuan Zhang
Jiuniu Wang
Hao Wu
Wenjia Xu
VLM
103
8
0
08 Aug 2022
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset
  for Multi-Modal Understanding
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding
Bingning Wang
Feiya Lv
Ting Yao
Yiming Yuan
Jin Ma
Yu Luo
Haijin Liang
73
3
0
05 Aug 2022
Retrieval-Augmented Transformer for Image Captioning
Retrieval-Augmented Transformer for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
98
59
0
26 Jul 2022
Innovations in Neural Data-to-text Generation: A Survey
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
105
10
0
25 Jul 2022
SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning
  with Human Object Interactions
SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions
Ansh Mittal
Shuvam Ghosal
Rishibha Bansal
113
3
0
24 Jul 2022
Rethinking the Reference-based Distinctive Image Captioning
Rethinking the Reference-based Distinctive Image Captioning
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
99
22
0
22 Jul 2022
Efficient Modeling of Future Context for Image Captioning
Efficient Modeling of Future Context for Image Captioning
Zhengcong Fei
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
76
15
0
22 Jul 2022
Robust Knowledge Adaptation for Dynamic Graph Neural Networks
Robust Knowledge Adaptation for Dynamic Graph Neural Networks
Han Li
Changsheng Li
Kaituo Feng
Ye Yuan
Guoren Wang
H. Zha
91
14
0
22 Jul 2022
GRIT: Faster and Better Image captioning Transformer Using Dual Visual
  Features
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
ViT
92
114
0
20 Jul 2022
Explicit Image Caption Editing
Explicit Image Caption Editing
Zhen Wang
Long Chen
Wenbo Ma
G. Han
Yulei Niu
Jian Shao
Jun Xiao
74
12
0
20 Jul 2022
Cross-modal Prototype Driven Network for Radiology Report Generation
Cross-modal Prototype Driven Network for Radiology Report Generation
Jun Wang
A. Bhalerao
Yulan He
MedIm
173
77
0
11 Jul 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep
  Reinforcement Learning
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
Guosheng Lin
SyDaALM
254
273
0
05 Jul 2022
Are metrics measuring what they should? An evaluation of image
  captioning task metrics
Are metrics measuring what they should? An evaluation of image captioning task metrics
Othón González-Chávez
Guillermo Ruiz
Daniela Moctezuma
Tania A. Ramirez-delreal
75
9
0
04 Jul 2022
Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray
  Report Generation
Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation
Sixing Yan
William K. Cheung
Keith W H Chiu
Terence M. Tong
Charles K. Cheung
Simon See
MedIm
94
17
0
04 Jul 2022
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer
  Using Patches
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches
Mengya Xu
Mobarakol Islam
Hongliang Ren
MedIm
93
12
0
30 Jun 2022
ZoDIAC: Zoneout Dropout Injection Attention Calculation
ZoDIAC: Zoneout Dropout Injection Attention Calculation
Zanyar Zohourianshahzadi
Jugal Kalita
107
0
0
28 Jun 2022
Joint Generator-Ranker Learning for Natural Language Generation
Joint Generator-Ranker Learning for Natural Language Generation
Weizhou Shen
Yeyun Gong
Yelong Shen
Song Wang
Xiaojun Quan
Nan Duan
Weizhu Chen
113
5
0
28 Jun 2022
Competence-based Multimodal Curriculum Learning for Medical Report
  Generation
Competence-based Multimodal Curriculum Learning for Medical Report Generation
Fenglin Liu
Shen Ge
Yuexian Zou
Xian Wu
MedIm
172
140
0
24 Jun 2022
Bypass Network for Semantics Driven Image Paragraph Captioning
Bypass Network for Semantics Driven Image Paragraph Captioning
Qinjie Zheng
Chaoyue Wang
Dadong Wang
125
1
0
21 Jun 2022
DALL-E for Detection: Language-driven Compositional Image Synthesis for
  Object Detection
DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffMObjD
112
18
0
20 Jun 2022
Write and Paint: Generative Vision-Language Models are Unified Modal
  Learners
Write and Paint: Generative Vision-Language Models are Unified Modal Learners
Shizhe Diao
Wangchunshu Zhou
Xinsong Zhang
Jiawei Wang
MLLMAI4CE
108
17
0
15 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLMObjD
132
130
0
15 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
86
92
0
14 Jun 2022
Language Models are General-Purpose Interfaces
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
80
102
0
13 Jun 2022
Improving Image Captioning with Control Signal of Sentence Quality
Improving Image Captioning with Control Signal of Sentence Quality
Zhangzi Zhu
Hong Qu
96
0
0
07 Jun 2022
Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov
  Decision Processes
Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes
Tetsuro Morimura
Kazuhiro Ota
Kenshi Abe
Peinan Zhang
OffRL
79
0
0
02 Jun 2022
Learning as Conversation: Dialogue Systems Reinforced for Information
  Acquisition
Learning as Conversation: Dialogue Systems Reinforced for Information Acquisition
Pengshan Cai
H. Wan
Fei Liu
Mo Yu
Hong-ye Yu
Sachindra Joshi
96
6
0
29 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
180
564
0
27 May 2022
Quark: Controllable Text Generation with Reinforced Unlearning
Quark: Controllable Text Generation with Reinforced Unlearning
Ximing Lu
Sean Welleck
Jack Hessel
Liwei Jiang
Lianhui Qin
Peter West
Prithviraj Ammanabrolu
Yejin Choi
MU
187
220
0
26 May 2022
Sparse Graph Learning from Spatiotemporal Time Series
Sparse Graph Learning from Spatiotemporal Time Series
Andrea Cini
Daniele Zambon
Cesare Alippi
CMLAI4TS
139
20
0
26 May 2022
Fine-grained Image Captioning with CLIP Reward
Fine-grained Image Captioning with CLIP Reward
Jaemin Cho
Seunghyun Yoon
Ajinkya Kale
Franck Dernoncourt
Trung Bui
Joey Tianyi Zhou
CLIP
242
79
0
26 May 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
Jinho Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
163
37
0
25 May 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
177
78
0
25 May 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal
  Skip-connections
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLMMLLM
109
224
0
24 May 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
GL-RG: Global-Local Representation Granularity for Video Captioning
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
127
59
0
22 May 2022
Learning from Bootstrapping and Stepwise Reinforcement Reward: A
  Semi-Supervised Framework for Text Style Transfer
Learning from Bootstrapping and Stepwise Reinforcement Reward: A Semi-Supervised Framework for Text Style Transfer
Zhengyuan Liu
Nancy F. Chen
69
2
0
19 May 2022
Importance Weighted Structure Learning for Scene Graph Generation
Importance Weighted Structure Learning for Scene Graph Generation
Daqing Liu
M. Bober
J. Kittler
122
5
0
14 May 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation
  Datasets
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban
Chien-Sheng Wu
Wenhao Liu
Caiming Xiong
82
5
0
13 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its
  Effect on Visual Description Models and Metrics
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
80
6
0
12 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New
  Challenges
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
122
44
0
12 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges
  in Audio Captioning
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
95
16
0
11 May 2022
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual
  Context for Image Captioning
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Chia-Wen Kuo
Z. Kira
100
56
0
09 May 2022
DxFormer: A Decoupled Automatic Diagnostic System Based on
  Decoder-Encoder Transformer with Dense Symptom Representations
DxFormer: A Decoupled Automatic Diagnostic System Based on Decoder-Encoder Transformer with Dense Symptom Representations
Wei Chen
Cheng Zhong
J. Peng
Zhongyu Wei
MedIm
69
18
0
08 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLMCLIPOffRL
362
1,315
0
04 May 2022
Improving Multi-Document Summarization through Referenced Flexible
  Extraction with Credit-Awareness
Improving Multi-Document Summarization through Referenced Flexible Extraction with Credit-Awareness
Yun-Zhu Song
Yi-Syuan Chen
Hong-Han Shuai
98
22
0
04 May 2022
Diverse Image Captioning with Grounded Style
Diverse Image Captioning with Grounded Style
Franz Klein
Shweta Mahajan
S. Roth
81
8
0
03 May 2022
Previous
123...567...161718
Next