Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.00370
Cited By
Improved Image Captioning via Policy Gradient optimization of SPIDEr
1 December 2016
Siqi Liu
Zhenhai Zhu
Ning Ye
S. Guadarrama
Kevin Patrick Murphy
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Improved Image Captioning via Policy Gradient optimization of SPIDEr"
50 / 81 papers shown
Title
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
105
4
0
12 Feb 2025
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
65
0
0
14 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
39
0
0
08 Oct 2024
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
27
38
0
11 Dec 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
33
12
0
09 Oct 2023
ContextRef: Evaluating Referenceless Metrics For Image Description Generation
Elisa Kreiss
E. Zelikman
Christopher Potts
Nick Haber
29
5
0
21 Sep 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
32
6
0
23 Aug 2023
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
32
6
0
20 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
34
5
0
20 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
48
5
0
02 May 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
22
8
0
07 Apr 2023
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
21
42
0
30 Mar 2023
ImageAssist: Tools for Enhancing Touchscreen-Based Image Exploration Systems for Blind and Low Vision Users
Vishnu Nair
Han Zhu
Brian A. Smith
10
17
0
17 Feb 2023
Semantics-Empowered Communication: A Tutorial-cum-Survey
Zhilin Lu
Rongpeng Li
Kun Lu
Xianfu Chen
Ekram Hossain
Zhifeng Zhao
Honggang Zhang
39
19
0
16 Dec 2022
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Jianyuan Sun
Xubo Liu
Xinhao Mei
Mark D. Plumbley
V. Kılıç
Wenwu Wang
33
3
0
10 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
31
240
0
03 Oct 2022
Paraphrasing Is All You Need for Novel Object Captioning
Cheng Yang
Yao-Hung Hubert Tsai
Wanshu Fan
Ruslan Salakhutdinov
Louis-Philippe Morency
Yu-Chiang Frank Wang
38
4
0
25 Sep 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia
K. Nguyen
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
36
10
0
21 Sep 2022
An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan
Sheng-Wei Li
26
0
0
12 Aug 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
29
37
0
12 May 2022
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
26
21
0
29 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kilicc
Wenwu Wang
33
29
0
06 Mar 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT
VLM
38
27
0
21 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
25
89
0
31 Jan 2022
A Survey of Natural Language Generation
Chenhe Dong
Hai-Tao Zheng
Haifan Gong
Mengzhao Chen
Junxin Li
Ying Shen
Min Yang
3DV
27
43
0
22 Dec 2021
Audio Captioning Using Sound Event Detection
Aycsegul Ozkaya Eren
M. Sert
43
8
0
04 Oct 2021
Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report Generation With Alternate Learning
Guangyi Liu
Yinghong Liao
Fuyu Wang
Bin Zhang
Lu Zhang
...
Xiang Wan
Shaolin Li
Zhen Li
Shuixing Zhang
Shuguang Cui
23
56
0
11 Aug 2021
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
Andrew Koh
Fuzhao Xue
Chng Eng Siong
14
20
0
10 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
67
254
0
14 Jul 2021
Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"
Jia-Hong Huang
Ting-Wei Wu
Chao-Han Huck Yang
M. Worring
MedIm
20
28
0
30 May 2021
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Guanghui Xu
Shuaicheng Niu
Mingkui Tan
Yucheng Luo
Qing Du
Qi Wu
DiffM
22
56
0
23 Apr 2021
Image Captioning using Multiple Transformers for Self-Attention Mechanism
Farrukh Olimov
Shikha Dubey
Labina Shrestha
Tran Trung Tin
M. Jeon
ViT
34
2
0
14 Feb 2021
Image Captioning with Context-Aware Auxiliary Guidance
Zeliang Song
Xiaofei Zhou
Zhendong Mao
Jianlong Tan
36
31
0
10 Dec 2020
DORB: Dynamically Optimizing Multiple Rewards with Bandits
Ramakanth Pasunuru
Han Guo
Joey Tianyi Zhou
OffRL
32
6
0
15 Nov 2020
Dual Attention on Pyramid Feature Maps for Image Captioning
Litao Yu
Jian Zhang
Qiang Wu
24
47
0
02 Nov 2020
WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information
An Tran
K. Drossos
Tuomas Virtanen
39
19
0
21 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen
Weiping Wang
Li Liu
M. Lew
VLM
118
31
0
16 Oct 2020
Teacher-Critical Training Strategies for Image Captioning
Yiqing Huang
Jiansheng Chen
VLM
29
8
0
30 Sep 2020
Where is the Model Looking At?--Concentrate and Explain the Network Attention
Wenjia Xu
Jiuniu Wang
Yang Wang
Guangluan Xu
Wei Dai
Yirong Wu
XAI
29
17
0
29 Sep 2020
Towards Unique and Informative Captioning of Images
Zeyu Wang
Berthy Feng
Karthik Narasimhan
Olga Russakovsky
25
37
0
08 Sep 2020
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
33
230
0
27 Aug 2020
Assisting Scene Graph Generation with Self-Supervision
Sandeep Inuganti
V. Balasubramanian
SSL
16
7
0
08 Aug 2020
A Unified Framework of Surrogate Loss by Refactoring and Interpolation
Lanlan Liu
Mingzhe Wang
Jia Deng
22
8
0
27 Jul 2020
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
19
376
0
26 Jun 2020
Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation
Mingjie Li
Fuyu Wang
Xiaojun Chang
Xiaodan Liang
MedIm
29
101
0
06 Jun 2020
Chat as Expected: Learning to Manipulate Black-box Neural Dialogue Models
Haochen Liu
Zhiwei Wang
Tyler Derr
Jiliang Tang
AAML
22
15
0
27 May 2020
Visual Question Answering for Cultural Heritage
P. Bongini
Federico Becattini
Andrew D. Bagdanov
A. Bimbo
226
22
0
22 Mar 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
27
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
25
16
0
15 Feb 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
22
19
0
17 Jan 2020
1
2
Next