What value do explicit high level concepts have in vision to language problems?

3 June 2015

Qi Wu

Chunhua Shen

Lingqiao Liu

A. Dick

Anton Van Den Hengel

Papers citing "What value do explicit high level concepts have in vision to language problems?"

Title
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags Daiqing Qi Handong Zhao Zijun Wei Sheng Li 46 2 0 16 Jun 2024
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning Mozhgan Pourkeshavarz Shahabedin Nabavi Mohsen Moghaddam M. Shamsfard 31 4 0 08 Feb 2023
An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU) Rana Adnan Ahmad Muhammad Azhar Hina Sattar 26 10 0 06 Jan 2023
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation Jie Ruan Yue Wu Xiaojun Wan Yuesheng Zhu 29 1 0 20 Nov 2022
FACT: Learning Governing Abstractions Behind Integer Sequences Peter Belcak Ard Kastrati Flavio Schenker Roger Wattenhofer 38 5 0 20 Sep 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception Keenan I. Jones Enes Altuncu V. N. Franqueira Yi-Chia Wang Shujun Li DeLMO 39 3 0 11 Aug 2022
Deep Learning Approaches on Image Captioning: A Review Taraneh Ghandi H. Pourreza H. Mahyar VLM 22 89 0 31 Jan 2022
An Integrated Approach for Video Captioning and Applications Soheyla Amirian T. Taha Khaled Rasheed H. Arabnia 31 1 0 23 Jan 2022
A Survey of Natural Language Generation Chenhe Dong Hai-Tao Zheng Haifan Gong Mengzhao Chen Junxin Li Ying Shen Min Yang 3DV 27 43 0 22 Dec 2021
ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition Xinyu Wang Min Gui Yong-jia Jiang Zixia Jia Nguyen Bach Tao Wang Zhongqiang Huang Fei Huang Kewei Tu 44 52 0 13 Dec 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language Mingyu Ding Zhenfang Chen Tao Du Ping Luo J. Tenenbaum Chuang Gan VGen PINN OCL 30 74 0 28 Oct 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning Matteo Stefanini Marcella Cornia Lorenzo Baraldi S. Cascianelli G. Fiameni Rita Cucchiara 3DV VLM MLLM 67 254 0 14 Jul 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning Zhenfang Chen Jiayuan Mao Jiajun Wu Kwan-Yee K. Wong J. Tenenbaum Chuang Gan VGen 36 92 0 30 Mar 2021
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings Yue Wang Jing Li M. Lyu Irwin King 11 16 0 03 Nov 2020
Teacher-Critical Training Strategies for Image Captioning Yiqing Huang Jiansheng Chen VLM 29 8 0 30 Sep 2020
Improving Image Captioning with Better Use of Captions Zhan Shi Xu Zhou Xipeng Qiu Xiao-Dan Zhu 30 122 0 21 Jun 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework C. Sur 27 7 0 16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC) C. Sur 25 16 0 15 Feb 2020
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue X. Jiang Jiahao Yu Zengchang Qin Yingying Zhuang Xingxing Zhang Yue Hu Qi Wu 23 70 0 17 Nov 2019
Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style Hongwei Ge Zehang Yan Kai Zhang Mingde Zhao Liang Sun 30 24 0 15 Oct 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations Fenglin Liu Yuanxin Liu Xuancheng Ren Xiaodong He Xu Sun VLM 31 81 0 15 May 2019
Pointing Novel Objects in Image Captioning Yehao Li Ting Yao Yingwei Pan Hongyang Chao Tao Mei 33 69 0 25 Apr 2019
Multi-modal gated recurrent units for image description Xuelong Li Aihong Yuan Xiaoqiang Lu GAN 21 26 0 20 Apr 2019
Describing like humans: on diversity in image captioning Qingzhong Wang Antoni B. Chan 24 98 0 28 Mar 2019
Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification Xiu-Shen Wei Chen-Da Liu-Zhang Lingqiao Liu Chunhua Shen Jianxin Wu 19 43 0 11 Dec 2018
A Comprehensive Survey of Deep Learning for Image Captioning Md Zakir Hossain Ferdous Sohel M. Shiratuddin Hamid Laga VLM 3DV 45 760 0 06 Oct 2018
simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions Fenglin Liu Xuancheng Ren Yuanxin Liu Houfeng Wang Xu Sun 98 65 0 27 Aug 2018
Distinctive-attribute Extraction for Image Captioning Boeun Kim Young Han Lee Hyedong Jung C. Cho 19 6 0 25 Jul 2018
Topic-Guided Attention for Image Captioning Zhihao Zhu Zhan Xue Zejian Yuan 30 23 0 10 Jul 2018
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering Pan Lu Lei Ji Wei Zhang Nan Duan M. Zhou Jianyong Wang CoGe 25 79 0 24 May 2018
Object Counts! Bringing Explicit Detections Back into Image Captioning Josiah Wang Pranava Madhyastha Lucia Specia ObjD 19 37 0 23 Apr 2018
Learning to Guide Decoding for Image Captioning Wenhao Jiang Lin Ma Xinpeng Chen Hanwang Zhang Wei Liu 16 69 0 03 Apr 2018
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions Qing Li Qingyi Tao Chenyu You Jianfei Cai Jiebo Luo 34 106 0 20 Mar 2018
Neural Aesthetic Image Reviewer Wenshan Wang Su Yang Weishan Zhang Jiulong Zhang 22 38 0 28 Feb 2018
Disjoint Multi-task Learning between Heterogeneous Human-centric Tasks Dong-Jin Kim Jinsoo Choi Tae-Hyun Oh Youngjin Yoon In So Kweon 24 27 0 14 Feb 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions Qing Li Jianlong Fu D. Yu Tao Mei Jiebo Luo FAtt XAI CoGe 51 60 0 27 Jan 2018
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning Hongge Chen Huan Zhang Pin-Yu Chen Jinfeng Yi Cho-Jui Hsieh GAN AAML 35 49 0 06 Dec 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning Yang Xian Yingli Tian VLM 25 22 0 15 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation Chuang Gan Yandong Li Haoxiang Li Chen Sun Boqing Gong 27 126 0 15 Aug 2017
Fluency-Guided Cross-Lingual Image Captioning Weiyu Lan Xirong Li Jianfeng Dong 19 93 0 15 Aug 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering Y. Jang Yale Song Youngjae Yu Youngjin Kim Gunhee Kim 34 546 0 14 Apr 2017
Recurrent Models for Situation Recognition Arun Mallya Svetlana Lazebnik 20 30 0 18 Mar 2017
MAT: A Multimodal Attentive Translator for Image Captioning Chang Liu F. Sun Changhu Wang Feng Wang Alan Yuille 20 58 0 18 Feb 2017
An Empirical Study of Language CNN for Image Captioning Jiuxiang Gu G. Wang Jianfei Cai Tsuhan Chen 31 132 0 21 Dec 2016
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions Peng Wang Qi Wu Chunhua Shen Anton Van Den Hengel OOD 32 86 0 16 Dec 2016
Areas of Attention for Image Captioning M. Pedersoli Thomas Lucas Cordelia Schmid Jakob Verbeek 33 205 0 03 Dec 2016
Guided Open Vocabulary Image Captioning with Constrained Beam Search Peter Anderson Basura Fernando Mark Johnson Stephen Gould 21 232 0 02 Dec 2016
Semantic Regularisation for Recurrent Image Annotation Feng Liu Tao Xiang Timothy M. Hospedales Wankou Yang Changyin Sun 29 103 0 16 Nov 2016
Boosting Image Captioning with Attributes Ting Yao Yingwei Pan Yehao Li Zhaofan Qiu Tao Mei VLM 48 620 0 05 Nov 2016
End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering Youngjae Yu Hyungjin Ko Jongwook Choi Gunhee Kim 14 230 0 10 Oct 2016