Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

9 March 2016

Qi Wu

Chunhua Shen

Anton Van Den Hengel

Peng Wang

A. Dick

ArXiv PDF HTML

Papers citing "Image Captioning and Visual Question Answering Based on Attributes and External Knowledge"

34 / 34 papers shown

Title
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation Junrong Yue Wenjie Qu Chuan Qin Jing Chen Xiaomin Lie Xinlei Yu Wenxin Zhang Zhendong Zhao 54 0 0 23 Apr 2025
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation Christel Chappuis Eliot Walt Vincent Mendez Sylvain Lobry B. L. Saux D. Tuia 31 3 0 28 Nov 2023
Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition David M. Chan Shalini Ghosh Ariya Rastrow Björn Hoffmeister OffRL 18 6 0 06 Jan 2023
What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes Shivam Sharma Siddhant Agarwal Tharun Suresh Preslav Nakov Md. Shad Akhtar Tanmoy Charkraborty VLM 28 18 0 01 Dec 2022
A survey on the development status and application prospects of knowledge graph in smart grids Jian Wang Xi Wang Chaoqun Ma Lei Kou 33 74 0 02 Nov 2022
Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval Xuri Ge Fuhai Chen Songpei Xu Fuxiang Tao J. Jose 30 26 0 17 Oct 2022
Generating image captions with external encyclopedic knowledge S. Nikiforova Tejaswini Deoskar Denis Paperno Yoad Winter 30 1 0 10 Oct 2022
Image Captioning based on Feature Refinement and Reflective Decoding G. Alabduljabbar Hafida Benhidour Said Kerrache 3DV 22 3 0 16 Jun 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios Guangyao Li Yake Wei Yapeng Tian Chenliang Xu Ji-Rong Wen Di Hu 29 136 0 26 Mar 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning J. Tan Y. Tan C. Chan Joon Huang Chuah VLM ViT 26 15 0 11 Feb 2022
Knowledge-based Embodied Question Answering Sinan Tan Mengmeng Ge Di Guo Huaping Liu F. Sun 30 20 0 16 Sep 2021
DAFNe: A One-Stage Anchor-Free Approach for Oriented Object Detection Steven Lang Fabrizio G. Ventola Kristian Kersting 34 14 0 13 Sep 2021
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation Zechen Bai Yuta Nakashima Noa Garcia 68 43 0 13 Sep 2021
Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration Xuan Kan Hejie Cui Carl Yang 76 40 0 11 Jul 2021
Image-to-Image Retrieval by Learning Similarity between Scene Graphs Sangwoong Yoon Woo-Young Kang Sungwook Jeon SeongEun Lee C. Han Jonghun Park Eun-Sol Kim 3DH 29 39 0 29 Dec 2020
Dual ResGCN for Balanced Scene GraphGeneration Jingyi Zhang Yong Zhang Baoyuan Wu Yanbo Fan Fumin Shen Heng Tao Shen 28 12 0 09 Nov 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review Wei Chen Weiping Wang Li Liu M. Lew VLM 118 31 0 16 Oct 2020
Length-Controllable Image Captioning Chaorui Deng Ning Ding Mingkui Tan Qi Wu VLM 33 56 0 19 Jul 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework C. Sur 27 7 0 16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC) C. Sur 25 16 0 15 Feb 2020
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning Filippos Gouidis Alexandros Vassiliades T. Patkos Antonis Argyros Nick Bassiliades Dimitris Plexousakis OCL 29 12 0 26 Dec 2019
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue X. Jiang Jiahao Yu Zengchang Qin Yingying Zhuang Xingxing Zhang Yue Hu Qi Wu 23 70 0 17 Nov 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods Aditya Mogadala M. Kalimuthu Dietrich Klakow VLM 20 132 0 22 Jul 2019
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback Hui Wu Yupeng Gao Xiaoxiao Guo Ziad Al-Halah Steven J. Rennie Kristen Grauman Rogerio Feris EgoV 25 63 0 30 May 2019
Object Detection in 20 Years: A Survey Zhengxia Zou Keyan Chen Zhenwei Shi Yuhong Guo Jieping Ye VLM ObjD AI4TS 32 2,285 0 13 May 2019
3G structure for image caption generation Aihong Yuan Xuelong Li Xiaoqiang Lu 13 34 0 21 Apr 2019
Pedestrian Attribute Recognition: A Survey Tianlin Li Shaofei Zheng Rui Yang Aihua Zheng Zhe Chen Jin Tang Bin Luo CVBM 28 127 0 22 Jan 2019
Holistic Multi-modal Memory Network for Movie Question Answering Anran Wang Anh Tuan Luu Chuan-Sheng Foo Erik Cambria Yi Tay V. Chandrasekhar 28 20 0 12 Nov 2018
A Comprehensive Survey of Deep Learning for Image Captioning Md Zakir Hossain Ferdous Sohel M. Shiratuddin Hamid Laga VLM 3DV 45 760 0 06 Oct 2018
Graph R-CNN for Scene Graph Generation Jianwei Yang Jiasen Lu Stefan Lee Dhruv Batra Devi Parikh GNN 42 836 0 01 Aug 2018
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering Pan Lu Lei Ji Wei Zhang Nan Duan M. Zhou Jianyong Wang CoGe 25 79 0 24 May 2018
Defoiling Foiled Image Captions Pranava Madhyastha Josiah Wang Lucia Specia 24 9 0 16 May 2018
AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding Jiahong Wu He Zheng Bo Zhao Yixin Li Baoming Yan ... Shipei Zhou G. Lin Yanwei Fu Yizhou Wang Yonggang Wang VLM 38 149 0 17 Nov 2017
End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering Youngjae Yu Hyungjin Ko Jongwook Choi Gunhee Kim 14 230 0 10 Oct 2016