Image Captioning with Semantic Attention

12 March 2016

Papers citing "Image Captioning with Semantic Attention"

50 / 562 papers shown

FIRST: A Million-Entry Dataset for Text-Driven Fashion Synthesis and Design Zhen Huang Yihao Li Dong Pei Jiapeng Zhou Xuliang Ning Jianlin Han Xiaoguang Han Xuejun Chen 40 3 0 13 Nov 2023
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce Baohao Liao Michael Kozielski Sanjika Hewavitharana Jiangbo Yuan Shahram Khadivi Tomer Lancewicki SSL 18 0 0 22 Oct 2023
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation Rashid Khan Bingding Huang Haseeb Hassan Asim Zaman Z. Ye 29 2 0 11 Oct 2023
HOI4ABOT: Human-Object Interaction Anticipation for Human Intention Reading Collaborative roBOTs Esteve Valls Mascaro Daniel Sliwowski Dongheui Lee 27 8 0 28 Sep 2023
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness Valentin Barriere Felipe del Rio Andres Carvallo De Ferari Carlos Aspillaga Eugenio Herrera-Berg Cristian Buc Calderon DiffM 27 0 0 27 Sep 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches Deepak Gupta Kush Attal Dina Demner-Fushman LM&MA 27 1 0 21 Sep 2023
R2GenGPT: Radiology Report Generation with Frozen LLMs Zhanyu Wang Lingqiao Liu Lei Wang Luping Zhou MedIm LM&MA VLM 22 64 0 18 Sep 2023
From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications Shreyank N. Gowda Dheeraj Pandey Shashank Narayana Gowda 52 3 0 30 Aug 2023
DLIP: Distilling Language-Image Pre-training Huafeng Kuang Jie Wu Xiawu Zheng Ming Li Xuefeng Xiao Rui Wang Min Zheng Rongrong Ji VLM 44 4 0 24 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning Manuele Barraco Sara Sarto Marcella Cornia Lorenzo Baraldi Rita Cucchiara VLM 55 19 0 23 Aug 2023
IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer Keqi Fan Xiaohao Cai M. Niranjan MedIm ViT 11 3 0 10 Aug 2023
Asynchronous Evolution of Deep Neural Network Architectures J. Liang H. Shahrzad Risto Miikkulainen 28 0 0 08 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification Sai Suprabhanu Nallapaneni Subrahmanyam Konakanchi 30 2 0 05 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning Junjie Fei Teng Wang Jinrui Zhang Zhenyu He Chengjie Wang Feng Zheng VLM 28 34 0 31 Jul 2023
Enhancing image captioning with depth information using a Transformer-based framework Aya Mahmoud Ahmed Mohamed Yousef K. Hussain Yousef B. Mahdy ViT 24 4 0 24 Jul 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes Guoyun Tu Ying Liu Vladimir Vlassov 139 1 0 14 Jul 2023
GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language Mihai Masala Nicolae Cudlenco Traian Rebedea Marius Leordeanu 14 0 0 22 May 2023
Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment Shengqiong Wu Hao Fei Wei Ji Tat-Seng Chua 24 28 0 20 May 2023
Image-to-Text Translation for Interactive Image Recognition: A Comparative User Study with Non-Expert Users Wataru Kawabe Yusuke Sugano VLM 35 2 0 11 May 2023
Transforming Visual Scene Graphs to Image Captions Xu Yang Jiawei Peng Zihua Wang Haiyang Xu Qinghao Ye Chenliang Li Mingshi Yan Feisi Huang Zhangzikang Li Yu Zhang 49 19 0 03 May 2023
A Review of Deep Learning for Video Captioning Moloud Abdar Meenakshi Kollati Swaraja Kuraparthi Farhad Pourpanah Daniel J. McDuff ... Shuicheng Yan Abduallah A. Mohamed Abbas Khosravi Min Zhang Fatih Porikli 3DV 37 21 0 22 Apr 2023
Interactive and Explainable Region-guided Radiology Report Generation Tim Tanida Philip Muller Georgios Kaissis Daniel Rueckert MedIm 37 110 0 17 Apr 2023
ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection Jeeseung Park Jin Woo Park Jongseok Lee ViT 37 44 0 17 Apr 2023
Model-Agnostic Gender Debiased Image Captioning Yusuke Hirota Yuta Nakashima Noa Garcia FaML 35 18 0 07 Apr 2023
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens Zhanyu Wang Lingqiao Liu Lei Wang Luping Zhou MedIm 13 71 0 05 Apr 2023
Multi-modal reward for visual relationships-based image captioning Ali Abedi Hossein Karshenas Peyman Adibi 44 2 0 19 Mar 2023
Generation-Guided Multi-Level Unified Network for Video Grounding Xingyi Cheng Xiangyu Wu Dong Shen Hezheng Lin Fan Yang 21 0 0 14 Mar 2023
ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models Sheng Wang Zihao Zhao Xi Ouyang Qian Wang Dinggang Shen LM&MA MedIm 29 140 0 14 Feb 2023
On The Coherence of Quantitative Evaluation of Visual Explanations Benjamin Vandersmissen José Oramas XAI FAtt 36 3 0 14 Feb 2023
Towards Local Visual Modeling for Image Captioning Yiwei Ma Jiayi Ji Xiaoshuai Sun Yiyi Zhou Rongrong Ji ViT 21 71 0 13 Feb 2023
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning Mozhgan Pourkeshavarz Shahabedin Nabavi Mohsen Moghaddam M. Shamsfard 31 4 0 08 Feb 2023
Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning Jingqiang Chen 25 3 0 04 Feb 2023
Training Integer-Only Deep Recurrent Neural Networks V. Nia Eyyub Sari Vanessa Courville M. Asgharian MQ 53 2 0 22 Dec 2022
Backdoor Attack Detection in Computer Vision by Applying Matrix Factorization on the Weights of Deep Networks Khondoker Murad Hossain Tim Oates AAML 26 4 0 15 Dec 2022
Semantic-Conditional Diffusion Networks for Image Captioning Jianjie Luo Yehao Li Yingwei Pan Ting Yao Jianlin Feng Hongyang Chao Tao Mei DiffM 30 62 0 06 Dec 2022
Exploring Discrete Diffusion Models for Image Captioning Zixin Zhu Yixuan Wei Jianfeng Wang Zhe Gan Zheng-Wei Zhang Le Wang G. Hua Lijuan Wang Zicheng Liu Han Hu DiffM VLM 28 17 0 21 Nov 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation Jie Ruan Yue Wu Xiaojun Wan Yuesheng Zhu 29 1 0 20 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning Pengpeng Zeng Jinkuan Zhu Jingkuan Song Lianli Gao VLM 24 27 0 17 Nov 2022
OSIC: A New One-Stage Image Captioner Coined Bo Wang Zhao Zhang Ming Zhao Xiaojie Jin Mingliang Xu Meng Wang VLM 28 3 0 04 Nov 2022
Prophet Attention: Predicting Attention with Future Attention for Image Captioning Fenglin Liu Xuancheng Ren Xian Wu Wei Fan Yuexian Zou Xu Sun 24 46 0 19 Oct 2022
Improving Radiology Summarization with Radiograph and Anatomy Prompts Jinpeng Hu Zhihong Chen Yang Liu Xiang Wan Tsung-Hui Chang MedIm 34 8 0 15 Oct 2022
What Should the System Do Next?: Operative Action Captioning for Estimating System Actions Taiki Nakamura Seiya Kawano Akishige Yuguchi Yasutomo Kawanishi Koichiro Yoshino 14 0 0 06 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data Ye Zhu Yuehua Wu N. Sebe Yan Yan 33 16 0 05 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning Xu Yang Hanwang Zhang Chongyang Gao Jianfei Cai MLLM 40 10 0 04 Oct 2022
Medical Image Captioning via Generative Pretrained Transformers Alexander Selivanov Oleg Y. Rogov Daniil Chesakov Artem Shelmanov Irina Fedulova Dmitry V. Dylov MedIm 57 55 0 28 Sep 2022
STING: Self-attention based Time-series Imputation Networks using GAN Eunkyu Oh Taehun Kim Yunhu Ji Sushil Khyalia AI4TS 29 25 0 22 Sep 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia K. Nguyen Ali Furkan Biten Andrés Mafla Lluís Gómez Dimosthenis Karatzas 36 10 0 21 Sep 2022
Belief Revision based Caption Re-ranker with Visual Semantic Information Ahmed Sabir Francesc Moreno-Noguer Pranava Madhyastha Lluís Padró BDL 29 2 0 16 Sep 2022
M^4I: Multi-modal Models Membership Inference Pingyi Hu Zihan Wang Ruoxi Sun Hu Wang Minhui Xue 39 26 0 15 Sep 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment Mustafa Shukor Guillaume Couairon Matthieu Cord VLM CLIP 24 27 0 29 Aug 2022