Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

21 September 2016

Papers citing "Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge"

50 / 92 papers shown

ChatBEV: A Visual Language Model that Understands BEV Maps Qingyao Xu Tian Jin Guang Chen Yanfeng Wang Yujie Zhang 51 0 0 18 Mar 2025
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation S. Joshi Besmira Nushi Vidhisha Balachandran Varun Chandrasekaran Vibhav Vineet Neel Joshi Baharan Mirzasoleiman MLLM VLM 49 0 0 07 Jan 2025
Progress-Aware Video Frame Captioning Zihui Xue Joungbin An Xitong Yang Kristen Grauman 100 1 0 03 Dec 2024
Diagnosing Human-object Interaction Detectors Fangrui Zhu Yiming Xie Weidi Xie Huaizu Jiang 30 7 0 16 Aug 2023
Evaluating Pragmatic Abilities of Image Captioners on A3DS Polina Tsvilodub Michael Franke EGVM 25 3 0 22 May 2023
Similarity-Aware Multimodal Prompt Learning for Fake News Detection Ye Jiang Xiaomin Yu Yimin Wang Xiaoman Xu Xingyi Song Diana Maynard 29 20 0 09 Apr 2023
Retrieval-augmented Image Captioning R. Ramos Desmond Elliott Bruno Martins VLM 32 29 0 16 Feb 2023
On The Coherence of Quantitative Evaluation of Visual Explanations Benjamin Vandersmissen José Oramas XAI FAtt 36 3 0 14 Feb 2023
Towards Local Visual Modeling for Image Captioning Yiwei Ma Jiayi Ji Xiaoshuai Sun Yiyi Zhou Rongrong Ji ViT 21 71 0 13 Feb 2023
An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU) Rana Adnan Ahmad Muhammad Azhar Hina Sattar 26 10 0 06 Jan 2023
SLAM for Visually Impaired People: a Survey Banafshe Marziyeh Bamdad Davide Scaramuzza Alireza Darvishy 10 8 0 09 Dec 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning Pengpeng Zeng Jinkuan Zhu Jingkuan Song Lianli Gao VLM 24 27 0 17 Nov 2022
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure Y. F. Arthanto David Ojika Joo-Young Kim FedML 58 2 0 11 Jul 2022
Image Captioning based on Feature Refinement and Reflective Decoding G. Alabduljabbar Hafida Benhidour Said Kerrache 3DV 22 3 0 16 Jun 2022
Prompt-based Learning for Unpaired Image Captioning Peipei Zhu Tianlin Li Lin Zhu Zhenglong Sun Weishi Zheng Yaowei Wang Chia-Ju Chen VLM 27 31 0 26 May 2022
Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval Zhiqiang Yuan Wenkai Zhang Kun Fu Xuan Li Chubo Deng Hongqi Wang Xian Sun 29 129 0 21 Apr 2022
Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness Paul Pu Liang 30 4 0 14 Apr 2022
The Unsurprising Effectiveness of Pre-Trained Vision Models for Control Simone Parisi Aravind Rajeswaran Senthil Purushwalkam Abhinav Gupta LM&Ro 34 187 0 07 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition Peipei Zhu Tianlin Li Yong Luo Zhenglong Sun Wei-Shi Zheng Yaowei Wang Chia-Ju Chen 30 12 0 07 Mar 2022
CaMEL: Mean Teacher Learning for Image Captioning Manuele Barraco Matteo Stefanini Marcella Cornia S. Cascianelli Lorenzo Baraldi Rita Cucchiara ViT VLM 38 27 0 21 Feb 2022
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition Xuankai Chang Takashi Maekaku Pengcheng Guo Jing Shi Yen-Ju Lu ... Tianzi Wang Shu-Wen Yang Yu Tsao Hung-yi Lee Shinji Watanabe SSL AI4TS 24 81 0 09 Oct 2021
Introducing the DOME Activation Functions Mohamed E. Hussein Wael AbdAlmageed 30 1 0 30 Sep 2021
Caption Enriched Samples for Improving Hateful Memes Detection Efrat Blaier Itzik Malkiel Lior Wolf VLM 56 21 0 22 Sep 2021
Cross Modification Attention Based Deliberation Model for Image Captioning Zheng Lian Yanan Zhang Haichang Li Rui Wang Xiaohui Hu 24 4 0 17 Sep 2021
Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic Wenjia Zhang Lin Gui Yulan He 33 32 0 04 Sep 2021
Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene Qi Wu Cheng-Ju Wu Yixin Zhu Jungseock Joo 43 14 0 05 Aug 2021
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning Xinzhi Dong Chengjiang Long Wenju Xu Chunxia Xiao ViT 79 66 0 05 Aug 2021
Hybrid Reasoning Network for Video-based Commonsense Captioning Weijiang Yu Jian Liang Lei Ji Lu Li Yuejian Fang Nong Xiao Nan Duan 19 10 0 05 Aug 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning Paul Pu Liang Yiwei Lyu Xiang Fan Zetian Wu Yun Cheng ... Peter Wu Michelle A. Lee Yuke Zhu Ruslan Salakhutdinov Louis-Philippe Morency VLM 32 159 0 15 Jul 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning Jack Hessel Ari Holtzman Maxwell Forbes Ronan Le Bras Yejin Choi CLIP 17 1,442 0 18 Apr 2021
Visual Goal-Step Inference using wikiHow Yue Yang Artemis Panagopoulou Qing Lyu Li Zhang Mark Yatskar Chris Callison-Burch 37 41 0 12 Apr 2021
Comparative evaluation of CNN architectures for Image Caption Generation Sulabh Katiyar S. Borgohain 19 24 0 23 Feb 2021
Diagnostic Captioning: A Survey John Pavlopoulos Vasiliki Kougia Ion Androutsopoulos D. Papamichail 3DV MedIm 91 26 0 18 Jan 2021
Towards Overcoming False Positives in Visual Relationship Detection Daisheng Jin Xiao Ma Chongzhi Zhang Yizhuo Zhou Jiashu Tao ... Haiyu Zhao Shuai Yi Zhoujun Li Xianglong Liu Hongsheng Li 25 5 0 23 Dec 2020
AutoCaption: Image Captioning with Neural Architecture Search Xinxin Zhu Weining Wang Longteng Guo Jing Liu 29 9 0 16 Dec 2020
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network Jiayi Ji Yunpeng Luo Xiaoshuai Sun Fuhai Chen Gen Luo Yongjian Wu Yue Gao Rongrong Ji ViT 51 170 0 13 Dec 2020
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale Ozan Caglayan Pranava Madhyastha Lucia Specia ELM 39 35 0 26 Oct 2020
TextMage: The Automated Bangla Caption Generator Based On Deep Learning Abrar Hasin Kamal Md Asifuzzaman Jishan N. Mansoor VLM 8 17 0 15 Oct 2020
Towards Unique and Informative Captioning of Images Zeyu Wang Berthy Feng Karthik R. Narasimhan Olga Russakovsky 25 37 0 08 Sep 2020
Neural Learning of One-of-Many Solutions for Combinatorial Problems in Structured Output Spaces Yatin Nandwani Deepanshu Jindal Mausam Parag Singla 16 13 0 27 Aug 2020
Explore and Explain: Self-supervised Navigation and Recounting Roberto Bigazzi Federico Landi Marcella Cornia S. Cascianelli Lorenzo Baraldi Rita Cucchiara EgoV LM&Ro 19 17 0 14 Jul 2020
Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning Longteng Guo Jing Liu Xinxin Zhu Xingjian He Jie Jiang Hanqing Lu BDL 24 56 0 10 May 2020
AIBench Scenario: Scenario-distilling AI Benchmarking Wanling Gao Fei Tang Jianfeng Zhan Xu Wen Lei Wang Zheng Cao Chuanxin Lan Chunjie Luo Xiaoli Liu Zihan Jiang 29 14 0 06 May 2020
AIBench Training: Balanced Industry-Standard AI Training Benchmarking Fei Tang Wanling Gao Jianfeng Zhan Chuanxin Lan Xu Wen ... Yatao Li Junchao Shao Zhenyu Wang Xiaoyu Wang Hainan Ye 30 3 0 30 Apr 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning Longteng Guo Jing Liu Xinxin Zhu Peng Yao Shichen Lu Hanqing Lu ViT 135 189 0 19 Mar 2020
SAFE: Similarity-Aware Multi-Modal Fake News Detection Xinyi Zhou Jindi Wu R. Zafarani 38 65 0 19 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC) C. Sur 25 16 0 15 Feb 2020
Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network W. Ramos M. Silva Edson Roteia Araujo Junior Alan C. Neves Erickson R. Nascimento 22 6 0 29 Dec 2019
Meshed-Memory Transformer for Image Captioning Marcella Cornia Matteo Stefanini Lorenzo Baraldi Rita Cucchiara 14 868 0 17 Dec 2019
Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication Ruize Wang Zhongyu Wei Ying Cheng Piji Li Haijun Shan Ji Zhang Qi Zhang Xuanjing Huang VGen DiffM 20 13 0 11 Nov 2019