Progressive Tree-Structured Prototype Network for End-to-End Image Captioning

17 November 2022

Pengpeng Zeng

Jinkuan Zhu

Jingkuan Song

Lianli Gao

VLM

ArXiv PDF HTML

Papers citing "Progressive Tree-Structured Prototype Network for End-to-End Image Captioning"

33 / 33 papers shown

Title
End-to-End Transformer Based Model for Image Captioning Yiyu Wang Jungang Xu Yingfei Sun VLM ViT 49 120 0 29 Mar 2022
Injecting Semantic Concepts into End-to-End Image Captioning Zhiyuan Fang Jianfeng Wang Xiaowei Hu Lin Liang Zhe Gan Lijuan Wang Yezhou Yang Zicheng Liu ViT VLM 62 86 0 09 Dec 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision Zirui Wang Jiahui Yu Adams Wei Yu Zihang Dai Yulia Tsvetkov Yuan Cao VLM MLLM 112 792 0 24 Aug 2021
Group-based Distinctive Image Captioning with Memory Attention Jiuniu Wang Wenjia Xu Qingzhong Wang Antoni B. Chan 68 18 0 20 Aug 2021
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning Xinzhi Dong Chengjiang Long Wenju Xu Chunxia Xiao ViT 101 66 0 05 Aug 2021
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding Pengchuan Zhang Xiyang Dai Jianwei Yang Bin Xiao Lu Yuan Lei Zhang Jianfeng Gao ViT 73 335 0 29 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows Ze Liu Yutong Lin Yue Cao Han Hu Yixuan Wei Zheng Zhang Stephen Lin B. Guo ViT 398 21,281 0 25 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 808 29,167 0 26 Feb 2021
Dual-Level Collaborative Transformer for Image Captioning Yunpeng Luo Jiayi Ji Xiaoshuai Sun Liujuan Cao Yongjian Wu Feiyue Huang Chia-Wen Lin Rongrong Ji ViT 50 276 0 16 Jan 2021
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network Jiayi Ji Yunpeng Luo Xiaoshuai Sun Fuhai Chen Gen Luo Yongjian Wu Yue Gao Rongrong Ji ViT 68 172 0 13 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 526 40,739 0 22 Oct 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks Xiujun Li Xi Yin Chunyuan Li Pengchuan Zhang Xiaowei Hu ... Houdong Hu Li Dong Furu Wei Yejin Choi Jianfeng Gao VLM 88 1,934 0 13 Apr 2020
X-Linear Attention Networks for Image Captioning Yingwei Pan Ting Yao Yehao Li Tao Mei 92 510 0 31 Mar 2020
Meshed-Memory Transformer for Image Captioning Marcella Cornia Matteo Stefanini Lorenzo Baraldi Rita Cucchiara 59 874 0 17 Dec 2019
Attention on Attention for Image Captioning Lun Huang Wenmin Wang Jie Chen Xiao-Yong Wei 56 829 0 19 Aug 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.4K 94,511 0 11 Oct 2018
Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering Unnat Jain Svetlana Lazebnik Alex Schwing MLLM 60 81 0 29 Mar 2018
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering Peter Anderson Xiaodong He Chris Buehler Damien Teney Mark Johnson Stephen Gould Lei Zhang AIMat 111 4,208 0 25 Jul 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 624 130,942 0 12 Jun 2017
Self-critical Sequence Training for Image Captioning Steven J. Rennie E. Marcheret Youssef Mroueh Jerret Ross Vaibhava Goel 105 1,883 0 02 Dec 2016
Visual Dialog Abhishek Das Satwik Kottur Khushi Gupta Avi Singh Deshraj Yadav José M. F. Moura Devi Parikh Dhruv Batra 142 996 0 26 Nov 2016
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 89 853 0 21 Sep 2016
SPICE: Semantic Propositional Image Caption Evaluation Peter Anderson Basura Fernando Mark Johnson Stephen Gould EGVM 84 1,909 0 29 Jul 2016
Image Captioning with Deep Bidirectional LSTMs Cheng Wang Haojin Yang Christian Bartz Christoph Meinel VLM 42 279 0 04 Apr 2016
Image Captioning with Semantic Attention Quanzeng You Hailin Jin Zhaowen Wang Chen Fang Jiebo Luo VLM 164 1,659 0 12 Mar 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations Ranjay Krishna Yuke Zhu Oliver Groth Justin Johnson Kenji Hata ... Yannis Kalantidis Li Li David A. Shamma Michael S. Bernstein Fei-Fei Li 194 5,726 0 23 Feb 2016
Deep Residual Learning for Image Recognition Kaiming He Xinming Zhang Shaoqing Ren Jian Sun MedIm 1.9K 193,426 0 10 Dec 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross B. Girshick Jian Sun AIMat ObjD 461 62,122 0 04 Jun 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Ke Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov R. Zemel Yoshua Bengio DiffM 310 10,050 0 10 Feb 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions A. Karpathy Li Fei-Fei 89 5,578 0 07 Dec 2014
CIDEr: Consensus-based Image Description Evaluation Ramakrishna Vedantam C. L. Zitnick Devi Parikh 250 4,471 0 20 Nov 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan Andrew Zisserman FAtt MDE 1.3K 100,213 0 04 Sep 2014
Microsoft COCO: Common Objects in Context Nayeon Lee Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 367 43,524 0 01 May 2014