
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017

Lei Zhang

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

Showing 50 of 1,868 citing papers.

NarrationBot and InfoBot: A Hybrid System for Automated Video Description · Shasta Ihorn, Y. Siu, Aditya Bodi, Lothar D Narins, Jose M. Castanon, Yash Kant, Abhishek Das, Ilmi Yoon, Pooyan Fazli · 44 3 0 · 07 Nov 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling · Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li · VLM · 292 403 0 · 06 Nov 2021
Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval · Zhihao Fan, Zhongyu Wei, Zejun Li, Siyuan Wang, Jianqing Fan · 53 7 0 · 05 Nov 2021
LILA: Language-Informed Latent Actions · Siddharth Karamcheti, Megha Srivastava, Percy Liang, Dorsa Sadigh · LM&Ro · 96 32 0 · 05 Nov 2021
Introspective Distillation for Robust Question Answering · Yulei Niu, Hanwang Zhang · 94 60 0 · 01 Nov 2021
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval · Ning Han, Jingjing Chen, Chuhao Shi, Yawen Zeng, Guangyi Xiao, Hao Chen · 105 11 0 · 29 Oct 2021
Towards artificial general intelligence via a multimodal foundation model · Nanyi Fei, Zhiwu Lu, Yizhao Gao, Guoxing Yang, Yuqi Huo, ..., Ruihua Song, Xin Gao, Tao Xiang, Haoran Sun, Jiling Wen · AI4CE LRM · 90 229 0 · 27 Oct 2021
Perceptual Score: What Data Modalities Does Your Model Perceive? · Itai Gat, Idan Schwartz, Alex Schwing · 96 31 0 · 27 Oct 2021
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation · A. Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra · LM&Ro · 84 61 0 · 27 Oct 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu · AIMat · 156 206 0 · 25 Oct 2021
Instance-Conditional Knowledge Distillation for Object Detection · Zijian Kang, Peizhen Zhang, Xinming Zhang, Jian Sun, N. Zheng · 98 79 0 · 25 Oct 2021
Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network · Yuansan Liu, MD Abdullah Al Nasim, Sourav Saha, Faria Afrin, Raisa Mallik, Sathishkumar Samiappan · ViT · 41 14 0 · 24 Oct 2021
Single-Modal Entropy based Active Learning for Visual Question Answering · Dong-Jin Kim, Jae-Won Cho, Jinsoo Choi, Yunjae Jung, In So Kweon · 63 12 0 · 21 Oct 2021
A Self-Explainable Stylish Image Captioning Framework via Multi-References · Chengxi Li, Brent Harrison · 124 0 0 · 20 Oct 2021
A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation · Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu · DiffM · 60 6 0 · 19 Oct 2021
Unifying Multimodal Transformer for Bi-directional Image and Text Generation · Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu · 79 59 0 · 19 Oct 2021
Attention W-Net: Improved Skip Connections for better Representations · Shikhar Mohan, Saumik Bhattacharya, Sayantari Ghosh · SSeg · 113 4 0 · 17 Oct 2021
Towards Language-guided Visual Recognition via Dynamic Convolutions · Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yongjian Wu, Yue Gao, Rongrong Ji · ObjD · 98 19 0 · 17 Oct 2021
Multimodal Dialogue Response Generation · Qingfeng Sun, Yujing Wang, Can Xu, Kai Zheng, Yaming Yang, Huang Hu, Fei Xu, Jessica Zhang, Xiubo Geng, Daxin Jiang · 104 49 0 · 16 Oct 2021
Self-Annotated Training for Controllable Image Captioning · Zhangzi Zhu, Tianlei Wang, Hong Qu · 70 2 0 · 16 Oct 2021
Guiding Visual Question Generation · Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao, Lucia Specia · 138 22 0 · 15 Oct 2021
Improving Users' Mental Model with Attention-directed Counterfactual Edits · Kamran Alipour, Arijit Ray, Xiaoyu Lin, Michael Cogswell, J. Schulze, Yi Yao, Giedrius Burachas · OOD · 61 9 0 · 13 Oct 2021
Understanding of Emotion Perception from Art · Digbalay Bose, Krishna Somandepalli, Souvik Kundu, Rimita Lahiri, Jonathan Gratch, Shrikanth Narayanan · 29 5 0 · 13 Oct 2021
Topic Scene Graph Generation by Attention Distillation from Caption · Wenbin Wang, R. Wang, X. Chen · DiffM · 94 14 0 · 12 Oct 2021
Semi-Autoregressive Image Captioning · Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian · 91 25 0 · 11 Oct 2021
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking · Dirk Vath, Pascal Tilli, Ngoc Thang Vu · 77 4 0 · 11 Oct 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters · Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao · VLM CLIP · 350 1,062 0 · 09 Oct 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models · J. Tan, C. Chan, Joon Huang Chuah · VLM · 132 16 0 · 07 Oct 2021
Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching · Ali Furkan Biten, Andrés Mafla, Lluís Gómez, Dimosthenis Karatzas · 248 18 0 · 06 Oct 2021
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering · Long Chen, Yuhang Zheng, Yulei Niu, Hanwang Zhang, Jun Xiao · AAML OOD · 119 37 0 · 03 Oct 2021
ProTo: Program-Guided Transformer for Program-Guided Tasks · Zelin Zhao, Karan Samel, Binghong Chen, Le Song · ViT LM&Ro · 91 30 0 · 02 Oct 2021
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images · Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Q. Tran, Benjamin Van Durme, Alan Yuille · VLM · 73 15 0 · 01 Oct 2021
Geometry Attention Transformer with Position-aware LSTMs for Image Captioning · Chi-Yin Wang, Yulin Shen, Luping Ji · ViT · 106 53 0 · 01 Oct 2021
Geometry-Entangled Visual Semantic Transformer for Image Captioning · Ling Cheng, Wei Wei, Feida Zhu, Yong Liu, Chunyan Miao · ViT · 45 3 0 · 29 Sep 2021
Visually Grounded Concept Composition · Bowen Zhang, Hexiang Hu, Linlu Qiu, Peter Shaw, Fei Sha · CoGe · 122 6 0 · 29 Sep 2021
Visually Grounded Reasoning across Languages and Cultures · Fangyu Liu, Emanuele Bugliarello, Edoardo Ponti, Siva Reddy, Nigel Collier, Desmond Elliott · VLM LRM · 171 180 0 · 28 Sep 2021
CIDEr-R: Robust Consensus-based Image Description Evaluation · G. O. D. Santos, Esther Luna Colombini, Sandra Avila · 81 30 0 · 28 Sep 2021
The Tensor Brain: A Unified Theory of Perception, Memory and Semantic Decoding · Volker Tresp, Sahand Sharifzadeh, Hang Li, Dario Konopatzki, Yunpu Ma · 73 6 0 · 27 Sep 2021
Multimodal Integration of Human-Like Attention in Visual Question Answering · Ekta Sood, Fabian Kögel, Philippe Muller, Dominike Thomas, Mihai Bâce, Andreas Bulling · 66 17 0 · 27 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering · Ekta Sood, Fabian Kögel, Florian Strohm, Prajit Dhar, Andreas Bulling · 67 19 0 · 27 Sep 2021
The JDDC 2.0 Corpus: A Large-Scale Multimodal Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service · Nan Zhao, Haoran Li, Youzheng Wu, Xiaodong He, Bowen Zhou · 46 9 0 · 27 Sep 2021
Why Do We Click: Visual Impression-aware News Recommendation · Jiahao Xun, Shengyu Zhang, Zhou Zhao, Jieming Zhu, Qi Zhang, Jingjie Li, Xiuqiang He, Xiaofei He, Tat-Seng Chua, Leilei Gan · 152 33 0 · 26 Sep 2021
Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation · An Yan, Zexue He, Xing Lu, Jingfeng Du, E. Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu · MedIm · 176 65 0 · 25 Sep 2021
An animated picture says at least a thousand words: Selecting Gif-based Replies in Multimodal Dialog · Xingyao Wang, David Jurgens · 66 5 0 · 24 Sep 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models · Yuan Yao, Ao Zhang, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun · MLLM VPVLM VLM · 300 224 0 · 24 Sep 2021
Dense Contrastive Visual-Linguistic Pretraining · Lei Shi, Kai Shuang, Shijie Geng, Peng Gao, Zuohui Fu, Gerard de Melo, Yunpeng Chen, Sen Su · VLM SSL · 127 11 0 · 24 Sep 2021
COVR: A test-bed for Visually Grounded Compositional Generalization with real images · Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant · CoGe · 105 29 0 · 22 Sep 2021
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation · Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan · CLIP VLM · 110 29 0 · 22 Sep 2021
Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation · Feilong Chen, Fandong Meng, Xiuyi Chen, Peng Li, Jie Zhou · 99 23 0 · 17 Sep 2021
GoG: Relation-aware Graph-over-Graph Network for Visual Dialog · Feilong Chen, Xiuyi Chen, Fandong Meng, Peng Li, Jie Zhou · 145 35 0 · 17 Sep 2021