ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1707.07998
  4. Cited By
Bottom-Up and Top-Down Attention for Image Captioning and Visual
  Question Answering
v1v2v3 (latest)

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat
ArXiv (abs)PDFHTML

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Title
Privacy Preserving Visual Question Answering
Privacy Preserving Visual Question Answering
Cristian-Paul Bara
Q. Ping
Abhinav Mathur
Govind Thattai
M. Rohith
Gaurav Sukhatme
106
1
0
15 Feb 2022
Delving Deeper into Cross-lingual Visual Question Answering
Delving Deeper into Cross-lingual Visual Question Answering
Chen Cecilia Liu
Jonas Pfeiffer
Anna Korhonen
Ivan Vulić
Iryna Gurevych
105
8
0
15 Feb 2022
ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer
ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer
Kohei Uehara
Yusuke Mori
Yusuke Mukuta
Tatsuya Harada
93
6
0
15 Feb 2022
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with
  Omni Retrieval
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval
Licheng Yu
Jun Chen
Animesh Sinha
Mengjiao MJ Wang
Hugo Chen
Tamara L. Berg
Ning Zhang
VLM
93
39
0
15 Feb 2022
An experimental study of the vision-bottleneck in VQA
An experimental study of the vision-bottleneck in VQA
Pierre Marza
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
93
1
0
14 Feb 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
73
167
0
11 Feb 2022
Bench-Marking And Improving Arabic Automatic Image Captioning Through
  The Use Of Multi-Task Learning Paradigm
Bench-Marking And Improving Arabic Automatic Image Captioning Through The Use Of Multi-Task Learning Paradigm
Muhy Eddin Za'ter
Bashar Talafha
VLM
50
2
0
11 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient
  Image Captioning
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLMViT
77
19
0
11 Feb 2022
Characterizing and overcoming the greedy nature of learning in
  multi-modal deep neural networks
Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks
Nan Wu
Stanislaw Jastrzebski
Kyunghyun Cho
Krzysztof J. Geras
77
76
0
10 Feb 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive
  Reasoning
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
Jack Hessel
Jena D. Hwang
Jinho Park
Rowan Zellers
Chandra Bhagavatula
Anna Rohrbach
Kate Saenko
Yejin Choi
ReLM
219
51
0
10 Feb 2022
Lightweight Jet Reconstruction and Identification as an Object Detection
  Task
Lightweight Jet Reconstruction and Identification as an Object Detection Task
Adrian Alan Pol
T. Aarrestad
E. Govorkova
Roi Halily
Anat Klempner
...
Vladimir Loncar
J. Ngadiuba
M. Pierini
Olya Sirkin
S. Summers
53
2
0
09 Feb 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of
  Text-to-Image Generation Models
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
ViT
243
193
0
08 Feb 2022
Self-Supervised Representation Learning for Speech Using Visual
  Grounding and Masked Language Modeling
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
Puyuan Peng
David Harwath
SSL
96
26
0
07 Feb 2022
Webly Supervised Concept Expansion for General Purpose Vision Models
Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath
Christopher Clark
Tanmay Gupta
Eric Kolve
Derek Hoiem
Aniruddha Kembhavi
VLM
90
55
0
04 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
136
101
0
31 Jan 2022
A Frustratingly Simple Approach for End-to-End Image Captioning
A Frustratingly Simple Approach for End-to-End Image Captioning
Ziyang Luo
Yadong Xi
Rongsheng Zhang
Jing Ma
VLMMLLM
75
16
0
30 Jan 2022
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training
  via Multi-Stage Learning
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning
Zejun Li
Zhihao Fan
Huaixiao Tou
Jingjing Chen
Zhongyu Wei
Xuanjing Huang
78
18
0
29 Jan 2022
Rethinking Attention-Model Explainability through Faithfulness Violation
  Test
Rethinking Attention-Model Explainability through Faithfulness Violation Test
Yebin Liu
Haoliang Li
Yangyang Guo
Chen Kong
Jing Li
Shiqi Wang
FAtt
183
43
0
28 Jan 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and
  Languages
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
Edoardo Ponti
Ivan Vulić
MLLMVLMELM
117
64
0
27 Jan 2022
Constrained Structure Learning for Scene Graph Generation
Constrained Structure Learning for Scene Graph Generation
Daqing Liu
M. Bober
J. Kittler
3DVCMLBDLOCL
110
7
0
27 Jan 2022
MGA-VQA: Multi-Granularity Alignment for Visual Question Answering
MGA-VQA: Multi-Granularity Alignment for Visual Question Answering
Peixi Xiong
Yilin Shen
Hongxia Jin
35
5
0
25 Jan 2022
SA-VQA: Structured Alignment of Visual and Semantic Representations for
  Visual Question Answering
SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering
Peixi Xiong
Quanzeng You
Pei Yu
Zicheng Liu
Ying Wu
60
5
0
25 Jan 2022
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal
  Grounding
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Arjun Reddy Akula
OOD
116
3
0
24 Jan 2022
Supervised Visual Attention for Simultaneous Multimodal Machine
  Translation
Supervised Visual Attention for Simultaneous Multimodal Machine Translation
Veneta Haralampieva
Ozan Caglayan
Lucia Specia
LRM
75
4
0
23 Jan 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIPVLM
83
40
0
15 Jan 2022
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric
  Outside-Knowledge Visual Question Answering
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering
Feng Gao
Q. Ping
Govind Thattai
Aishwarya N. Reganti
Yingting Wu
Premkumar Natarajan
63
17
0
14 Jan 2022
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular
  Vision-Language Pre-training
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training
Yehao Li
Jiahao Fan
Yingwei Pan
Ting Yao
Weiyao Lin
Tao Mei
MLLMObjD
81
19
0
11 Jan 2022
Prior Knowledge Enhances Radiology Report Generation
Prior Knowledge Enhances Radiology Report Generation
Song Wang
Liyan Tang
Mingquan Lin
George Shih
Ying Ding
Yifan Peng
MedIm
65
24
0
11 Jan 2022
COIN: Counterfactual Image Generation for VQA Interpretation
COIN: Counterfactual Image Generation for VQA Interpretation
Zeyd Boukhers
Timo Hartmann
Jan Jurjens
49
7
0
10 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and
  Sound
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
113
215
0
07 Jan 2022
Self-Training Vision Language BERTs with a Unified Conditional Model
Self-Training Vision Language BERTs with a Unified Conditional Model
Xiaofeng Yang
Fengmao Lv
Fayao Liu
Guosheng Lin
SSLVLM
85
14
0
06 Jan 2022
Compact Bidirectional Transformer for Image Captioning
Compact Bidirectional Transformer for Image Captioning
Yuanen Zhou
Zhenzhen Hu
Daqing Liu
Huixia Ben
Meng Wang
VLM
67
16
0
06 Jan 2022
Discrete and continuous representations and processing in deep learning:
  Looking forward
Discrete and continuous representations and processing in deep learning: Looking forward
Ruben Cartuyvels
Graham Spinks
Marie-Francine Moens
OCL
91
20
0
04 Jan 2022
StyleM: Stylized Metrics for Image Captioning Built with Contrastive
  N-grams
StyleM: Stylized Metrics for Image Captioning Built with Contrastive N-grams
Chengxi Li
Brent Harrison
103
3
0
04 Jan 2022
Interactive Attention AI to translate low light photos to captions for
  night scene understanding in women safety
Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety
A. Rajagopal
V. Nirmala
Arun Muthuraj Vedamanickam
89
0
0
04 Jan 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional
  Vision-Language Generation
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
69
59
0
31 Dec 2021
Knowledge Matters: Radiology Report Generation with General and Specific
  Knowledge
Knowledge Matters: Radiology Report Generation with General and Specific Knowledge
Shuxin Yang
Xian Wu
Shen Ge
S.Kevin Zhou
Li Xiao
MedIm
91
119
0
30 Dec 2021
Synchronized Audio-Visual Frames with Fractional Positional Encoding for
  Transformers in Video-to-Text Translation
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
61
2
0
28 Dec 2021
Multi-Image Visual Question Answering
Multi-Image Visual Question Answering
Harsh Raj
Janhavi Dadhania
Akhilesh Bhardwaj
Prabuchandran KJ
40
2
0
27 Dec 2021
LaTr: Layout-Aware Transformer for Scene-Text VQA
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
125
102
0
23 Dec 2021
A Survey of Natural Language Generation
A Survey of Natural Language Generation
Chenhe Dong
Hai-Tao Zheng
Haifan Gong
Mengzhao Chen
Junxin Li
Ying Shen
Min Yang
3DV
85
45
0
22 Dec 2021
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media
  Knowledge Extraction and Grounding
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
Revanth Reddy Gangi Reddy
Xilin Rui
Manling Li
Xudong Lin
Haoyang Wen
...
Joey Tianyi Zhou
Avirup Sil
Shih-Fu Chang
Alex Schwing
Heng Ji
74
32
0
20 Dec 2021
General Greedy De-bias Learning
General Greedy De-bias Learning
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Qi Tian
109
9
0
20 Dec 2021
ScanQA: 3D Question Answering for Spatial Scene Understanding
ScanQA: 3D Question Answering for Spatial Scene Understanding
Daich Azuma
Taiki Miyanishi
Shuhei Kurita
M. Kawanabe
104
208
0
20 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLMCLIP
153
583
0
16 Dec 2021
Bottom Up Top Down Detection Transformers for Language Grounding in
  Images and Point Clouds
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds
Ayush Jain
N. Gkanatsios
Ishita Mediratta
Katerina Fragkiadaki
ObjD
128
109
0
16 Dec 2021
Distilled Dual-Encoder Model for Vision-Language Understanding
Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang
Wenhui Wang
Haichao Zhu
Ming Liu
Bing Qin
Furu Wei
VLMFedML
85
33
0
16 Dec 2021
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense
  Reasoning
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang
Haoxuan You
Liunian Harold Li
Alireza Zareian
Suji Park
Yiqing Liang
Kai-Wei Chang
Shih-Fu Chang
ReLMLRM
69
33
0
16 Dec 2021
Hierarchical Cross-Modality Semantic Correlation Learning Model for
  Multimodal Summarization
Hierarchical Cross-Modality Semantic Correlation Learning Model for Multimodal Summarization
Litian Zhang
Xiaoming Zhang
Junshu Pan
Feiran Huang
66
48
0
16 Dec 2021
Insta-VAX: A Multimodal Benchmark for Anti-Vaccine and Misinformation
  Posts Detection on Social Media
Insta-VAX: A Multimodal Benchmark for Anti-Vaccine and Misinformation Posts Detection on Social Media
Mingyang Zhou
Mahasweta Chakraborti
Sijia Qian
Zhou Yu
Jingwen Zhang
106
1
0
15 Dec 2021
Previous
123...161718...363738
Next