ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.


Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLM, MLLM
93
224
0
24 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM, MLLM
93
38
0
23 May 2022
Supporting Vision-Language Model Inference with Confounder-pruning Knowledge Prompt
Jiangmeng Li
Wenyi Mo
Jingyao Wang
Fuchun Sun
Changwen Zheng
Hui Xiong
Ji-Rong Wen
VLM
86
0
0
23 May 2022
Visual Concepts Tokenization
Tao Yang
Yuwang Wang
Yan Lu
Nanning Zheng
OCL, ViT
107
15
0
20 May 2022
Voxel-informed Language Grounding
Rodolfo Corona
Shizhan Zhu
Dan Klein
Trevor Darrell
178
12
0
19 May 2022
Gender and Racial Bias in Visual Question Answering Datasets
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
187
55
0
17 May 2022
Importance Weighted Structure Learning for Scene Graph Generation
Daqing Liu
M. Bober
J. Kittler
111
5
0
14 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
71
6
0
12 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
89
35
0
10 May 2022
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Chia-Wen Kuo
Z. Kira
97
55
0
09 May 2022
RoViST:Learning Robust Metrics for Visual Storytelling
Eileen Wang
S. Han
Josiah Poon
49
10
0
08 May 2022
From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data
Zhenghang Yuan
Lichao Mou
Q. Wang
Xiao Xiang Zhu
105
67
0
06 May 2022
QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning
Zechen Li
Anders Søgaard
42
6
0
06 May 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM, MLLM
107
98
0
05 May 2022
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
101
76
0
04 May 2022
Diverse Image Captioning with Grounded Style
Franz Klein
Shweta Mahajan
S. Roth
72
7
0
03 May 2022
Cross-modal Representation Learning for Zero-shot Action Recognition
Chung-Ching Lin
Kevin Qinghong Lin
Linjie Li
Lijuan Wang
Zicheng Liu
ViT
62
29
0
03 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
77
16
0
02 May 2022
GRIT: General Robust Image Task Benchmark
Tanmay Gupta
Ryan Marten
Aniruddha Kembhavi
Derek Hoiem
VLM, OOD, ObjD
75
33
0
28 Apr 2022
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly
Spencer Whitehead
Suzanne Petryk
Vedaad Shakib
Joseph E. Gonzalez
Trevor Darrell
Anna Rohrbach
Marcus Rohrbach
103
56
0
28 Apr 2022
Controllable Image Captioning
Luka Maxwell
99
0
0
28 Apr 2022
Cross-modal Memory Networks for Radiology Report Generation
Zhihong Chen
Yaling Shen
Yan Song
Xiang Wan
MedIm
115
262
0
28 Apr 2022
Relevance-based Margin for Contrastively-trained Video Retrieval Models
Alex Falcon
Swathikiran Sudhakaran
G. Serra
Sergio Escalera
Oswald Lanz
91
9
0
27 Apr 2022
CapOnImage: Context-driven Dense-Captioning on Image
Yiqi Gao
Xinglin Hou
Yuanmeng Zhang
T. Ge
Yuning Jiang
Peifeng Wang
139
10
0
27 Apr 2022
Progressive Learning for Image Retrieval with Hybrid-Modality Queries
Yida Zhao
Yuqing Song
Qin Jin
80
29
0
24 Apr 2022
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
Xiaojian Ma
Weili Nie
Zhiding Yu
Huaizu Jiang
Chaowei Xiao
Yuke Zhu
Song-Chun Zhu
Anima Anandkumar
ViT, LRM
131
19
0
24 Apr 2022
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
63
9
0
23 Apr 2022
Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval
Zhiqiang Yuan
Wenkai Zhang
Kun Fu
Xuan Li
Chubo Deng
Hongqi Wang
Xian Sun
99
139
0
21 Apr 2022
Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information
Zhiqiang Yuan
Wenkai Zhang
Changyuan Tian
Xuee Rong
Zhengyuan Zhang
Hongqi Wang
Kun Fu
Xian Sun
94
130
0
21 Apr 2022
Attention in Reasoning: Dataset, Analysis, and Modeling
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
48
3
0
20 Apr 2022
Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations
Leila Pishdad
Ran Zhang
Konstantinos G. Derpanis
Allan D. Jepson
Afsaneh Fazly
41
2
0
20 Apr 2022
Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis
Yan Ling
Jianfei Yu
Rui Xia
66
76
0
17 Apr 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
Gen Luo
Yiyi Zhou
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
78
10
0
17 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
80
0
0
17 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
64
47
0
16 Apr 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey
Mohammed Hassanin
Saeed Anwar
Ibrahim Radwan
Fahad Shahbaz Khan
Ajmal Mian
136
166
0
16 Apr 2022
Guiding Attention using Partial-Order Relationships for Image Captioning
Murad Popattia
Muhammad Rafi
Rizwan Qureshi
Shah Nawaz
52
5
0
15 Apr 2022
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Haoyu Lu
Nanyi Fei
Yuqi Huo
Yizhao Gao
Zhiwu Lu
Jiaxin Wen
CLIP, VLM
96
55
0
15 Apr 2022
Image Captioning In the Transformer Age
Yangliu Xu
Li Li
Haiyang Xu
Songfang Huang
Fei Huang
Jianfei Cai
ViT
59
6
0
15 Apr 2022
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
Zhaowei Cai
Gukyeong Kwon
Avinash Ravichandran
Erhan Bas
Zhuowen Tu
Rahul Bhotika
Stefano Soatto
ObjD, MLLM, VLM
67
50
0
12 Apr 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
97
9
0
11 Apr 2022
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog
Shunyu Zhang
X. Jiang
Zequn Yang
T. Wan
Zengchang Qin
60
12
0
10 Apr 2022
On Distinctive Image Captioning via Comparing and Reweighting
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
87
16
0
08 Apr 2022
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
Sanghyuk Chun
Wonjae Kim
Song Park
Minsuk Chang
Seong Joon Oh
VLM
549
46
0
07 Apr 2022
OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses
Robik Shrestha
Kushal Kafle
Christopher Kanan
CML
94
13
0
05 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
LRM, NAI
102
20
0
05 Apr 2022
SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering
Vipul Gupta
Zhuowan Li
Adam Kortylewski
Chenyu Zhang
Yingwei Li
Alan Yuille
79
46
0
05 Apr 2022
Attribute Prototype Network for Any-Shot Learning
Wenjia Xu
Yongqin Xian
Jiuniu Wang
Bernt Schiele
Zeynep Akata
VLM
82
39
0
04 Apr 2022
Question-Driven Graph Fusion Network For Visual Question Answering
Yuxi Qian
Yuncong Hu
Ruonan Wang
Fangxiang Feng
Xiaojie Wang
GNN
136
10
0
03 Apr 2022
Co-VQA : Answering by Interactive Sub Question Sequence
Ruonan Wang
Yuxi Qian
Fangxiang Feng
Xiaojie Wang
Huixing Jiang
LRM
75
17
0
02 Apr 2022