Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
arXiv: 1707.07998

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
MTG: A Benchmark Suite for Multilingual Text Generation
Yiran Chen
Zhenqiao Song
Xianze Wu
Danqing Wang
Jingjing Xu
Jiaze Chen
Hao Zhou
Lei Li
LRM, VLM
82
22
0
13 Aug 2021
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision
Xiaoshi Wu
Hadar Averbuch-Elor
J. Sun
Noah Snavely
84
19
0
12 Aug 2021
A Better Loss for Visual-Textual Grounding
Davide Rigoni
Luciano Serafini
A. Sperduti
ObjD
60
3
0
11 Aug 2021
Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report Generation With Alternate Learning
Guangyi Liu
Yinghong Liao
Fuyu Wang
Bin Zhang
Lu Zhang
...
Xiang Wan
Shaolin Li
Zhen Li
Shuixing Zhang
Shuguang Cui
114
59
0
11 Aug 2021
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor
Mozhdeh Rouhsedaghat
Liunian Harold Li
Aichi Chien
C.-C. Jay Kuo
Fabien Scalzo
Kai-Wei Chang
LM&MA, MedIm
60
31
0
10 Aug 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
79
207
0
09 Aug 2021
Discriminative Latent Semantic Graph for Video Captioning
Yang Bai
Junyan Wang
Yang Long
Bingzhang Hu
Yang Song
Maurice Pagnucco
Yu Guan
86
31
0
08 Aug 2021
Interpretable Visual Understanding with Cognitive Attention Network
Xuejiao Tang
Wenbin Zhang
Yi Yu
Kea Turner
Hanyu Wang
Mengyu Wang
Eirini Ntoutsi
136
12
0
06 Aug 2021
Neural Twins Talk & Alternative Calculations
Zanyar Zohourianshahzadi
Jugal Kalita
52
0
0
05 Aug 2021
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval
Xuri Ge
Fuhai Chen
J. Jose
Zhilong Ji
Zhongqin Wu
Xiao-Chang Liu
72
57
0
05 Aug 2021
TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding
Dailan He
Yusheng Zhao
Junyu Luo
Tianrui Hui
Shaofei Huang
Aixi Zhang
Si Liu
ViT
67
95
0
05 Aug 2021
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
Xinzhi Dong
Chengjiang Long
Wenju Xu
Chunxia Xiao
ViT
147
68
0
05 Aug 2021
Ordered Attention for Coherent Visual Storytelling
Tom Braude
Idan Schwartz
Alex Schwing
Ariel Shamir
61
9
0
04 Aug 2021
Question-controlled Text-aware Image Captioning
Anwen Hu
Shizhe Chen
Qin Jin
76
15
0
04 Aug 2021
ICECAP: Information Concentrated Entity-aware Image Captioning
Anwen Hu
Shizhe Chen
Qin Jin
61
20
0
04 Aug 2021
Sparse Continuous Distributions and Fenchel-Young Losses
André F. T. Martins
Marcos Vinícius Treviso
António Farinhas
P. Aguiar
Mário A. T. Figueiredo
Mathieu Blondel
Vlad Niculae
76
12
0
04 Aug 2021
RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting
Jiachen Li
Fan Yang
Hengbo Ma
Srikanth Malla
Masayoshi Tomizuka
Chiho Choi
91
42
0
03 Aug 2021
Distributed Attention for Grounded Image Captioning
Nenglun Chen
Xingjia Pan
Runnan Chen
Lei Yang
Zhiwen Lin
Yuqiang Ren
Haolei Yuan
Xiaowei Guo
Feiyue Huang
Wenping Wang
71
21
0
02 Aug 2021
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection
Jiajun Deng
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
3DPC
83
76
0
30 Jul 2021
ReFormer: The Relational Transformer for Image Captioning
Xuewen Yang
Yingru Liu
Xin Wang
ViT
103
57
0
29 Jul 2021
Bridging Gap between Image Pixels and Semantics via Supervision: A Survey
Jiali Duan
C.-C. Jay Kuo
96
8
0
29 Jul 2021
Greedy Gradient Ensemble for Robust Visual Question Answering
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Q. Tian
65
78
0
27 Jul 2021
Image Scene Graph Generation (SGG) Benchmark
Xiao Han
Jianwei Yang
Houdong Hu
Lei Zhang
Jianfeng Gao
Pengchuan Zhang
65
38
0
27 Jul 2021
Language Grounding with 3D Objects
Jesse Thomason
Mohit Shridhar
Yonatan Bisk
Chris Paxton
Luke Zettlemoyer
LM&Ro
88
53
0
26 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
72
128
0
26 Jul 2021
Language Models as Zero-shot Visual Semantic Learners
Yue Jiao
Jonathon S. Hare
Adam Prugel-Bennett
VLM
36
0
0
26 Jul 2021
Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph
Wentian Zhao
Yao Hu
Heda Wang
Xinxiao Wu
Jiebo Luo
55
49
0
26 Jul 2021
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Yifan Liu
Zhixiong Nan
N. Zheng
OOD
81
19
0
24 Jul 2021
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns
Christopher D. Manning
Li Fei-Fei
47
0
0
20 Jul 2021
Separating Skills and Concepts for Novel Visual Question Answering
Spencer Whitehead
Hui Wu
Heng Ji
Rogerio Feris
Kate Saenko
CoGe
95
34
0
19 Jul 2021
Variational Topic Inference for Chest X-Ray Report Generation
Ivona Najdenkoska
Xiantong Zhen
M. Worring
Ling Shao
MedIm
88
29
0
15 Jul 2021
Surgical Instruction Generation with Transformers
Jinglu Zhang
Y. Nie
Jian Chang
Jiangning Zhang
MedIm
94
13
0
14 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV, VLM, MLLM
153
270
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP, VLM, MLLM
270
412
0
13 Jul 2021
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering
Rajat Koner
Hang Li
Marcel Hildebrandt
Deepan Das
Volker Tresp
Stephan Günnemann
63
34
0
13 Jul 2021
Human Attention during Goal-directed Reading Comprehension Relies on Task Optimization
Jiajie Zou
Yuran Zhang
Jialu Li
Xing Tian
Nai Ding
AIMat
92
2
0
13 Jul 2021
Zero-shot Visual Question Answering using Knowledge Graph
Zhuo Chen
Jiaoyan Chen
Yuxia Geng
Jeff Z. Pan
Zonggang Yuan
Huajun Chen
87
70
0
12 Jul 2021
Modeling Explicit Concerning States for Reinforcement Learning in Visual Dialogue
Zipeng Xu
Fandong Meng
Xiaojie Wang
Duo Zheng
Chenxu Lv
Jie Zhou
OffRL
72
6
0
12 Jul 2021
Learn from Anywhere: Rethinking Generalized Zero-Shot Learning with Limited Supervision
Gaurav Bhatt
Shivam Chandhok
V. Balasubramanian
56
1
0
11 Jul 2021
MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering
Haiwei Pan
Shuning He
Kejia Zhang
Bo Qu
Chunling Chen
Kun Shi
56
11
0
07 Jul 2021
Deep Learning for Embodied Vision Navigation: A Survey
Fengda Zhu
Yi Zhu
Vincent CS Lee
Xiaodan Liang
Xiaojun Chang
EgoV, LM&Ro
101
0
0
07 Jul 2021
PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling
Xiaoxue Zang
Lijuan Liu
Maria Wang
Yang Song
Hao Zhang
Jindong Chen
VLM
99
60
0
06 Jul 2021
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
Siddharth Karamcheti
Ranjay Krishna
Li Fei-Fei
Christopher D. Manning
96
92
0
06 Jul 2021
Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words
Chuan Tang
Xi Yang
Bojian Wu
Zhizhong Han
Yi Chang
3DPC
91
13
0
05 Jul 2021
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Xuejiao Tang
Xin Huang
Wenbin Zhang
T. Child
Qiong Hu
Zhen Liu
Ji Zhang
LRM
81
19
0
04 Jul 2021
Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions
Motonari Kambara
K. Sugiura
ViT
62
6
0
02 Jul 2021
Egocentric Image Captioning for Privacy-Preserved Passive Dietary Intake Monitoring
Jianing Qiu
Frank P.-W. Lo
Xiao Gu
M. Jobarteh
Wenyan Jia
...
M. McCrory
Edward Sazonov
Mingui Sun
Gary Frost
Benny Lo
EgoV
64
19
0
01 Jul 2021
Deep auxiliary learning for visual localization using colorization task
Mi Tian
Qiong Nie
Hao Shen
Xiahua Xia
SSL
27
1
0
01 Jul 2021
Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder
Chao Zeng
Tiesong Zhao
Sam Kwong
92
2
0
29 Jun 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
58
2
0
28 Jun 2021