ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1707.07998
  4. Cited By
Bottom-Up and Top-Down Attention for Image Captioning and Visual
  Question Answering
v1v2v3 (latest)

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat
ArXiv (abs)PDFHTML

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Title
Adaptive loose optimization for robust question answering
Adaptive loose optimization for robust question answering
Jie Ma
Pinghui Wang
Ze-you Wang
Dechen Kong
Min Hu
Tingxu Han
Jun Liu
OOD
129
4
0
06 May 2023
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large
  Language Model Signals for Science Question Answering
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering
Lei Wang
Yilang Hu
Jiabang He
Xingdong Xu
Ning Liu
Hui-juan Liu
Hengtao Shen
LRMMLLM
114
48
0
05 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal
  Controls
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
186
89
0
04 May 2023
Transforming Visual Scene Graphs to Image Captions
Transforming Visual Scene Graphs to Image Captions
Xu Yang
Jiawei Peng
Zihua Wang
Haiyang Xu
Qinghao Ye
Chenliang Li
Mingshi Yan
Feisi Huang
Zhangzikang Li
Yu Zhang
97
21
0
03 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
78
10
0
03 May 2023
Multimodal Graph Transformer for Multimodal Question Answering
Multimodal Graph Transformer for Multimodal Question Answering
Xuehai He
Xin Eric Wang
81
9
0
30 Apr 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Peng Gao
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
118
588
0
28 Apr 2023
Interpreting Vision and Language Generative Models with Semantic Visual
  Priors
Interpreting Vision and Language Generative Models with Semantic Visual Priors
Michele Cafagna
L. Rojas-Barahona
Kees van Deemter
Albert Gatt
FAttVLM
59
3
0
28 Apr 2023
A Symmetric Dual Encoding Dense Retrieval Framework for
  Knowledge-Intensive Visual Question Answering
A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering
Alireza Salemi
Juan Altmayer Pizzorno
Hamed Zamani
38
15
0
26 Apr 2023
Multi-Modality Deep Network for Extreme Learned Image Compression
Multi-Modality Deep Network for Extreme Learned Image Compression
Xuhao Jiang
Weimin Tan
Tian Tan
Bo Yan
Liquan Shen
26
18
0
26 Apr 2023
From Association to Generation: Text-only Captioning by Unsupervised
  Cross-modal Mapping
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
Junyan Wang
Ming Yan
Yi Zhang
Jitao Sang
CLIPVLM
74
9
0
26 Apr 2023
Learnable Pillar-based Re-ranking for Image-Text Retrieval
Learnable Pillar-based Re-ranking for Image-Text Retrieval
Leigang Qu
Meng Liu
Wenjie Wang
Zhedong Zheng
Liqiang Nie
Tat-Seng Chua
VLM
49
15
0
25 Apr 2023
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text
  Matching Models
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models
Seulki Park
Daeho Um
Hajung Yoon
Sanghyuk Chun
Sangdoo Yun
Hawook Jeong
91
3
0
21 Apr 2023
Chameleon: Plug-and-Play Compositional Reasoning with Large Language
  Models
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Pan Lu
Baolin Peng
Hao Cheng
Michel Galley
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Jianfeng Gao
KELMMLLMLRM
150
325
0
19 Apr 2023
MPMQA: Multimodal Question Answering on Product Manuals
MPMQA: Multimodal Question Answering on Product Manuals
Liangfu Zhang
Anwen Hu
Jing Zhang
Shuo Hu
Qin Jin
84
10
0
19 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jinhui Tang
VLM
128
112
0
17 Apr 2023
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
Yang Liu
Guanbin Li
Jingzhou Luo
Liang Lin
BDLLRM
101
5
0
17 Apr 2023
Chain of Thought Prompt Tuning in Vision Language Models
Chain of Thought Prompt Tuning in Vision Language Models
Jiaxin Ge
Hongyin Luo
Siyuan Qian
Yulu Gan
Jie Fu
Shanghang Zhang
VLMLRMMLLM
109
29
0
16 Apr 2023
MvCo-DoT:Multi-View Contrastive Domain Transfer Network for Medical
  Report Generation
MvCo-DoT:Multi-View Contrastive Domain Transfer Network for Medical Report Generation
Ruizhi Wang
Xiang-Fei Wang
Zhenghua Xu
Wenting Xu
Junyang Chen
Thomas Lukasiewicz
47
6
0
15 Apr 2023
A-CAP: Anticipation Captioning with Commonsense Knowledge
A-CAP: Anticipation Captioning with Commonsense Knowledge
D. Vo
Quoc-An Luong
Akihiro Sugimoto
Hideki Nakayama
65
1
0
13 Apr 2023
ImageCaptioner$^2$: Image Captioner for Image Captioning Bias
  Amplification Assessment
ImageCaptioner2^22: Image Captioner for Image Captioning Bias Amplification Assessment
Eslam Mohamed Bakr
Pengzhan Sun
Erran L. Li
Mohamed Elhoseiny
39
6
0
10 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via
  Word-Region Alignment
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLMObjDCLIP
121
79
0
10 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and
  Language
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIPVLM
55
1
0
10 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Model-Agnostic Gender Debiased Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
122
18
0
07 Apr 2023
METransformer: Radiology Report Generation by Transformer with Multiple
  Learnable Expert Tokens
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
77
76
0
05 Apr 2023
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image
  Generation
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
Mayu Otani
Riku Togashi
Yu Sawai
Ryosuke Ishigami
Yuta Nakashima
Esa Rahtu
J. Heikkilä
Shiníchi Satoh
100
65
0
04 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
92
19
0
04 Apr 2023
SC-ML: Self-supervised Counterfactual Metric Learning for Debiased
  Visual Question Answering
SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering
Xinyao Shu
Shiyang Yan
Xu Yang
Ziheng Wu
Zhongfeng Chen
Zhenyu Lu
SSL
63
0
0
04 Apr 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for
  Scene-Text VQA
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Ziqiang Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
55
7
0
04 Apr 2023
Changes to Captions: An Attentive Network for Remote Sensing Change
  Captioning
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning
Shizhen Chang
Pedram Ghamisi
95
46
0
03 Apr 2023
Multi-Modal Representation Learning with Text-Driven Soft Masks
Multi-Modal Representation Learning with Text-Driven Soft Masks
Jaeyoo Park
Bohyung Han
SSL
49
4
0
03 Apr 2023
Instance-Level Trojan Attacks on Visual Question Answering via
  Adversarial Learning in Neuron Activation Space
Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space
Yuwei Sun
H. Ochiai
Jun Sakuma
AAML
63
6
0
02 Apr 2023
AutoAD: Movie Description in Context
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
77
35
0
29 Mar 2023
Structured Video-Language Modeling with Temporal Grouping and Spatial
  Grounding
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
Yuanhao Xiong
Long Zhao
Boqing Gong
Ming-Hsuan Yang
Florian Schroff
Ting Liu
Cho-Jui Hsieh
Liangzhe Yuan
VLM
55
0
0
28 Mar 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init
  Attention
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Peng Gao
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Hongsheng Li
Yu Qiao
MLLM
173
787
0
28 Mar 2023
Medical Image Analysis using Deep Relational Learning
Medical Image Analysis using Deep Relational Learning
Zhi-Hu Liu
MedIm
59
0
0
28 Mar 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology
  Report Generation
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
58
33
0
28 Mar 2023
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
Xiangyang Li
Zihan Wang
Jiahao Yang
Yaowei Wang
Shuqiang Jiang
LM&Ro
87
42
0
28 Mar 2023
Curriculum Learning for Compositional Visual Reasoning
Curriculum Learning for Compositional Visual Reasoning
Wafa Aissa
Marin Ferecatu
M. Crucianu
LRM
82
3
0
27 Mar 2023
Plug-and-Play Regulators for Image-Text Matching
Plug-and-Play Regulators for Image-Text Matching
Haiwen Diao
Yanzhe Zhang
Wen Liu
Xiang Ruan
Huchuan Lu
56
21
0
23 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi
Trevor Darrell
Xin Eric Wang
88
32
0
23 Mar 2023
BiCro: Noisy Correspondence Rectification for Multi-modality Data via
  Bi-directional Cross-modal Similarity Consistency
BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
Shuo Yang
Zhaopan Xu
Kai Wang
Yang You
Huanjin Yao
Tongliang Liu
Min Xu
102
29
0
22 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning
  Evaluation
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
81
60
0
21 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to
  GPT-5 All You Need?
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
186
170
0
21 Mar 2023
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel
  Storage
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage
Song Park
Sanghyuk Chun
Byeongho Heo
Wonjae Kim
Sangdoo Yun
VLMViT
78
8
0
20 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
127
2
0
19 Mar 2023
Divide and Conquer: Answering Questions with Object Factorization and
  Compositional Reasoning
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
Shi Chen
Qi Zhao
87
6
0
18 Mar 2023
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report
  Generation
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
Mingjie Li
Bingqian Lin
Zicong Chen
Haokun Lin
Xiaodan Liang
Xiaojun Chang
MedIm
82
117
0
18 Mar 2023
GNNFormer: A Graph-based Framework for Cytopathology Report Generation
GNNFormer: A Graph-based Framework for Cytopathology Report Generation
Yangqiaoyu Zhou
Kai-Lang Yao
Wusuo Li
MedIm
39
1
0
17 Mar 2023
Cross-Modal Causal Intervention for Medical Report Generation
Cross-Modal Causal Intervention for Medical Report Generation
Weixing Chen
Yang-Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
82
5
0
16 Mar 2023
Previous
123...8910...363738
Next