ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1707.07998
  4. Cited By
Bottom-Up and Top-Down Attention for Image Captioning and Visual
  Question Answering
v1v2v3 (latest)

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat
ArXiv (abs)PDFHTML

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Title
Few-shot Action Recognition with Captioning Foundation Models
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
126
7
0
16 Oct 2023
Bounding and Filling: A Fast and Flexible Framework for Image Captioning
Bounding and Filling: A Fast and Flexible Framework for Image Captioning
Zheng Ma
Changxin Wang
Bo Huang
Zi-Yue Zhu
Jianbing Zhang
60
1
0
15 Oct 2023
Question Answering for Electronic Health Records: A Scoping Review of
  datasets and models
Question Answering for Electronic Health Records: A Scoping Review of datasets and models
Jayetri Bardhan
Kirk Roberts
Daisy Zhe Wang
79
2
0
12 Oct 2023
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for
  Image Caption Generation
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation
Rashid Khan
Bingding Huang
Haseeb Hassan
Asim Zaman
Z. Ye
41
2
0
11 Oct 2023
Controllable Chest X-Ray Report Generation from Longitudinal
  Representations
Controllable Chest X-Ray Report Generation from Longitudinal Representations
Francesco Dalla Serra
Chaoyang Wang
Fani Deligianni
Jeffrey Stephen Dalton
Alison Q. OÑeil
MedIm
111
16
0
09 Oct 2023
C^2M-DoT: Cross-modal consistent multi-view medical report generation
  with domain transfer network
C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network
Ruizhi Wang
Xiang-Fei Wang
Jie Zhou
Thomas Lukasiewicz
Zhenghua Xu
71
1
0
09 Oct 2023
Module-wise Adaptive Distillation for Multimodality Foundation Models
Module-wise Adaptive Distillation for Multimodality Foundation Models
Chen Liang
Jiahui Yu
Ming-Hsuan Yang
Matthew A. Brown
Huayu Chen
Tuo Zhao
Boqing Gong
Tianyi Zhou
104
10
0
06 Oct 2023
Constructing Image-Text Pair Dataset from Books
Constructing Image-Text Pair Dataset from Books
Yamato Okamoto
Haruto Toyonaga
Yoshihisa Ijiri
Hirokatsu Kataoka
79
3
0
03 Oct 2023
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal
  Retrieval
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
Hao Li
Marie-Jeanne Lesot
Lianli Gao
Xiaosu Zhu
Christophe Marsala
EDL
78
15
0
29 Sep 2023
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens
Yangyang Guo
Haoyu Zhang
Yongkang Wong
Liqiang Nie
Mohan Kankanhalli
VLM
69
3
0
28 Sep 2023
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal
  Sponsored Search
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search
Yuanmin Tang
Daling Wang
Keke Gai
Wenfang Wu
Yifei Zhang
Gang Xiong
Qi Wu
73
4
0
28 Sep 2023
Targeted Image Data Augmentation Increases Basic Skills Captioning
  Robustness
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
Valentin Barriere
Felipe del Rio
Andres Carvallo De Ferari
Carlos Aspillaga
Eugenio Herrera-Berg
Cristian Buc Calderon
DiffM
63
0
0
27 Sep 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Hila Levi
Guy Heller
Dan Levi
Ethan Fetaya
OCLVLM
69
4
0
26 Sep 2023
BLIP-Adapter: Parameter-Efficient Transfer Learning for Mobile
  Screenshot Captioning
BLIP-Adapter: Parameter-Efficient Transfer Learning for Mobile Screenshot Captioning
Ching-Yu Chiang
I-Hua Chang
Shih-Wei Liao
83
1
0
26 Sep 2023
A Survey on Image-text Multimodal Models
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
128
7
0
23 Sep 2023
Towards Answering Health-related Questions from Medical Videos: Datasets
  and Approaches
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
Deepak Gupta
Kush Attal
Dina Demner-Fushman
LM&MA
49
1
0
21 Sep 2023
A Novel Method of Fuzzy Topic Modeling based on Transformer Processing
A Novel Method of Fuzzy Topic Modeling based on Transformer Processing
Ching-Hsun Tseng
Shin-Jye Lee
Po-Wei Cheng
Chien Lee
Chih-Chieh Hung
31
0
0
18 Sep 2023
Syntax Tree Constrained Graph Network for Visual Question Answering
Syntax Tree Constrained Graph Network for Visual Question Answering
Xiangrui Su
Qi Zhang
Chongyang Shi
Jiachang Liu
Liang Hu
GNNNAI
58
3
0
17 Sep 2023
Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking
Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking
Wenzhang Wei
Zhipeng Gui
Changguang Wu
Anqi Zhao
D. Peng
Huayi Wu
77
0
0
15 Sep 2023
Improving Multimodal Classification of Social Media Posts by Leveraging
  Image-Text Auxiliary Tasks
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks
Danae Sánchez Villegas
Daniel Preoctiuc-Pietro
Nikolaos Aletras
64
3
0
14 Sep 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and
  Reasoning
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
Enna Sachdeva
Nakul Agarwal
Suhas Chundi
Sean Roelofs
Jiachen Li
Mykel Kochenderfer
Chiho Choi
Behzad Dariush
92
51
0
12 Sep 2023
Beyond Generation: Harnessing Text to Image Models for Object Detection
  and Segmentation
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffM
77
14
0
12 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
115
507
0
11 Sep 2023
Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image
  Captioning
Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning
Guisheng Liu
Yi Li
Zhengcong Fei
Haiyan Fu
Xiangyang Luo
Yanqing Guo
VLMDiffM
83
8
0
10 Sep 2023
Towards Better Multi-modal Keyphrase Generation via Visual Entity
  Enhancement and Multi-granularity Image Noise Filtering
Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering
Yifan Dong
Suhang Wu
Fandong Meng
Jie Zhou
Xiaoli Wang
Jianxin Lin
Jinsong Su
79
4
0
09 Sep 2023
A Multimodal Analysis of Influencer Content on Twitter
A Multimodal Analysis of Influencer Content on Twitter
Danae Sánchez Villegas
Catalina Goanta
Nikolaos Aletras
115
6
0
06 Sep 2023
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End
  3D Dense Captioning
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen
Erik Cambria
Mingsheng Li
Xin Chen
Peng Guo
Yinjie Lei
Gang Yu
Taihao Li
Tao Chen
66
23
0
06 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and
  Language Models
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
70
2
0
06 Sep 2023
ATM: Action Temporality Modeling for Video Question Answering
ATM: Action Temporality Modeling for Video Question Answering
Junwen Chen
Jie Zhu
Yu Kong
60
1
0
05 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical
  Learning
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
68
7
0
05 Sep 2023
Towards Addressing the Misalignment of Object Proposal Evaluation for
  Vision-Language Tasks via Semantic Grounding
Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding
Joshua Forster Feinglass
Yezhou Yang
51
2
0
01 Sep 2023
Distraction-free Embeddings for Robust VQA
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
Ashwin Kalyan
Ameet Deshpande
Neeraj Kumar
100
0
0
31 Aug 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual
  Augmentation
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang
Zhiyong Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
96
8
0
31 Aug 2023
Can Prompt Learning Benefit Radiology Report Generation?
Can Prompt Learning Benefit Radiology Report Generation?
Jun Wang
Lixing Zhu
A. Bhalerao
Yulan He
MedIm
86
2
0
30 Aug 2023
Finding-Aware Anatomical Tokens for Chest X-Ray Automated Reporting
Finding-Aware Anatomical Tokens for Chest X-Ray Automated Reporting
Francesco Dalla Serra
Chaoyang Wang
Fani Deligianni
Jeffrey Stephen Dalton
Alison Q. OÑeil
MedIm
56
9
0
30 Aug 2023
Read-only Prompt Optimization for Vision-Language Few-shot Learning
Read-only Prompt Optimization for Vision-Language Few-shot Learning
Dongjun Lee
Seokwon Song
Jihee G. Suh
Joonmyeong Choi
S. Lee
Hyunwoo J.Kim
VLM
89
45
0
29 Aug 2023
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient
  Parameter and Memory
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Haiwen Diao
Bo Wan
Yanzhe Zhang
Xuecong Jia
Huchuan Lu
Long Chen
VLM
81
19
0
28 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for
  Image Captioning
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
90
20
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
CgT-GAN: CLIP-guided Text GAN for Image Captioning
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLMCLIP
65
13
0
23 Aug 2023
MusicJam: Visualizing Music Insights via Generated Narrative
  Illustrations
MusicJam: Visualizing Music Insights via Generated Narrative Illustrations
Chuer Chen
Nan Cao
Jiani Hou
Yi Guo
Yulei Zhang
Yang Shi
DiffM
61
0
0
22 Aug 2023
CiteTracker: Correlating Image and Text for Visual Tracking
CiteTracker: Correlating Image and Text for Visual Tracking
Xin Li
Yuqing Huang
Zhenyu He
Yaowei Wang
Huchuan Lu
Ming-Hsuan Yang
99
30
0
22 Aug 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
58
4
0
22 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
Explore and Tell: Embodied Visual Captioning in 3D Environments
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
80
2
0
21 Aug 2023
Simple Baselines for Interactive Video Retrieval with Questions and
  Answers
Simple Baselines for Interactive Video Retrieval with Questions and Answers
Kaiqu Liang
Samuel Albanie
74
3
0
21 Aug 2023
Generic Attention-model Explainability by Weighted Relevance
  Accumulation
Generic Attention-model Explainability by Weighted Relevance Accumulation
Yiming Huang
Ao Jia
Xiaodan Zhang
Jiawei Zhang
46
1
0
20 Aug 2023
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language
  Models
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
Navid Rajabi
Jana Kosecka
VLM
111
12
0
18 Aug 2023
Artificial-Spiking Hierarchical Networks for Vision-Language
  Representation Learning
Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
Ye-Ting Chen
Siyu Zhang
Yaoru Sun
Weijian Liang
Haoran Wang
74
1
0
18 Aug 2023
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language
  Tasks
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks
Fawaz Sammani
Nikos Deligiannis
48
5
0
17 Aug 2023
Diagnosing Human-object Interaction Detectors
Diagnosing Human-object Interaction Detectors
Fangrui Zhu
Yiming Xie
Weidi Xie
Huaizu Jiang
77
8
0
16 Aug 2023
Visually-Aware Context Modeling for News Image Captioning
Visually-Aware Context Modeling for News Image Captioning
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
VLM
58
9
0
16 Aug 2023
Previous
123...567...363738
Next