ResearchTrend.AI
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
    AIMat

Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

50 / 1,868 papers shown
Title
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas Guibas
P. Milanfar
Feng Yang
98
2
0
07 Aug 2024
Targeted Visual Prompting for Medical Visual Question Answering
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
51
2
0
06 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen
Ming Jiang
Qi Zhao
66
3
0
05 Aug 2024
PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval
Yue Duan
Zhangxuan Gu
ZhenZhe Ying
Wei Li
Yu Zhang
Zibin Zheng
52
2
0
02 Aug 2024
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
Ruoyue Shen
Nakamasa Inoue
Koichi Shinoda
71
1
0
30 Jul 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
75
7
0
29 Jul 2024
FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis
Mikel Williams-Lekuona
Georgina Cosma
73
0
0
29 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
101
3
0
26 Jul 2024
Shapley Value-based Contrastive Alignment for Multimodal Information Extraction
Wen Luo
Yu Xia
Tianshu Shen
Sujian Li
57
1
0
25 Jul 2024
Unveiling and Mitigating Bias in Audio Visual Segmentation
Peiwen Sun
Honggang Zhang
Di Hu
91
3
0
23 Jul 2024
Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities
Muhammad Irzam Liaqat
Shah Nawaz
Muhammad Zaigham Zaheer
M. S. Saeed
Hassan Sajjad
Tom De Schepper
Karthik Nandakumar
Muhammad Haris Khan
96
1
0
23 Jul 2024
EfficientCD: A New Strategy For Change Detection Based With Bi-temporal Layers Exchanged
Sijun Dong
Yuwei Zhu
Geng Chen
Xiaoliang Meng
68
4
0
22 Jul 2024
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
Bo Yuan
Danpei Zhao
Zhuoran Liu
Wentao Li
Tian Li
CLL VLM
87
2
0
19 Jul 2024
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
72
1
0
16 Jul 2024
Visual Prompt Selection for In-Context Learning Segmentation
Wei Suo
Lanqing Lai
Mengyang Sun
Hanwang Zhang
Peng Wang
Yanning Zhang
VLM
92
3
0
14 Jul 2024
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu
Paul Hongsuck Seo
Jeany Son
DiffM
145
6
0
10 Jul 2024
Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram
Ming-Liang Zhang
Zhong-Zhi Li
Fei Yin
Liang Lin
Cheng-Lin Liu
LRM
77
8
0
10 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
70
0
0
06 Jul 2024
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Weitai Kang
Mengxue Qu
Yunchao Wei
Yan Yan
107
6
0
03 Jul 2024
Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang
Luowei Zhou
Junyi Wu
Changchang Sun
Yan Yan
74
4
0
03 Jul 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang
Gaowen Liu
Mubarak Shah
Yan Yan
ObjD
119
9
0
03 Jul 2024
Fine-Grained Scene Image Classification with Modality-Agnostic Adapter
Yiqun Wang
Zhao Zhou
Xiangcheng Du
Xingjiao Wu
Yingbin Zheng
Cheng Jin
79
0
0
03 Jul 2024
From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos
Tanqiu Qiao
Ruochen Li
Frederick W. B. Li
Hubert P. H. Shum
121
1
0
01 Jul 2024
Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models
Nila Masrourisaadat
Nazanin Sedaghatkish
Fatemeh Sarshartehrani
Edward A. Fox
118
9
0
28 Jun 2024
Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation
Malvina Nikandrou
Georgios Pantazopoulos
Ioannis Konstas
Alessandro Suglia
77
2
0
27 Jun 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
Min Zhang
Tat-Seng Chua
Shuicheng Yan
AI4TS
124
43
0
27 Jun 2024
On the Role of Visual Grounding in VQA
Daniel Reich
Tanja Schultz
94
1
0
26 Jun 2024
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
Ju-Seung Byun
Jiyun Chun
Jihyung Kil
Andrew Perrault
ReLM LRM
129
3
0
25 Jun 2024
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham
Chuong Huynh
Ser-Nam Lim
Abhinav Shrivastava
CoGe
77
8
0
17 Jun 2024
Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
Hao Yang
Yanyan Zhao
Yang Wu
Shilong Wang
Tian Zheng
Hongbo Zhang
Zongyang Ma
Wanxiang Che
Bing Qin
131
14
0
12 Jun 2024
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation
Jinyuan Li
Ziyan Li
Han Li
Jianfei Yu
Rui Xia
Di Sun
Gang Pan
67
2
0
11 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Shutao Xia
Ke Xu
AAML VLM
133
14
0
08 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
155
37
0
07 Jun 2024
Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching
Xuri Ge
Fuhai Chen
Songpei Xu
Fuxiang Tao
Jie Wang
Joemon M. Jose
65
1
0
05 Jun 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim
Hyunjun Kim
Yeonju Kim
Yong Man Ro
MLLM
117
16
0
04 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
77
0
0
01 Jun 2024
Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
Cheng Tan
Jingxuan Wei
Linzhuang Sun
Zhangyang Gao
Siyuan Li
Bihui Yu
Ruifeng Guo
Stan Z. Li
ReLM LRM 3DV
110
7
0
31 May 2024
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Xiaolin Chen
Liqiang Nie
Mohan S. Kankanhalli
LRM
52
8
0
27 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
117
0
0
27 May 2024
Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples
D. Jo
Kyuewang Lee
Jaeho Chung
Jin Young Choi
73
0
0
25 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
333
54
0
23 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
77
12
0
21 May 2024
Context-Enhanced Video Moment Retrieval with Large Language Models
Weijia Liu
Bo Miao
Jiuxin Cao
Xueling Zhu
Bo Liu
Mehwish Nasim
Ajmal Mian
130
2
0
21 May 2024
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
Siddhant Agarwal
Shivam Sharma
Preslav Nakov
Tanmoy Chakraborty
94
4
0
18 May 2024
MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains
Zhaohuan Zhan
Lisha Yu
Sijie Yu
Guang Tan
LLMAG LM&Ro
135
11
0
17 May 2024
Learning Object-Centric Representation via Reverse Hierarchy Guidance
Junhong Zou
Xiangyu Zhu
Zhaoxiang Zhang
Zhen Lei
BDL ObjD OCL
76
0
0
17 May 2024
VideoQA-SC: Adaptive Semantic Communication for Video Question Answering
Jiangyuan Guo
Wei Chen
Yuxuan Sun
Jia-lin Xu
Bo Ai
123
4
0
17 May 2024
Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling
Guangmin Zheng
Jin Wang
Xiaobing Zhou
Xuejie Zhang
LRM
58
2
0
16 May 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge
Fangyin Wei
Siddharth Gururani
Nayeon Lee
Xuan Li
Huayu Chen
CoGe DiffM
72
17
0
30 Apr 2024
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
Haiwen Diao
Ying Zhang
Shang Gao
Xiang Ruan
Huchuan Lu
72
4
0
28 Apr 2024