ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering
v1v2v3v4v5v6v7 (latest)

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
When Brain-inspired AI Meets AGI
When Brain-inspired AI Meets AGI
Lin Zhao
Lu Zhang
Zihao Wu
Yuzhong Chen
Haixing Dai
...
Xi Jiang
Xiang Li
Dajiang Zhu
Dinggang Shen
Tianming Liu
AI4CE
76
91
0
28 Mar 2023
Zero-Shot Composed Image Retrieval with Textual Inversion
Zero-Shot Composed Image Retrieval with Textual Inversion
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
A. Bimbo
104
118
0
27 Mar 2023
Borrowing Human Senses: Comment-Aware Self-Training for Social Media
  Multimodal Classification
Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification
Chunpu Xu
Jing Li
VLM
62
5
0
27 Mar 2023
Revisiting Multimodal Representation in Contrastive Learning: From Patch
  and Token Embeddings to Finite Discrete Tokens
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
Yuxiao Chen
Jianbo Yuan
Yu Tian
Shijie Geng
Xinyu Li
Ding Zhou
Dimitris N. Metaxas
Hongxia Yang
86
37
0
27 Mar 2023
SEM-POS: Grammatically and Semantically Correct Video Captioning
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
78
8
0
26 Mar 2023
Equivariant Similarity for Vision-Language Foundation Models
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
85
51
0
25 Mar 2023
IFSeg: Image-free Semantic Segmentation via Vision-Language Model
IFSeg: Image-free Semantic Segmentation via Vision-Language Model
Sukmin Yun
S. Park
Paul Hongsuck Seo
Jinwoo Shin
VLMMLLM
111
14
0
25 Mar 2023
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained
  Experts
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Kastan Day
D. Christl
Rohan Salvi
Pranav Sriram
ViT
81
1
0
24 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLMMLLM
118
10
0
24 Mar 2023
Feature Separation and Recalibration for Adversarial Robustness
Feature Separation and Recalibration for Adversarial Robustness
Woo Jae Kim
Y. Cho
Junsik Jung
Sung-eui Yoon
AAML
119
22
0
24 Mar 2023
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
94
13
0
23 Mar 2023
Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph
  Generation with Decoupled Label Learning
Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning
Wenqing Wang
Yawei Luo
Zhiqin Chen
Tao Jiang
Lei Chen
Yi Yang
Jun Xiao
78
8
0
23 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi
Trevor Darrell
Xin Eric Wang
88
32
0
23 Mar 2023
Integrating Image Features with Convolutional Sequence-to-sequence
  Network for Multilingual Visual Question Answering
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
T. M. Thai
Son T. Luu
86
0
0
22 Mar 2023
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image
  Person Retrieval
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
Ding Jiang
Mang Ye
106
157
0
22 Mar 2023
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel
  Storage
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage
Song Park
Sanghyuk Chun
Byeongho Heo
Wonjae Kim
Sangdoo Yun
VLMViT
97
8
0
20 Mar 2023
Location-Free Scene Graph Generation
Location-Free Scene Graph Generation
Ege Özsoy
Felix Holm
Tobias Czempiel
Tobias Czempiel
Benjamin Busam
Nassir Navab
Benjamin Busam
138
4
0
20 Mar 2023
Divide and Conquer: Answering Questions with Object Factorization and
  Compositional Reasoning
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
Shi Chen
Qi Zhao
98
6
0
18 Mar 2023
Data Roaming and Quality Assessment for Composed Image Retrieval
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
105
28
0
16 Mar 2023
Logical Implications for Visual Question Answering Consistency
Logical Implications for Visual Question Answering Consistency
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
81
9
0
16 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet
  Tag-guided Synthetic Data
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
164
15
0
14 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Shih-Han Chou
James J. Little
Leonid Sigal
74
2
0
14 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched
  Visual Descriptions
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
100
106
0
12 Mar 2023
Gradient-Regulated Meta-Prompt Learning for Generalizable
  Vision-Language Models
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
Juncheng Li
Minghe Gao
Longhui Wei
Siliang Tang
Wenqiao Zhang
Meng Li
Wei Ji
Qi Tian
Tat-Seng Chua
Yueting Zhuang
VLMVPVLM
107
21
0
12 Mar 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image
  Segmentation
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip Torr
131
24
0
11 Mar 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in
  Image Re-creation
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
Zhiwei Zhang
Yuliang Liu
MLLM
87
0
0
10 Mar 2023
Robotic Applications of Pre-Trained Vision-Language Models to Various
  Recognition Behaviors
Robotic Applications of Pre-Trained Vision-Language Models to Various Recognition Behaviors
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
K. Okada
Masayuki Inaba
LM&Ro
64
12
0
10 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
Toward Unsupervised Realistic Visual Question Answering
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
89
2
0
09 Mar 2023
VQA-based Robotic State Recognition Optimized with Genetic Algorithm
VQA-based Robotic State Recognition Optimized with Genetic Algorithm
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
K. Okada
Masayuki Inaba
48
16
0
09 Mar 2023
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation
  Models
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu
Sheng-Kai Yin
Weizhen Qi
Xiaodong Wang
Zecheng Tang
Nan Duan
MLLMLRM
150
649
0
08 Mar 2023
Interpretable Visual Question Answering Referring to Outside Knowledge
Interpretable Visual Question Answering Referring to Outside Knowledge
He Zhu
Ren Togo
Takahiro Ogawa
Miki Haseyama
65
0
0
08 Mar 2023
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation
  Using Scene Object Spectrum Grounding
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
Minyoung Hwang
Jaeyeon Jeong
Minsoo Kim
Yoonseon Oh
Songhwai Oh
91
21
0
07 Mar 2023
Describe me an Aucklet: Generating Grounded Perceptual Category
  Descriptions
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions
Bill Noble
N. Ilinykh
114
0
0
07 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
97
21
0
07 Mar 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware
  Attention
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIPVLM
72
46
0
06 Mar 2023
VTQA: Visual Text Question Answering via Entity Alignment and
  Cross-Media Reasoning
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kan Chen
Xiangqian Wu
CoGe
57
9
0
05 Mar 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou
Maria Lymperaiou
Giorgos Stamou
AAML
80
1
0
05 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLMMLLM
142
25
0
04 Mar 2023
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong
  Few-shot Learners
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Renrui Zhang
Xiangfei Hu
Bohao Li
Siyuan Huang
Hanqiu Deng
Hongsheng Li
Yu Qiao
Peng Gao
VLMMLLM
116
181
0
03 Mar 2023
Intrinsic Physical Concepts Discovery with Object-Centric Predictive
  Models
Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
Qu Tang
Xiangyu Zhu
Zhen Lei
Zhaoxiang Zhang
OCL
109
8
0
03 Mar 2023
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource
  Visual Question Answering
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Jingjing Jiang
Nanning Zheng
MoE
122
6
0
02 Mar 2023
ESceme: Vision-and-Language Navigation with Episodic Scene Memory
ESceme: Vision-and-Language Navigation with Episodic Scene Memory
Qinjie Zheng
Daqing Liu
Chaoyue Wang
Jing Zhang
Dadong Wang
Dacheng Tao
LM&Ro
88
5
0
02 Mar 2023
VQA with Cascade of Self- and Co-Attention Blocks
VQA with Cascade of Self- and Co-Attention Blocks
Aakansha Mishra
Ashish Anand
Prithwijit Guha
44
1
0
28 Feb 2023
Which One Are You Referring To? Multimodal Object Identification in
  Situated Dialogue
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
55
1
0
28 Feb 2023
Foundation Model Drives Weakly Incremental Learning for Semantic
  Segmentation
Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
Chaohui Yu
Qiang-feng Zhou
Jingliang Li
Jia-Chao Yuan
Zhibin Wang
Fan Wang
VLMCLL
73
14
0
28 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
Contrastive Video Question Answering via Video Graph Transformer
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
115
37
0
27 Feb 2023
Medical visual question answering using joint self-supervised learning
Medical visual question answering using joint self-supervised learning
Yuan Zhou
Jing Mei
Yiqin Yu
Tanveer Syeda-Mahmood
MedIm
52
1
0
25 Feb 2023
Learning Visual Representations via Language-Guided Sampling
Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani
Karan Desai
Justin Johnson
SSLVLM
124
28
0
23 Feb 2023
Quantifying & Modeling Multimodal Interactions: An Information
  Decomposition Framework
Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework
Paul Pu Liang
Yun Cheng
Xiang Fan
Chun Kai Ling
Suzanne Nie
...
Nicholas B. Allen
Randy P. Auerbach
Faisal Mahmood
Ruslan Salakhutdinov
Louis-Philippe Morency
116
37
0
23 Feb 2023
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Mengde Xu
Zheng Zhang
Fangyun Wei
Han Hu
Xiang Bai
VLM
89
273
0
23 Feb 2023
Previous
123...222324...585960
Next