ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering
v1v2v3v4v5v6v7 (latest)

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided
  Code-Vision Representation
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Yangyi Chen
Xingyao Wang
Manling Li
Derek Hoiem
Heng Ji
81
12
0
22 Nov 2023
NERIF: GPT-4V for Automatic Scoring of Drawn Models
NERIF: GPT-4V for Automatic Scoring of Drawn Models
Gyeong-Geon Lee
Xiaoming Zhai
83
11
0
21 Nov 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language
  Explanation
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
104
4
0
21 Nov 2023
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
Minghe Gao
Juncheng Li
Hao Fei
Liang Pang
Wei Ji
Guoming Wang
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
73
9
0
21 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
169
291
0
21 Nov 2023
Causality is all you need
Causality is all you need
Ning Xu
Yifei Gao
Hongshuo Tian
Yongdong Zhang
An-An Liu
82
0
0
21 Nov 2023
Filling the Image Information Gap for VQA: Prompting Large Language
  Models to Proactively Ask Questions
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions
Ziyue Wang
Chi Chen
Peng Li
Yang Liu
LRM
78
16
0
20 Nov 2023
An Embodied Generalist Agent in 3D World
An Embodied Generalist Agent in 3D World
Jiangyong Huang
Silong Yong
Xiaojian Ma
Xiongkun Linghu
Puhao Li
Yan Wang
Qing Li
Song-Chun Zhu
Baoxiong Jia
Siyuan Huang
LM&Ro
118
176
0
18 Nov 2023
Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go
  without Hallucination?
Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?
Bangzheng Li
Ben Zhou
Fei Wang
Xingyu Fu
Dan Roth
Muhao Chen
HILMLRM
104
22
0
16 Nov 2023
Improving Zero-shot Visual Question Answering via Large Language Models
  with Reasoning Question Prompts
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts
Yunshi Lan
Xiang Li
Xin Liu
Yang Li
Wei Qin
Weining Qian
LRMReLM
157
29
0
15 Nov 2023
GRASP: A novel benchmark for evaluating language GRounding And Situated
  Physics understanding in multimodal language models
GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Serwan Jassim
Mario S. Holubar
Annika Richter
Cornelius Wolff
Xenia Ohmer
Elia Bruni
ELM
94
14
0
15 Nov 2023
Multiple-Question Multiple-Answer Text-VQA
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
96
5
0
15 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
90
12
0
14 Nov 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models
  with Image and Video Understanding
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
144
249
0
14 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for
  Multi-modal Large Language Models
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Peng Gao
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Hongsheng Li
Yu Qiao
MLLMVLM
115
231
0
13 Nov 2023
To See is to Believe: Prompting GPT-4V for Better Visual Instruction
  Tuning
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Junke Wang
Lingchen Meng
Zejia Weng
Bo He
Zuxuan Wu
Yu-Gang Jiang
MLLMVLM
121
108
0
13 Nov 2023
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual
  Question Answering
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering
Yunxin Li
Longyue Wang
Baotian Hu
Xinyu Chen
Wanqi Zhong
Chenyang Lyu
Wei Wang
Min Zhang
ELM
77
22
0
13 Nov 2023
Teach me with a Whisper: Enhancing Large Language Models for Analyzing
  Spoken Transcripts using Speech Embeddings
Teach me with a Whisper: Enhancing Large Language Models for Analyzing Spoken Transcripts using Speech Embeddings
Fatema Hasan
Yulong Li
James R. Foulds
Shimei Pan
Bishwaranjan Bhattacharjee
79
2
0
13 Nov 2023
Knowledgeable Preference Alignment for LLMs in Domain-specific Question
  Answering
Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering
Yichi Zhang
Zhuo Chen
Yin Fang
Yanxi Lu
Fangming Li
Wen Zhang
Hua-zeng Chen
114
31
0
11 Nov 2023
Towards A Unified Neural Architecture for Visual Recognition and
  Reasoning
Towards A Unified Neural Architecture for Visual Recognition and Reasoning
Calvin Luo
Boqing Gong
Ting Chen
Chen Sun
OCLObjD
52
1
0
10 Nov 2023
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang
Rui Xu
Ye Guo
Peixiang Huang
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Hong Zhou
LRM
59
6
0
09 Nov 2023
Zero-shot Translation of Attention Patterns in VQA Models to Natural
  Language
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
71
2
0
08 Nov 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang
Yuan Yao
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
MLLMVLM
123
55
0
08 Nov 2023
OtterHD: A High-Resolution Multi-modality Model
OtterHD: A High-Resolution Multi-modality Model
Yue Liu
Peiyuan Zhang
Jingkang Yang
Yuanhan Zhang
Fanyi Pu
Ziwei Liu
VLMMLLM
100
66
0
07 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task
  Completion
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
77
7
0
07 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLMMLLM
176
517
0
06 Nov 2023
Perturbation-based Active Learning for Question Answering
Perturbation-based Active Learning for Question Answering
Fan Luo
Mihai Surdeanu
81
0
0
04 Nov 2023
A New Fine-grained Alignment Method for Image-text Matching
A New Fine-grained Alignment Method for Image-text Matching
Yang Zhang
36
1
0
03 Nov 2023
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life
  Videos
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos
Te-Lin Wu
Zi-Yi Dou
Qingyuan Hu
Yu Hou
Nischal Reddy Chandra
Marjorie Freedman
R. Weischedel
Nanyun Peng
114
8
0
02 Nov 2023
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models
Liqiang Jing
Ruosen Li
Yunmo Chen
Mengzhao Jia
Xinya Du
MLLM
93
7
0
02 Nov 2023
VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization
VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization
Suraj Jyothi Unni
Raha Moraffah
Huan Liu
86
3
0
01 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLMDiffM
103
11
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering
  (VQA) Approaches, Challenges, and Opportunities
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
151
44
0
01 Nov 2023
Language Guided Visual Question Answering: Elevate Your Multimodal
  Language Model Using Knowledge-Enriched Prompts
Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts
Deepanway Ghosal
Navonil Majumder
Roy Ka-wei Lee
Rada Mihalcea
Soujanya Poria
64
8
0
31 Oct 2023
Emotional Theory of Mind: Bridging Fast Visual Processing with Slow
  Linguistic Reasoning
Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
105
2
0
30 Oct 2023
Disentangled Counterfactual Learning for Physical Audiovisual
  Commonsense Reasoning
Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
Changsheng Lv
Shuai Zhang
Yapeng Tian
Mengshi Qi
Huadong Ma
CML
100
18
0
30 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
Generating Context-Aware Natural Answers for Questions in 3D Scenes
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
63
3
0
30 Oct 2023
Learning to Follow Object-Centric Image Editing Instructions Faithfully
Learning to Follow Object-Centric Image Editing Instructions Faithfully
Tuhin Chakrabarty
Kanishk Singh
Arkadiy Saakyan
Smaranda Muresan
DiffM
77
7
0
29 Oct 2023
Dynamic Task and Weight Prioritization Curriculum Learning for
  Multimodal Imagery
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery
H. F. Alsan
Taner Arsan
72
2
0
29 Oct 2023
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health
  Records with Chest X-ray Images
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
Seongsu Bae
Daeun Kyung
Jaehee Ryu
Eunbyeol Cho
Gyubok Lee
...
Jungwoo Oh
Lei Ji
E. Chang
Tackeun Kim
Edward Choi
120
23
0
28 Oct 2023
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model
  for Visual Question Answering in Vietnamese
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese
Khiem Vinh Tran
Hao Phu Phan
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
54
7
0
27 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Alan Yuille
CoGe
105
14
0
27 Oct 2023
Impressions: Understanding Visual Semiotics and Aesthetic Impact
Impressions: Understanding Visual Semiotics and Aesthetic Impact
Julia Kruk
Caleb Ziems
Diyi Yang
63
2
0
27 Oct 2023
Evaluating Bias and Fairness in Gender-Neutral Pretrained
  Vision-and-Language Models
Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
Laura Cabello
Emanuele Bugliarello
Stephanie Brandl
Desmond Elliott
74
7
0
26 Oct 2023
Exploring Question Decomposition for Zero-Shot VQA
Exploring Question Decomposition for Zero-Shot VQA
Zaid Khan
B. Vijaykumar
S. Schulter
Manmohan Chandraker
Yun Fu
ReLM
62
12
0
25 Oct 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
73
0
0
25 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
80
10
0
25 Oct 2023
$\mathbb{VD}$-$\mathbb{GR}$: Boosting $\mathbb{V}$isual
  $\mathbb{D}$ialog with Cascaded Spatial-Temporal Multi-Modal
  $\mathbb{GR}$aphs
VD\mathbb{VD}VD-GR\mathbb{GR}GR: Boosting V\mathbb{V}Visual D\mathbb{D}Dialog with Cascaded Spatial-Temporal Multi-Modal GR\mathbb{GR}GRaphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
58
4
0
25 Oct 2023
Binary State Recognition by Robots using Visual Question Answering of
  Pre-Trained Vision-Language Model
Binary State Recognition by Robots using Visual Question Answering of Pre-Trained Vision-Language Model
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
K. Okada
Masayuki Inaba
30
0
0
25 Oct 2023
Video Referring Expression Comprehension via Transformer with
  Content-conditioned Query
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Jiang Ji
Meng Cao
Tengtao Song
Long Chen
Yi Wang
Yuexian Zou
88
6
0
25 Oct 2023
Previous
123...151617...585960
Next