arXiv:1505.00468
VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
118
7
0
27 Sep 2024
DARE: Diverse Visual Question Answering with Robustness Evaluation
Hannah Sterz
Jonas Pfeiffer
Ivan Vulić
OOD, VLM
41
2
0
26 Sep 2024
M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Taowen Wang
Yiyang Liu
James Liang
Junhan Zhao
Yiming Cui
...
Zenglin Xu
Cheng Han
Lifu Huang
Qifan Wang
Dongfang Liu
MLLM, VLM, LRM
99
19
0
24 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
139
9
0
23 Sep 2024
Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models
Javier Chiyah-Garcia
Alessandro Suglia
Arash Eshghi
KELM
46
2
0
21 Sep 2024
Temporally Consistent Factuality Probing for Large Language Models
Ashutosh Bajpai
Aaryan Goyal
Atif Anwer
Tanmoy Chakraborty
HILM
89
1
0
21 Sep 2024
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model
Li Zhou
Xu Yuan
Zenghui Sun
Zikun Zhou
Jingsong Lan
VLM, MLLM
407
4
0
20 Sep 2024
Vision Language Models Can Parse Floor Plan Maps
David DeFazio
Hrudayangam Mehta
Jeremy Blackburn
Shiqi Zhang
CoGe
79
0
0
19 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
148
2
0
19 Sep 2024
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration
Jiahui Gao
Renjie Pi
Tianyang Han
Han Wu
Lanqing Hong
Lingpeng Kong
Xin Jiang
Zhenguo Li
125
8
0
17 Sep 2024
Contextual Breach: Assessing the Robustness of Transformer-based QA Models
Asir Saadat
Nahian Ibn Asad
Md Farhan Ishmam
AAML
85
0
0
17 Sep 2024
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
Yuan-Hong Liao
Rafid Mahmood
Sanja Fidler
David Acuna
ReLM, LRM
88
16
0
15 Sep 2024
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training
Yiyi Tao
Zhuoyue Wang
Hang Zhang
Lun Wang
VLM
99
16
0
15 Sep 2024
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
Neelabh Sinha
Vinija Jain
Aman Chadha
70
3
0
14 Sep 2024
What Makes a Maze Look Like a Maze?
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
130
6
0
12 Sep 2024
WebQuest: A Benchmark for Multimodal QA on Web Page Sequences
Maria Wang
Srinivas Sunkara
Gilles Baechler
Jason Lin
Yun Zhu
Fedir Zubach
Lei Shu
Jindong Chen
LRM, LLMAG
57
2
0
06 Sep 2024
Generating Faithful and Salient Text from Multimodal Data
Tahsina Hashem
Weiqing Wang
Derry Tanti Wijaya
Mohammed Eunus Ali
Yuan-Fang Li
98
0
0
06 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
264
15
0
02 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
106
21
0
01 Sep 2024
MAPWise: Evaluating Vision-Language Models for Advanced Map Queries
Srija Mukhopadhyay
Abhishek Rajgaria
Prerana Khatiwada
Vivek Gupta
Dan Roth
28
0
0
30 Aug 2024
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data
Spencer Whitehead
Jacob Phillips
Sean Hendryx
63
0
0
30 Aug 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM, MLLM
116
121
0
29 Aug 2024
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding
Minghang Zheng
Jiahua Zhang
Qingchao Chen
Yuxin Peng
Yang Liu
ObjD
96
2
0
29 Aug 2024
Can SAR improve RSVQA performance?
Lucrezia Tosato
Sylvain Lobry
Flora Weissgerber
Laurent Wendling
42
1
0
28 Aug 2024
Evaluating Attribute Comprehension in Large Vision-Language Models
Haiwen Zhang
Zixi Yang
Yuanzhi Liu
Xinran Wang
Zheqi He
Kongming Liang
Zhanyu Ma
ELM
57
0
0
25 Aug 2024
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach
Jiwei Guan
Tianyu Ding
Longbing Cao
Lei Pan
Chen Wang
Xi Zheng
AAML
118
2
0
24 Aug 2024
Identifying Crucial Objects in Blind and Low-Vision Individuals' Navigation
Md Touhidul Islam
Imran Kabir
Elena Ariel Pearce
Md. Alimoor Reza
Syed Masum Billah
35
3
0
23 Aug 2024
MultiMed: Massively Multimodal and Multitask Medical Understanding
Shentong Mo
Paul Pu Liang
LM&MA
72
2
0
22 Aug 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
138
78
0
22 Aug 2024
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
Yuanyang Yin
Yaqi Zhao
Yajie Zhang
Ke Lin
Jiahao Wang
Xin Tao
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
LRM
111
9
0
21 Aug 2024
CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering
Yuliang Cai
Mohammad Rostami
CLL, VLM, MLLM
128
4
0
21 Aug 2024
DocTabQA: Answering Questions from Long Documents Using Tables
Haochen Wang
Kai Hu
Haoyu Dong
Liangcai Gao
RALM, LMTD
60
3
0
21 Aug 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
Hao Fei
Xunliang Cai
LRM
106
7
0
21 Aug 2024
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models
Qingyuan Zeng
Zhenzhong Wang
Yiu-ming Cheung
Min Jiang
AAML
88
2
0
16 Aug 2024
Multi-Modal Dialogue State Tracking for Playing GuessWhich Game
Wei Pang
Ruixue Duan
Jinfu Yang
Ning Li
67
0
0
15 Aug 2024
Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model
Jin Wang
Arturo Laurenzi
Nikos Tsagarakis
LM&Ro
93
7
0
15 Aug 2024
A Survey on Integrated Sensing, Communication, and Computation
Dingzhu Wen
Yong Zhou
Xiaoyang Li
Yuanming Shi
Kaibin Huang
Khaled B. Letaief
74
33
0
15 Aug 2024
Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations
Wei Pang
Ruixue Duan
Jinfu Yang
Ning Li
48
0
0
13 Aug 2024
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery
Long Bai
Guankun Wang
Mobarakol Islam
Lalithkumar Seenivasan
An-Chi Wang
Hongliang Ren
107
17
0
09 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
102
14
0
08 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas Guibas
P. Milanfar
Feng Yang
98
2
0
07 Aug 2024
Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
Weiqi Feng
Yangrui Chen
Shaoyu Wang
Size Zheng
H. Lin
Minlan Yu
MLLM, AI4CE
145
4
0
07 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM, SyDa, VLM
169
865
0
06 Aug 2024
Targeted Visual Prompting for Medical Visual Question Answering
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
51
2
0
06 Aug 2024
Fairness and Bias Mitigation in Computer Vision: A Survey
Sepehr Dehdashtian
Ruozhen He
Yi Li
Guha Balakrishnan
Nuno Vasconcelos
Vicente Ordonez
Vishnu Boddeti
141
5
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
87
1
0
04 Aug 2024
The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models
Simone Caldarella
Massimiliano Mancini
Elisa Ricci
Rahaf Aljundi
PILM
76
2
0
02 Aug 2024
Compositional Physical Reasoning of Objects and Events from Videos
Zhenfang Chen
Shilong Dong
Kexin Yi
Yunzhu Li
Mingyu Ding
Antonio Torralba
Joshua B. Tenenbaum
Chuang Gan
OCL
126
3
0
02 Aug 2024
Towards Flexible Evaluation for Generative Visual Question Answering
Huishan Ji
Q. Si
Zheng Lin
Weiping Wang
90
1
0
01 Aug 2024
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Atsuyuki Miyai
Jingkang Yang
Jingyang Zhang
Yifei Ming
Sisir Dhakal
...
Yixuan Li
Hai "Helen" Li
Ziwei Liu
Toshihiko Yamasaki
Kiyoharu Aizawa
135
13
0
31 Jul 2024