ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering
v1v2v3v4v5v6v7 (latest)

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
Assessing News Thumbnail Representativeness: Counterfactual text can
  enhance the cross-modal matching ability
Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-modal matching ability
Yejun Yoon
Seunghyun Yoon
Kunwoo Park
95
1
0
17 Feb 2024
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in
  Visual Question Answering
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
Jihyung Kil
Farideh Tavazoee
Dongyeop Kang
Joo-Kyung Kim
LRM
69
3
0
16 Feb 2024
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald
Nimrod Barazani
Cees G. M. Snoek
Yuki M. Asano
VLMMLLM
59
12
0
13 Feb 2024
Visually Dehallucinative Instruction Generation
Visually Dehallucinative Instruction Generation
Sungguk Cha
Jusung Lee
Younghyun Lee
Cheoljong Yang
MLLM
51
6
0
13 Feb 2024
Lumos : Empowering Multimodal LLMs with Scene Text Recognition
Lumos : Empowering Multimodal LLMs with Scene Text Recognition
Ashish Shenoy
Yichao Lu
Srihari Jayakumar
Debojeet Chatterjee
Mohsen Moslehpour
...
Shicong Zhao
Longfang Zhao
Ankit Ramchandani
Xin Luna Dong
Anuj Kumar
MLLM
71
3
0
12 Feb 2024
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive
  Reasoning through Theory of Mind
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao
Xin Lin
Qin Ni
Liang He
84
4
0
12 Feb 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting
  Classification datasets and their semantic hierarchy
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
Simon Ging
M. A. Bravo
Thomas Brox
VLM
158
12
0
11 Feb 2024
A Benchmark for Multi-modal Foundation Models on Low-level Vision: from
  Single Images to Pairs
A Benchmark for Multi-modal Foundation Models on Low-level Vision: from Single Images to Pairs
Zicheng Zhang
Haoning Wu
Erli Zhang
Guangtao Zhai
Weisi Lin
VLM
76
8
0
11 Feb 2024
Copycats: the many lives of a publicly available medical imaging dataset
Copycats: the many lives of a publicly available medical imaging dataset
Amelia Jiménez-Sánchez
Natalia-Rozalia Avlona
Dovile Juodelyte
Théo Sourget
Caroline Vang-Larsen
Anna Rogers
Hubert Dariusz Zajkac
Veronika Cheplygina
113
3
0
09 Feb 2024
Quantifying and Enhancing Multi-modal Robustness with Modality
  Preference
Quantifying and Enhancing Multi-modal Robustness with Modality Preference
Zequn Yang
Yake Wei
Ce Liang
Di Hu
AAML
74
10
0
09 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRLVLMLM&Ro
116
54
0
08 Feb 2024
Question Aware Vision Transformer for Multimodal Reasoning
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz
Yair Kittenplon
Aviad Aberdam
Elad Ben Avraham
Oren Nuriel
Shai Mazor
Ron Litman
106
23
0
08 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
241
116
0
08 Feb 2024
SceMQA: A Scientific College Entrance Level Multimodal Question
  Answering Benchmark
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
Zhenwen Liang
Kehan Guo
Gang Liu
Taicheng Guo
Yujun Zhou
Tianyu Yang
Jiajun Jiao
Renjie Pi
Jipeng Zhang
Xiangliang Zhang
ELM
86
24
0
06 Feb 2024
Multimodal Rationales for Explainable Visual Question Answering
Multimodal Rationales for Explainable Visual Question Answering
Kun Li
G. Vosselman
Michael Ying Yang
132
2
0
06 Feb 2024
Text-Guided Image Clustering
Text-Guided Image Clustering
Andreas Stephan
Lukas Miklautz
Kevin Sidak
Jan Philip Wahle
Bela Gipp
Claudia Plant
Benjamin Roth
68
6
0
05 Feb 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual
  Question Answering
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
79
3
0
04 Feb 2024
Common Sense Reasoning for Deepfake Detection
Common Sense Reasoning for Deepfake Detection
Yue Zhang
Ben Colman
Xiao Guo
Ali Shahriyari
Gaurav Bharaj
143
35
0
31 Jan 2024
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models
  for Spatial Proximity Analysis
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
Jianing Li
Xi Nan
Ming Lu
Li Du
Shanghang Zhang
58
2
0
31 Jan 2024
MouSi: Poly-Visual-Expert Vision-Language Models
MouSi: Poly-Visual-Expert Vision-Language Models
Xiaoran Fan
Tao Ji
Changhao Jiang
Shuo Li
Senjie Jin
...
Qi Zhang
Xipeng Qiu
Xuanjing Huang
Zuxuan Wu
Yunchun Jiang
VLM
51
17
0
30 Jan 2024
Towards Unified Interactive Visual Grounding in The Wild
Towards Unified Interactive Visual Grounding in The Wild
Jie Xu
Hanbo Zhang
Qingyi Si
Yifeng Li
Xuguang Lan
Tao Kong
LM&Ro
66
5
0
30 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and
  Comprehension in Vision-Language Large Model
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLMMLLM
159
268
0
29 Jan 2024
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual
  Question Answering
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering
Yuhan Chen
Lumei Su
Lihua Chen
Zhiwei Lin
MLLM
27
1
0
29 Jan 2024
Improving Data Augmentation for Robust Visual Question Answering with
  Effective Curriculum Learning
Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning
Yuhang Zheng
Zhen Wang
Long Chen
61
2
0
28 Jan 2024
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web
  Tasks
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh
Robert Lo
Lawrence Jang
Vikram Duvvur
Ming Chong Lim
Po-Yu Huang
Graham Neubig
Shuyan Zhou
Ruslan Salakhutdinov
Daniel Fried
135
0
0
24 Jan 2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
Siwei Wu
Yizhi Li
Kang Zhu
Ge Zhang
Yiming Liang
...
Wenhu Chen
Wenhao Huang
Noura Al Moubayed
Jie Fu
Chenghua Lin
98
13
0
24 Jan 2024
Common-Sense Bias Modeling for Classification Tasks
Common-Sense Bias Modeling for Classification Tasks
Miao Zhang
Zee fryer
Ben Colman
Ali Shahriyari
Gaurav Bharaj
95
0
0
24 Jan 2024
Collaborative Position Reasoning Network for Referring Image
  Segmentation
Collaborative Position Reasoning Network for Referring Image Segmentation
Jianjian Cao
Beiya Dai
Yulin Li
Xiameng Qin
Jingdong Wang
99
0
0
22 Jan 2024
LLMRA: Multi-modal Large Language Model based Restoration Assistant
LLMRA: Multi-modal Large Language Model based Restoration Assistant
Xiaoyu Jin
Yuan Shi
Bin Xia
Wenming Yang
100
4
0
21 Jan 2024
Prompting Large Vision-Language Models for Compositional Reasoning
Prompting Large Vision-Language Models for Compositional Reasoning
Timothy Ossowski
Ming Jiang
Junjie Hu
CoGeVLMLRM
102
3
0
20 Jan 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining
  Question-Answer Prompts for VQA requiring Diverse World Knowledge
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
108
4
0
19 Jan 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via
  Multi-modal Feature Synchronizer
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Hongsheng Li
Yu Qiao
Jifeng Dai
AuLLM
145
49
0
18 Jan 2024
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and
  Visual Question Generation
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
Kohei Uehara
Nabarun Goswami
Hanqin Wang
Toshiaki Baba
Kohtaro Tanaka
...
Takagi Naoya
Ryo Umagami
Yingyi Wen
Tanachai Anakewat
Tatsuya Harada
LRM
74
3
0
18 Jan 2024
MMToM-QA: Multimodal Theory of Mind Question Answering
MMToM-QA: Multimodal Theory of Mind Question Answering
Chuanyang Jin
Yutong Wu
Jing Cao
Jiannan Xiang
Yen-Ling Kuo
Zhiting Hu
T. Ullman
Antonio Torralba
Joshua B. Tenenbaum
Tianmin Shu
111
46
0
16 Jan 2024
AesBench: An Expert Benchmark for Multimodal Large Language Models on
  Image Aesthetics Perception
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
Yipo Huang
Quan Yuan
Xiangfei Sheng
Zhichao Yang
Haoning Wu
Pengfei Chen
Yuzhe Yang
Leida Li
Weisi Lin
VLM
72
40
0
16 Jan 2024
Uncovering the Full Potential of Visual Grounding Methods in VQA
Uncovering the Full Potential of Visual Grounding Methods in VQA
Daniel Reich
Tanja Schultz
102
5
0
15 Jan 2024
Survey of Natural Language Processing for Education: Taxonomy,
  Systematic Review, and Future Trends
Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends
Yunshi Lan
Xinyuan Li
Hanyue Du
Xuesong Lu
Ming Gao
Weining Qian
Aoying Zhou
104
4
0
15 Jan 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang
Bohan Zhuang
Qi Wu
66
12
0
12 Jan 2024
Efficient Vision-and-Language Pre-training with Text-Relevant Image
  Patch Selection
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Wei Ye
Chaoya Jiang
Haiyang Xu
Chenhao Ye
Chenliang Li
Mingshi Yan
Shikun Zhang
Songhang Huang
Fei Huang
VLM
79
0
0
11 Jan 2024
Cross-modal Retrieval for Knowledge-based Visual Question Answering
Cross-modal Retrieval for Knowledge-based Visual Question Answering
Paul Lerner
Olivier Ferret
C. Guinaudeau
85
9
0
11 Jan 2024
REBUS: A Robust Evaluation Benchmark of Understanding Symbols
REBUS: A Robust Evaluation Benchmark of Understanding Symbols
Andrew Gritsevskiy
Arjun Panickssery
Aaron Kirtland
Derik Kauffman
Hans Gundlach
Irina Gritsevskaya
Joe Cavanagh
Jonathan Chiang
Lydia La Roux
Michelle Hung
ReLM
44
1
0
11 Jan 2024
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual
  Concept Understanding
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding
Yatong Bai
Utsav Garg
Apaar Shanker
Haoming Zhang
Samyak Parajuli
...
Eugenia D Fomitcheva
E. Branson
Aerin Kim
Somayeh Sojoudi
Kyunghyun Cho
56
2
0
09 Jan 2024
We Need to Talk About Classification Evaluation Metrics in NLP
We Need to Talk About Classification Evaluation Metrics in NLP
Peter Vickers
Loïc Barrault
Emilio Monti
Nikolaos Aletras
ELM
59
3
0
08 Jan 2024
CaMML: Context-Aware Multimodal Learner for Large Models
CaMML: Context-Aware Multimodal Learner for Large Models
Yixin Chen
Shuai Zhang
Boran Han
Tong He
Bo Li
VLM
117
4
0
06 Jan 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via
  Text-Only Training
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
72
4
0
04 Jan 2024
Context-Guided Spatio-Temporal Video Grounding
Context-Guided Spatio-Temporal Video Grounding
Xin Gu
Hengrui Fan
Yan Huang
Tiejian Luo
Libo Zhang
100
16
0
03 Jan 2024
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Haopeng Li
Andong Deng
Qiuhong Ke
Jun Liu
Hossein Rahmani
Yulan Guo
Mohammed Bennamoun
Chen Chen
184
17
0
03 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision,
  Language, Audio, and Action
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLMMLLM
102
175
0
28 Dec 2023
MIVC: Multiple Instance Visual Component for Visual-Language Models
MIVC: Multiple Instance Visual Component for Visual-Language Models
Wenyi Wu
Qi Li
Leon Wenliang Zhong
Junzhou Huang
71
3
0
28 Dec 2023
Gemini Pro Defeated by GPT-4V: Evidence from Education
Gemini Pro Defeated by GPT-4V: Evidence from Education
Gyeong-Geon Lee
Ehsan Latif
Lehong Shi
Xiaoming Zhai
93
24
0
27 Dec 2023
Previous
123...131415...585960
Next