VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Huiwon Jang
Jihoon Tack
Daewon Choi
Jongheon Jeong
Jinwoo Shin
76
3
0
25 Oct 2023
Knowledge Editing for Large Language Models: A Survey
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Wenlin Yao
KELM
176
163
0
24 Oct 2023
FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing
Anant Khandelwal
130
2
0
24 Oct 2023
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting
Lei Li
115
24
0
24 Oct 2023
Emergent Communication in Interactive Sketch Question Answering
Zixing Lei
Yiming Zhang
Yuxin Xiong
Siheng Chen
83
2
0
24 Oct 2023
LXMERT Model Compression for Visual Question Answering
Maryam Hashemi
Ghazaleh Mahmoudi
Sara Kodeiri
Hadi Sheikhi
Sauleh Eetemadi
VLM
34
4
0
23 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM LRM
130
58
0
23 Oct 2023
Large Language Models can Share Images, Too!
Young-Jun Lee
Dokyong Lee
Joo Won Sung
Jonghwan Hyeon
Ho-Jin Choi
MLLM
84
2
0
23 Oct 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
107
3
0
23 Oct 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
VLM MLLM
165
196
0
23 Oct 2023
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
Baohao Liao
Michael Kozielski
Sanjika Hewavitharana
Jiangbo Yuan
Shahram Khadivi
Tomer Lancewicki
SSL
25
0
0
22 Oct 2023
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models
Mingwei Zhu
Leigang Sha
Yu Shu
Kangjia Zhao
Tiancheng Zhao
Yuxiang Cai
LRM
96
1
0
20 Oct 2023
On the Language Encoder of Contrastive Cross-modal Models
Mengjie Zhao
Junya Ono
Zhi-Wei Zhong
Chieh-Hsin Lai
Yuhta Takida
Naoki Murata
Wei-Hsiang Liao
Takashi Shibuya
Hiromi Wakaki
Yuki Mitsufuji
VLM
63
0
0
20 Oct 2023
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
Yuduo Wang
Pedram Ghamisi
63
6
0
19 Oct 2023
Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling
Yaqing Wang
Jialin Wu
T. Dabral
Jiageng Zhang
Geoff Brown
...
Frederick Liu
Yi Liang
Bo Pang
Michael Bendersky
Radu Soricut
VLM
81
15
0
18 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
87
7
0
17 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
68
2
0
15 Oct 2023
Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering
Shuwen Yang
Anran Wu
Xingjiao Wu
Luwei Xiao
Tianlong Ma
Cheng Jin
Liang He
69
4
0
15 Oct 2023
VizAbility: Enhancing Chart Accessibility with LLM-based Conversational Interaction
Joshua Gorniak
Yoon Kim
Donglai Wei
Nam Wook Kim
82
10
0
14 Oct 2023
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik
Karsten Roth
Massimiliano Mancini
Zeynep Akata
CoGe
115
61
0
13 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM VLM
81
39
0
13 Oct 2023
Can We Edit Multimodal Large Language Models?
Siyuan Cheng
Bo Tian
Qingbin Liu
Xi Chen
Yongheng Wang
Huajun Chen
Ningyu Zhang
MLLM
117
29
0
12 Oct 2023
Visual Question Generation in Bengali
Mahmud Hasan
Labiba Islam
J. Ruma
T. Mayeesha
Rashedur Rahman
74
1
0
12 Oct 2023
Open-Set Knowledge-Based Visual Question Answering with Inference Paths
Jingru Gan
Xinzhe Han
Shuhui Wang
Qingming Huang
81
0
0
12 Oct 2023
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu
Kang Zhao
Bo Peng
H. Chen
Yue Jiang
Yingya Zhang
Jing Dong
Caifeng Shan
83
2
0
12 Oct 2023
CLIP for Lightweight Semantic Segmentation
Ke Jin
Wankou Yang
VLM
91
1
0
11 Oct 2023
Improving mitosis detection on histopathology images using large vision-language models
Ruiwen Ding
James Hall
Neil Tenenholtz
Kristen Severson
VLM
74
6
0
11 Oct 2023
Jaeger: A Concatenation-Based Multi-Transformer VQA Model
Jieting Long
Zewei Shi
Penghao Jiang
Yidong Gan
53
0
0
11 Oct 2023
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang
Xiaotong Zhai
Zhongkai Zhao
Yongshuo Zong
Xin Wen
Bingchen Zhao
LRM
38
0
0
10 Oct 2023
What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?
Siting Li
Chenzhuang Du
Yue Zhao
Yu Huang
Hang Zhao
59
4
0
10 Oct 2023
Multi-Modal Knowledge Graph Transformer Framework for Multi-Modal Entity Alignment
Qian Li
Cheng Ji
Shu Guo
Zhaoji Liang
Lihong Wang
Jianxin Li
68
11
0
10 Oct 2023
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
Xiulong Liu
Zhikang Dong
Peng Zhang
76
24
0
10 Oct 2023
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
KAI-QING Zhou
Kwonjoon Lee
Teruhisa Misu
Xin Eric Wang
LRM
102
4
0
09 Oct 2023
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
ReLM LRM
76
8
0
09 Oct 2023
Foundation Models Meet Visualizations: Challenges and Opportunities
Weikai Yang
Mengchen Liu
Zheng Wang
Shixia Liu
87
40
0
09 Oct 2023
Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering
Trang Nguyen
Naoaki Okazaki
LRM
82
0
0
09 Oct 2023
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models
Holy Lovenia
Wenliang Dai
Samuel Cahyawijaya
Ziwei Ji
Pascale Fung
MLLM
107
53
0
09 Oct 2023
Lightweight In-Context Tuning for Multimodal Unified Models
Yixin Chen
Shuai Zhang
Boran Han
Jiaya Jia
65
2
0
08 Oct 2023
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Avinash Madasu
Anahita Bhiwandiwalla
Vasudev Lal
VLM
74
0
0
07 Oct 2023
Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction
Korawat Charoenpitaks
Van-Quang Nguyen
Masanori Suganuma
Masahiro Takahashi
Ryoma Niihara
Takayuki Okatani
104
1
0
07 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Ziyi Yin
Muchao Ye
Tianrong Zhang
Tianyu Du
Jinguo Zhu
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
AAML VLM CoGe
89
44
0
07 Oct 2023
Improving Automatic VQA Evaluation Using Large Language Models
Oscar Manas
Benno Krojer
Aishwarya Agrawal
95
25
0
04 Oct 2023
On the Cognition of Visual Question Answering Models and Human Intelligence: A Comparative Study
Liben Chen
Long Chen
Tian Ellison-Chen
Zhuoyuan Xu
LRM
36
0
0
04 Oct 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM MLLM
167
669
0
03 Oct 2023
Constructing Image-Text Pair Dataset from Books
Yamato Okamoto
Haruto Toyonaga
Yoshihisa Ijiri
Hirokatsu Kataoka
79
3
0
03 Oct 2023
HallE-Control: Controlling Object Hallucination in Large Multimodal Models
Bohan Zhai
Shijia Yang
Chenfeng Xu
Sheng Shen
Kurt Keutzer
Chunyuan Li
Manling Li
MLLM
105
14
0
03 Oct 2023
Human Mobility Question Answering (Vision Paper)
Hao Xue
Flora D. Salim
44
0
0
02 Oct 2023
Application of frozen large-scale models to multimodal task-oriented dialogue
Tatsuki Kawamoto
Takuma Suzuki
Ko Miyama
Takumi Meguro
Tomohiro Takagi
61
1
0
02 Oct 2023
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
Lifan Yuan
Yangyi Chen
Xingyao Wang
Yi R. Fung
Hao Peng
Heng Ji
LLMAG KELM
128
69
0
29 Sep 2023
Towards Complex-query Referring Image Segmentation: A Novel Benchmark
Wei Ji
Li Li
Marco Pleines
Xiangyan Liu
Xu Yang
Juncheng Billy Li
Roger Zimmermann
68
8
0
29 Sep 2023