ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00468
  4. Cited By
VQA: Visual Question Answering

VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe
ArXivPDFHTML

Papers citing "VQA: Visual Question Answering"

50 / 2,890 papers shown
Title
Using Large Language Models for education managements in Vietnamese with low resources
Duc Do Minh
Vinh Nguyen Van
Thang Dam Cong
43
0
0
28 Jan 2025
PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
Hammad A. Ayyubi
Xuande Feng
Junzhang Liu
Xudong Lin
Zhecan Wang
Shih-Fu Chang
50
0
0
24 Jan 2025
Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering
Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering
Qian Tao
Xiaoyang Fan
Yong Xu
Xingquan Zhu
Yufei Tang
52
0
0
22 Jan 2025
Know "No'' Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
Know "No'' Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
J. Park
Jungbeom Lee
Jongyoon Song
Sangwon Yu
Dahuin Jung
Sungroh Yoon
47
0
0
19 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
51
0
0
13 Jan 2025
Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation
Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation
Daowan Peng
Wei Wei
211
0
0
10 Jan 2025
MULTI: Multimodal Understanding Leaderboard with Text and Images
MULTI: Multimodal Understanding Leaderboard with Text and Images
Zichen Zhu
Yang Xu
Lu Chen
Jingkai Yang
Yichuan Ma
...
Yingzi Ma
Situo Zhang
Zihan Zhao
Liangtai Sun
Kai Yu
VLM
54
5
0
08 Jan 2025
Multimodal Multihop Source Retrieval for Web Question Answering
Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly
Saloni Mittal
36
0
0
07 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
VLM
96
12
0
07 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
91
12
0
06 Jan 2025
Efficient Architectures for High Resolution Vision-Language Models
Miguel Carvalho
Bruno Martins
MLLM
VLM
50
0
0
05 Jan 2025
Accounting for Focus Ambiguity in Visual Questions
Chongyan Chen
Yu-Yun Tseng
Zhuoheng Li
Anush Venkatesh
Danna Gurari
46
0
0
04 Jan 2025
Instruction-Guided Scene Text Recognition
Instruction-Guided Scene Text Recognition
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
75
3
0
03 Jan 2025
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Lijie Tao
Han Zhang
Haizhao Jing
Yu Liu
Kelu Yao
Guoting Wei
Xizhe Xue
44
0
0
03 Jan 2025
Mathematical Language Models: A Survey
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
84
13
0
03 Jan 2025
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Yue Zhang
Ziqiao Ma
Jialu Li
Yanyuan Qiao
Zun Wang
J. Chai
Qi Wu
Joey Tianyi Zhou
Parisa Kordjamshidi
LRM
63
19
0
31 Dec 2024
Generative Landmarks Guided Eyeglasses Removal 3D Face Reconstruction
Generative Landmarks Guided Eyeglasses Removal 3D Face Reconstruction
Dapeng Zhao
Yue Qi
3DH
CVBM
3DV
37
6
0
31 Dec 2024
SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
Palash Nandi
Shivam Sharma
Tanmoy Chakraborty
41
1
0
31 Dec 2024
Towards Visual Grounding: A Survey
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
67
4
0
31 Dec 2024
Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner
Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner
Yitong Zhou
Mingyue Cheng
Qingyang Mao
Qi Liu
F. Xu
Xin Li
Enhong Chen
LMTD
49
0
0
30 Dec 2024
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question
  Answering and Object Localization
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
Tan-Hanh Pham
Hoang-Nam Le
Phu-Vinh Nguyen
Chris Ngo
Truong-Son Hy
AuLLM
LRM
96
1
0
21 Dec 2024
A Review of Multimodal Explainable Artificial Intelligence: Past,
  Present and Future
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Shilin Sun
Wenbin An
Feng Tian
Fang Nan
Qidong Liu
Jing Liu
N. Shah
Ping Chen
99
3
0
18 Dec 2024
What makes a good metric? Evaluating automatic metrics for text-to-image
  consistency
What makes a good metric? Evaluating automatic metrics for text-to-image consistency
Candace Ross
Melissa Hall
Adriana Romero Soriano
Adina Williams
101
3
0
18 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
Maosong Sun
MLLM
VLM
92
0
0
18 Dec 2024
SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
S. Nagendra
Kashif Rashid
Chaopeng Shen
Daniel Kifer
VLM
79
2
0
16 Dec 2024
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
Kun Ouyang
Yuanxin Liu
Shicheng Li
Yi Liu
Hao Zhou
Fandong Meng
Jie Zhou
Xu Sun
110
1
0
16 Dec 2024
ViSymRe: Vision-guided Multimodal Symbolic Regression
ViSymRe: Vision-guided Multimodal Symbolic Regression
Da Li
Junping Yin
Jin Xu
Xinxin Li
Juan Zhang
87
1
0
15 Dec 2024
Seeing the Forest and the Trees: Solving Visual Graph and Tree Based
  Data Structure Problems using Large Multimodal Models
Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models
S. Gutierrez
Irene Hou
Jihye Lee
Kenneth Angelikas
Owen Man
Sophia Mettille
James Prather
Paul Denny
Stephen MacNeil
71
1
0
15 Dec 2024
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for
  Training-Free Zero-Shot Composed Image Retrieval
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang
Xiaoting Qin
Jingyang Zhang
Jing Yu
Gaopeng Gou
Gang Xiong
Qingwei Ling
Saravan Rajmohan
Dongmei Zhang
Qi Wu
LRM
68
1
0
15 Dec 2024
NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries
NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries
Tao Wu
Chuhao Zhou
Yen Heng Wong
Lin Gu
Jianfei Yang
89
1
0
14 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip Torr
VLM
ObjD
272
0
0
12 Dec 2024
MM-PoE: Multiple Choice Reasoning via. Process of Elimination using
  Multi-Modal Models
MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models
Sayak Chakrabarty
Souradip Pal
LRM
71
1
0
10 Dec 2024
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph
  Generation with Enhanced Spatial Relations
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations
Mingjie Xu
Mengyang Wu
Yuzhi Zhao
Jason Chun Lok Li
Weifeng Ou
LRM
SyDa
VLM
73
2
0
09 Dec 2024
An Entailment Tree Generation Approach for Multimodal Multi-Hop Question
  Answering with Mixture-of-Experts and Iterative Feedback Mechanism
An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism
Qing Zhang
Haocheng Lv
Jie Liu
Zheyu Chen
Jianyong Duan
Hao Wang
Li He
Mingying Xv
77
1
0
08 Dec 2024
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Aditya Chinchure
Sahithya Ravi
R. Ng
Vered Shwartz
Boyang Albert Li
Leonid Sigal
ReLM
LRM
VLM
82
2
0
07 Dec 2024
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on
  Developmentally Plausible Corpora
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu
Aaron Mueller
Candace Ross
Adina Williams
Tal Linzen
Chengxu Zhuang
Ryan Cotterell
Leshem Choshen
Alex Warstadt
Ethan Gotlieb Wilcox
99
8
0
06 Dec 2024
MegaCOIN: Enhancing Medium-Grained Color Perception for Vision-Language
  Models
MegaCOIN: Enhancing Medium-Grained Color Perception for Vision-Language Models
Ming-Chang Chiu
Shicheng Wen
Pin-Yu Chen
Xuezhe Ma
82
0
0
05 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
106
5
0
05 Dec 2024
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers
Chancharik Mitra
Brandon Huang
Tianning Chai
Zhiqiu Lin
Assaf Arbelle
Rogerio Feris
Leonid Karlinsky
Trevor Darrell
Deva Ramanan
Roei Herzig
VLM
137
4
0
28 Nov 2024
Abductive Symbolic Solver on Abstraction and Reasoning Corpus
Abductive Symbolic Solver on Abstraction and Reasoning Corpus
Mintaek Lim
Seokki Lee
Liyew Woletemaryam Abitew
Sundong Kim
130
1
0
27 Nov 2024
VLM-HOI: Vision Language Models for Interpretable Human-Object
  Interaction Analysis
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
Donggoo Kang
Dasol Jeong
Hyunmin Lee
Sangwoo Park
Hasil Park
Sunkyu Kwon
Yeongjoon Kim
Joonki Paik
MLLM
VLM
79
0
0
27 Nov 2024
Evaluating Vision-Language Models as Evaluators in Path Planning
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal
Xiang Yue
Erion Plaku
Ziyu Yao
LRM
77
1
0
27 Nov 2024
CoA: Chain-of-Action for Generative Semantic Labels
CoA: Chain-of-Action for Generative Semantic Labels
Meng Wei
Zhongnian Li
Peng Ying
Xinzheng Xu
VLM
74
0
0
26 Nov 2024
Task Progressive Curriculum Learning for Robust Visual Question
  Answering
Task Progressive Curriculum Learning for Robust Visual Question Answering
Ahmed Akl
Abdelwahed Khamis
Zhe Wang
Ali Cheraghian
Sara Khalifa
Kewen Wang
OOD
83
0
0
26 Nov 2024
Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions
Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions
Nicolai Hermann
Jorge Condor
Piotr Didyk
3DV
92
0
0
26 Nov 2024
MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating
  Multimodal Large Language Models Understanding of Complex Image
MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image
Shezheng Song
Chengxiang He
Shasha Li
Shan Zhao
Chengyu Wang
...
Xiaopeng Li
Qian Wan
Jun Ma
Jie Yu
Xiaoguang Mao
VLM
87
1
0
25 Nov 2024
ENCLIP: Ensembling and Clustering-Based Contrastive Language-Image
  Pretraining for Fashion Multimodal Search with Limited Data and Low-Quality
  Images
ENCLIP: Ensembling and Clustering-Based Contrastive Language-Image Pretraining for Fashion Multimodal Search with Limited Data and Low-Quality Images
Prithviraj Purushottam Naik
Rohit Agarwal
VLM
CLIP
72
0
0
25 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
101
8
0
25 Nov 2024
ResCLIP: Residual Attention for Training-free Dense Vision-language
  Inference
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Yuhang Yang
Jinhong Deng
Wen Li
Lixin Duan
VLM
81
0
0
24 Nov 2024
Creating Scalable AGI: the Open General Intelligence Framework
Creating Scalable AGI: the Open General Intelligence Framework
Daniel A. Dollinger
Michael Singleton
AI4CE
71
0
0
24 Nov 2024
Previous
12345...565758
Next