ResearchTrend.AI
VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang
Jiaming Liu
Chenxuan Li
Junpeng Ma
Yuan Zhang
...
Kevin Zhang
Maurice Chong
Ray Zhang
Yijiang Liu
Shanghang Zhang
109
8
0
26 Dec 2023
ChartBench: A Benchmark for Complex Visual Reasoning in Charts
Zhengzhuo Xu
Sinan Du
Yiyan Qi
Chengjin Xu
Chun Yuan
Jian Guo
155
49
0
26 Dec 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
106
18
0
25 Dec 2023
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training
Xinyan Chen
Jiaxin Ge
Tianjun Zhang
Jiaming Liu
Shanghang Zhang
VLM, EGVM
193
0
0
23 Dec 2023
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification
Anirudh S. Sundar
Chao-Han Huck Yang
David M. Chan
Shalini Ghosh
Venkatesh Ravichandran
P. S. Nidadavolu
MoMe
103
9
0
22 Dec 2023
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang
Jiaming Liu
Ray Zhang
Mingjie Pan
Zoey Guo
Xiaoqi Li
Zehui Chen
Peng Gao
Yandong Guo
Shanghang Zhang
3DV
108
71
0
21 Dec 2023
LLM4VG: Large Language Models Evaluation for Video Grounding
Wei Feng
Xin Wang
Hong Chen
Zeyang Zhang
Zihan Song
Yuwei Zhou
Wenwu Zhu
107
8
0
21 Dec 2023
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Chengen Lai
Shengli Song
Shiqi Meng
Jingyang Li
Sitong Yan
Guangneng Hu
60
5
0
21 Dec 2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
64
1
0
21 Dec 2023
Cross-Modal Reasoning with Event Correlation for Video Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Qinru Qiu
Jian Tang
60
0
0
20 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM, MLLM
167
36
0
19 Dec 2023
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
Junjue Wang
Zhuo Zheng
Zihang Chen
A. Ma
Yanfei Zhong
54
24
0
19 Dec 2023
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Bingchen Zhao
Haoqin Tu
Chen Wei
Jieru Mei
Cihang Xie
114
36
0
18 Dec 2023
Benchmarks for Physical Reasoning AI
Andrew Melnik
Robin Schiewer
Moritz Lange
Andrei Muresanu
Mozhgan Saeidi
Animesh Garg
Helge J. Ritter
105
9
0
17 Dec 2023
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan A. Rodriguez
Shubham Agarwal
I. Laradji
Pau Rodríguez
P. Rodríguez
Sai Rajeswar
David Vazquez
Christopher Pal
M. Pedersoli
102
4
0
17 Dec 2023
Advancing Surgical VQA with Scene Graph Knowledge
Kun Yuan
Manasi Kattel
Joël L. Lavanchy
Nassir Navab
V. Srivastav
N. Padoy
124
21
0
15 Dec 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun
Kim Sung-Bin
Seungju Han
Youngjae Yu
Tae-Hyun Oh
100
15
0
15 Dec 2023
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Sunjae Yoon
Dahyun Kim
Eunseop Yoon
Hee Suk Yoon
Junyeong Kim
C. Yoo
95
6
0
15 Dec 2023
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
Zhongyi Zhou
Jing Jin
Vrushank Phadnis
Xiuxiu Yuan
Jun Jiang
...
A. Olwal
David Kim
Ram Iyengar
Na Li
Andrea Colaço
63
5
0
15 Dec 2023
Assessing GPT4-V on Structured Reasoning Tasks
Mukul Singh
J. Cambronero
Sumit Gulwani
Vu Le
Gust Verbruggen
LRM
73
13
0
13 Dec 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
73
15
0
13 Dec 2023
Image Content Generation with Causal Reasoning
Xiaochuan Li
Baoyu Fan
Runze Zhang
Liang Jin
Di Wang
Zhenhua Guo
Yaqian Zhao
Rengang Li
LRM
119
6
0
12 Dec 2023
Vision-language Assisted Attribute Learning
Kongming Liang
Xinran Wang
Rui Wang
Donghui Gao
Ling Jin
Weidong Liu
Xiatian Zhu
Zhanyu Ma
Jun Guo
VLM
71
0
0
12 Dec 2023
NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations
Yuichi Inoue
Yuki Yada
Kotaro Tanahashi
Yu Yamaguchi
71
23
0
11 Dec 2023
Multimodality of AI for Education: Towards Artificial General Intelligence
Gyeong-Geon Lee
Lehong Shi
Ehsan Latif
Yizhu Gao
Arne Bewersdorff
...
Zheng Liu
Hui Wang
Gengchen Mai
Tiaming Liu
Xiaoming Zhai
110
43
0
10 Dec 2023
Towards Knowledge-driven Autonomous Driving
Xin Li
Yeqi Bai
Pinlong Cai
Licheng Wen
Daocheng Fu
...
Yikang Li
Botian Shi
Yong-Jin Liu
Liang He
Yu Qiao
115
29
0
07 Dec 2023
VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models
Zongjie Li
Chaozheng Wang
Chaowei Liu
Pingchuan Ma
Daoyuan Wu
Shuai Wang
Cuiyun Gao
VLM
79
6
0
07 Dec 2023
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai
Zirui Song
Dayan Guan
Zhenhao Chen
Xing Luo
Chenyu Yi
Alex C. Kot
MLLM, VLM
103
35
0
05 Dec 2023
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
Guozhang Li
Xinpeng Ding
De Cheng
Jie Li
Nannan Wang
Xinbo Gao
102
1
0
05 Dec 2023
Fine-tuning pre-trained extractive QA models for clinical document parsing
Ashwyn Sharma
David I. Feldman
Aneesh Jain
94
0
0
04 Dec 2023
Recursive Visual Programming
Jiaxin Ge
Sanjay Subramanian
Baifeng Shi
Roei Herzig
Trevor Darrell
46
7
0
04 Dec 2023
Good Questions Help Zero-Shot Image Reasoning
Kaiwen Yang
Tao Shen
Xinmei Tian
Xiubo Geng
Chongyang Tao
Dacheng Tao
Dinesh Manocha
LRM
100
7
0
04 Dec 2023
A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
Keito Kudo
Haruki Nagasawa
Jun Suzuki
Nobuyuki Shimizu
75
2
0
04 Dec 2023
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Aditya Chinchure
Pushkar Shukla
Gaurav Bhatt
Kiri Salij
K. Hosanagar
Leonid Sigal
Matthew Turk
92
29
0
03 Dec 2023
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
Andrés Villa
Juan Carlos León Alcázar
Alvaro Soto
Bernard Ghanem
MLLM, VLM
85
11
0
03 Dec 2023
Understanding Unimodal Bias in Multimodal Deep Linear Networks
Yedi Zhang
Peter E. Latham
Andrew Saxe
80
6
0
01 Dec 2023
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
Zhen Xing
Qi Dai
Zihao Zhang
Hui Zhang
Hang-Rui Hu
Zuxuan Wu
Yu-Gang Jiang
VGen
102
17
0
30 Nov 2023
MLLMs-Augmented Visual-Language Representation Learning
Yanqing Liu
Kai Wang
Wenqi Shao
Ping Luo
Yu Qiao
Mike Zheng Shou
Kaipeng Zhang
Yang You
VLM
96
12
0
30 Nov 2023
Understanding and Improving In-Context Learning on Vision-language Models
Shuo Chen
Zhen Han
Bailan He
Mark Buckley
Philip Torr
Volker Tresp
Jindong Gu
80
7
0
29 Nov 2023
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
Yingdong Hu
Fanqi Lin
Tong Zhang
Li Yi
Yang Gao
LM&Ro
174
124
0
29 Nov 2023
Debiasing Multimodal Models via Causal Information Minimization
Vaidehi Patil
A. Maharana
Mohit Bansal
CML
93
2
0
28 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
147
23
0
28 Nov 2023
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Chancharik Mitra
Brandon Huang
Trevor Darrell
Roei Herzig
MLLM, LRM
113
98
0
27 Nov 2023
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu
Chenhang Cui
Zijun Wang
Yiyang Zhou
Bingchen Zhao
Junlin Han
Wangchunshu Zhou
Huaxiu Yao
Cihang Xie
MLLM
128
82
0
27 Nov 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM, ELM, VLM
377
960
0
27 Nov 2023
C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing
Avigyan Bhattacharya
Mainak Singha
Ankit Jha
Biplab Banerjee
SSL, VLM
85
6
0
27 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoV, LRM
117
20
0
27 Nov 2023
Fully Authentic Visual Question Answering Dataset from Online Communities
Chongyan Chen
Mengchen Liu
Noel Codella
Yunsheng Li
Lu Yuan
Danna Gurari
116
5
0
27 Nov 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
114
17
0
24 Nov 2023
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan
Jingxuan Wei
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Ruifeng Guo
Xihong Yang
Stan Z. Li
LRM
100
10
0
23 Nov 2023