Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2504.08269
Cited By
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
11 April 2025
Qi Zhi Lim
C. Lee
K. Lim
Kalaiarasi Sonai Muthu Anbananthen
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering"
30 / 30 papers shown
Title
Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering
Shuwen Yang
Anran Wu
Xingjiao Wu
Luwei Xiao
Tianlong Ma
Cheng Jin
Liang He
62
4
0
15 Oct 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
429
4,422
0
09 Jun 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
288
956
0
27 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
569
4,910
0
17 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
251
1,200
0
27 Mar 2023
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
64
37
0
12 Dec 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
218
3,150
0
20 Oct 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
154
880
0
07 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
555
4,413
0
28 Jan 2022
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
55
87
0
01 Sep 2021
BARTScore: Evaluating Generated Text as Text Generation
Weizhe Yuan
Graham Neubig
Pengfei Liu
126
849
0
22 Jun 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
490
10,496
0
17 Jun 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
292
2,841
0
15 Jun 2021
TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance
Fengbin Zhu
Wenqiang Lei
Youcheng Huang
Chao Wang
Shuo Zhang
Jiancheng Lv
Fuli Feng
Tat-Seng Chua
AIMat
120
303
0
17 May 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
975
29,871
0
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
676
41,483
0
22 Oct 2020
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
AAML
165
2,754
0
05 Jun 2020
HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data
Wenhu Chen
Hanwen Zha
Zhiyu Zoey Chen
Wenhan Xiong
Hong Wang
Wenjie Wang
73
309
0
15 Apr 2020
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Darryl Hannan
Akshay Jain
Joey Tianyi Zhou
AAML
67
59
0
22 Jan 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
485
20,342
0
23 Oct 2019
OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction
Xu Han
Tianyu Gao
Yuan Yao
Deming Ye
Zhiyuan Liu
Maosong Sun
KELM
VLM
109
153
0
28 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
357
942
0
24 Sep 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,316
0
27 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
686
24,557
0
26 Jul 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,229
0
11 Oct 2018
Semi-supervised sequence tagging with bidirectional language models
Matthew E. Peters
Bridger Waleed Ammar
Chandra Bhagavatula
Russell Power
97
635
0
29 Apr 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
352
3,270
0
02 Dec 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
316
8,174
0
16 Jun 2016
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
531
62,377
0
04 Jun 2015
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
226
5,509
0
03 May 2015
1