ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.04735
  4. Cited By
SnapNTell: Enhancing Entity-Centric Visual Question Answering with
  Retrieval Augmented Multimodal LLM

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

7 March 2024
Jielin Qiu
Andrea Madotto
Zhaojiang Lin
Paul A. Crook
Yongjun Xu
Xin Luna Dong
Christos Faloutsos
Lei Li
Babak Damavandi
Seungwhan Moon
ArXiv (abs)PDFHTML

Papers citing "SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM"

19 / 19 papers shown
Title
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A.
  Will LLMs Replace Knowledge Graphs?
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?
Kai Sun
Yongjun Xu
Hanwen Zha
Yue Liu
Xinhsuai Dong
AI4MH
100
148
0
20 Aug 2023
InstructBLIP: Towards General-purpose Vision-Language Models with
  Instruction Tuning
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLMVLM
136
2,095
0
11 May 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot
  Image Captioning
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Zhuolin Yang
Ming-Yu Liu
Zihan Liu
V. Korthikanti
Weili Nie
...
Yuke Zhu
Mohammad Shoeybi
Bryan Catanzaro
Chaowei Xiao
Anima Anandkumar
VLMRALM
75
40
0
09 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Grounding Language Models to Images for Multimodal Inputs and Outputs
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
98
122
0
31 Jan 2023
QUILL: Query Intent with Large Language Models using Retrieval
  Augmentation and Multi-stage Distillation
QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation
Krishna Srinivasan
K. Raman
Anupam Samanta
Ling-Yen Liao
L. Bertelli
Michael Bendersky
RALMLRM
58
20
0
27 Oct 2022
LAION-5B: An open large-scale dataset for training next generation
  image-text models
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLMMLLMCLIP
200
3,493
0
16 Oct 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
69
551
0
03 Jun 2022
PaLM: Scaling Language Modeling with Pathways
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILMLRM
522
6,293
0
05 Apr 2022
KAT: A Knowledge Augmented Transformer for Vision-and-Language
KAT: A Knowledge Augmented Transformer for Vision-and-Language
Liangke Gui
Borui Wang
Qiuyuan Huang
Alexander G. Hauptmann
Yonatan Bisk
Jianfeng Gao
68
159
0
16 Dec 2021
Grounded Language-Image Pre-training
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjDVLM
129
1,067
0
07 Dec 2021
Learning Compact Metrics for MT
Learning Compact Metrics for MT
Amy Pu
Hyung Won Chung
Ankur P. Parikh
Sebastian Gehrmann
Thibault Sellam
72
101
0
12 Oct 2021
WebQA: Multihop and Multimodal QA
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
53
87
0
01 Sep 2021
Multimodal Few-Shot Learning with Frozen Language Models
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
183
788
0
25 Jun 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual
  Machine Learning
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
268
320
0
02 Mar 2021
REALM: Retrieval-Augmented Language Model Pre-Training
REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu
Kenton Lee
Zora Tung
Panupong Pasupat
Ming-Wei Chang
RALM
142
2,116
0
10 Feb 2020
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Darryl Hannan
Akshay Jain
Joey Tianyi Zhou
AAML
62
59
0
22 Jan 2020
Billion-scale similarity search with GPUs
Billion-scale similarity search with GPUs
Jeff Johnson
Matthijs Douze
Hervé Jégou
257
3,737
0
28 Feb 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
350
3,270
0
02 Dec 2016
FVQA: Fact-based Visual Question Answering
FVQA: Fact-based Visual Question Answering
Peng Wang
Qi Wu
Chunhua Shen
Anton van den Hengel
A. Dick
CoGe
87
462
0
17 Jun 2016
1