ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.05401
  4. Cited By
MaXM: Towards Multilingual Visual Question Answering
v1v2v3 (latest)

MaXM: Towards Multilingual Visual Question Answering

12 September 2022
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
ArXiv (abs)PDFHTML

Papers citing "MaXM: Towards Multilingual Visual Question Answering"

41 / 41 papers shown
Title
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLMVLM
122
736
0
14 Sep 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
74
554
0
03 Jun 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
137
560
0
27 May 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
161
78
0
25 May 2022
All You May Need for VQA are Image Captions
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
73
76
0
04 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
418
3,607
0
29 Apr 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple
  Sequence-to-Sequence Learning Framework
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLMObjD
157
880
0
07 Feb 2022
RedCaps: web-curated image-text data created by the people, for the
  people
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
124
169
0
22 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLMMLLMCLIP
243
1,444
0
03 Nov 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
355
1,709
0
15 Oct 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLMMLLM
136
799
0
24 Aug 2021
Human-Adversarial Visual Question Answering
Human-Adversarial Visual Question Answering
Sasha Sheng
Amanpreet Singh
Vedanuj Goswami
Jose Alberto Lopez Magana
Wojciech Galuba
Devi Parikh
Douwe Kiela
OODEgoVAAML
38
62
0
04 Jun 2021
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA
  Models
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li
Jie Lei
Zhe Gan
Jingjing Liu
AAMLVLM
75
74
0
01 Jun 2021
$Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues
  via Question Generation and Question Answering
Q2Q^{2}Q2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering
Or Honovich
Leshem Choshen
Roee Aharoni
Ella Neeman
Idan Szpektor
Omri Abend
HILM
80
141
0
16 Apr 2021
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Julia Kreutzer
Isaac Caswell
Lisa Wang
Ahsan Wahab
D. Esch
...
Duygu Ataman
Orevaoghene Ahia
Oghenefego Ahia
Sweta Agrawal
Mofetoluwa Adeyemi
58
278
0
22 Mar 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual
  Machine Learning
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
272
320
0
02 Mar 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
450
1,142
0
17 Feb 2021
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense
  Spatiotemporal Grounding
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
Alexander Ku
Peter Anderson
Roma Patel
Eugene Ie
Jason Baldridge
93
315
0
15 Oct 2020
Asking and Answering Questions to Evaluate the Factual Consistency of
  Summaries
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
Alex Jinpeng Wang
Kyunghyun Cho
M. Lewis
HILM
86
483
0
08 Apr 2020
Connecting Vision and Language with Localized Narratives
Connecting Vision and Language with Localized Narratives
Jordi Pont-Tuset
J. Uijlings
Soravit Changpinyo
Radu Soricut
V. Ferrari
ObjD
107
252
0
06 Dec 2019
Synthetic QA Corpora Generation with Roundtrip Consistency
Synthetic QA Corpora Generation with Roundtrip Consistency
Chris Alberti
D. Andor
Emily Pitler
Jacob Devlin
Michael Collins
SyDa
75
249
0
12 Jun 2019
Scene Text Visual Question Answering
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
111
360
0
31 May 2019
Towards VQA Models That Can Read
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
111
1,255
0
18 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for
  Video-and-Language Research
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
101
556
0
06 Apr 2019
nocaps: novel object captioning at scale
nocaps: novel object captioning at scale
Harsh Agrawal
Karan Desai
Yufei Wang
Xinlei Chen
Rishabh Jain
Mark Johnson
Dhruv Batra
Devi Parikh
Stefan Lee
Peter Anderson
VLM
134
486
0
20 Dec 2018
From Recognition to Cognition: Visual Commonsense Reasoning
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRMBDLOCLReLM
183
883
0
27 Nov 2018
Know What You Don't Know: Unanswerable Questions for SQuAD
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar
Robin Jia
Percy Liang
RALMELM
292
2,854
0
11 Jun 2018
COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval
COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval
Xirong Li
Chaoxi Xu
Xiaoxu Wang
Weiyu Lan
Zhengxiong Jia
Gang Yang
Jieping Xu
127
153
0
22 May 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam M. Shazeer
Mitchell Stern
ODL
84
1,052
0
11 Apr 2018
VizWiz Grand Challenge: Answering Visual Questions from Blind People
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
116
862
0
22 Feb 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual
  Question Answering
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
OOD
155
586
0
01 Dec 2017
STAIR Captions: Constructing a Large-Scale Japanese Image Caption
  Dataset
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
Yuya Yoshikawa
Yutaro Shigeto
A. Takeuchi
3DV
66
118
0
02 May 2017
An Analysis of Visual Question Answering Algorithms
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
85
234
0
28 Mar 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary
  Visual Reasoning
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
319
2,391
0
20 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
355
3,273
0
02 Dec 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
316
8,174
0
16 Jun 2016
Exploring Models and Data for Image Question Answering
Exploring Models and Data for Image Question Answering
Mengye Ren
Ryan Kiros
R. Zemel
80
719
0
08 May 2015
VQA: Visual Question Answering
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
233
5,509
0
03 May 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
224
2,496
0
01 Apr 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
152
5,591
0
07 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
306
4,511
0
20 Nov 2014
1