ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1811.00491
  4. Cited By
A Corpus for Reasoning About Natural Language Grounded in Photographs
v1v2v3 (latest)

A Corpus for Reasoning About Natural Language Grounded in Photographs

1 November 2018
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
    LRM
ArXiv (abs)PDFHTML

Papers citing "A Corpus for Reasoning About Natural Language Grounded in Photographs"

50 / 419 papers shown
Title
RedCaps: web-curated image-text data created by the people, for the
  people
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
139
169
0
22 Nov 2021
TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating
  Visio-Linguistic Reasoning
TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning
Keng Ji Chow
Samson Tan
MingSung Kan
LRM
74
4
0
21 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
83
57
0
19 Nov 2021
Achieving Human Parity on Visual Question Answering
Achieving Human Parity on Visual Question Answering
Ming Yan
Haiyang Xu
Chenliang Li
Junfeng Tian
Bin Bi
...
Ji Zhang
Songfang Huang
Fei Huang
Luo Si
Rong Jin
63
13
0
17 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual
  Concepts
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLMCLIP
148
308
0
16 Nov 2021
Explainable Semantic Space by Grounding Language to Vision with
  Cross-Modal Contrastive Learning
Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning
Yizhen Zhang
Minkyu Choi
Kuan Han
Zhongming Liu
VLM
79
17
0
13 Nov 2021
An Empirical Study of Training End-to-End Vision-and-Language
  Transformers
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou
Yichong Xu
Zhe Gan
Jianfeng Wang
Shuohang Wang
...
Pengchuan Zhang
Lu Yuan
Nanyun Peng
Zicheng Liu
Michael Zeng
VLM
108
381
0
03 Nov 2021
VLMo: Unified Vision-Language Pre-Training with
  Mixture-of-Modality-Experts
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLMMLLMMoE
106
560
0
03 Nov 2021
Semantically Distributed Robust Optimization for Vision-and-Language
  Inference
Semantically Distributed Robust Optimization for Vision-and-Language Inference
Tejas Gokhale
A. Chaudhary
Pratyay Banerjee
Chitta Baral
Yezhou Yang
131
17
0
14 Oct 2021
Visually Grounded Reasoning across Languages and Cultures
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
Edoardo Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLMLRM
175
180
0
28 Sep 2021
Dense Contrastive Visual-Linguistic Pretraining
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLMSSL
129
11
0
24 Sep 2021
Transferring Knowledge from Vision to Language: How to Achieve it and
  how to Measure it?
Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?
Tobias Norlund
Lovisa Hagström
Richard Johansson
92
25
0
23 Sep 2021
Caption Enriched Samples for Improving Hateful Memes Detection
Caption Enriched Samples for Improving Hateful Memes Detection
Efrat Blaier
Itzik Malkiel
Lior Wolf
VLM
96
24
0
22 Sep 2021
COVR: A test-bed for Visually Grounded Compositional Generalization with
  real images
COVR: A test-bed for Visually Grounded Compositional Generalization with real images
Ben Bogin
Shivanshu Gupta
Matt Gardner
Jonathan Berant
CoGe
105
29
0
22 Sep 2021
What Vision-Language Models `See' when they See Scenes
What Vision-Language Models `See' when they See Scenes
Michele Cafagna
Kees van Deemter
Albert Gatt
VLM
112
13
0
15 Sep 2021
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Da Yin
Liunian Harold Li
Ziniu Hu
Nanyun Peng
Kai-Wei Chang
172
56
0
14 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
124
62
0
13 Sep 2021
Panoptic Narrative Grounding
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
148
23
0
10 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial
  Multi-modal Pretraining
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
118
38
0
09 Sep 2021
Vision Guided Generative Pre-trained Language Models for Multimodal
  Abstractive Summarization
Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
Tiezheng Yu
Wenliang Dai
Zihan Liu
Pascale Fung
116
74
0
06 Sep 2021
Data Efficient Masked Language Modeling for Vision and Language
Data Efficient Masked Language Modeling for Vision and Language
Yonatan Bitton
Gabriel Stanovsky
Michael Elhadad
Roy Schwartz
VLM
90
20
0
05 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question
  Answering
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
87
19
0
04 Sep 2021
WebQA: Multihop and Multimodal QA
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
93
87
0
01 Sep 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLMMLLM
199
801
0
24 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
62
10
0
21 Aug 2021
CIGLI: Conditional Image Generation from Language & Image
CIGLI: Conditional Image Generation from Language & Image
Xiaopeng Lu
Lynnette Hui Xian Ng
Jared Fernandez
Hao Zhu
DiffM
58
6
0
20 Aug 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language
  Models
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
86
207
0
09 Aug 2021
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering
  and Reading Comprehension
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
Anna Rogers
Matt Gardner
Isabelle Augenstein
146
170
0
27 Jul 2021
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for
  Video-and-Language Inference
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference
Juncheng Li
Siliang Tang
Linchao Zhu
Haochen Shi
Xuanwen Huang
Leilei Gan
Yi Yang
Yueting Zhuang
117
28
0
26 Jul 2021
Multi-stage Pre-training over Simplified Multimodal Pre-training Models
Multi-stage Pre-training over Simplified Multimodal Pre-training Models
Tongtong Liu
Fangxiang Feng
Xiaojie Wang
57
13
0
22 Jul 2021
Neural Abstructions: Abstractions that Support Construction for Grounded
  Language Learning
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns
Christopher D. Manning
Li Fei-Fei
61
0
0
20 Jul 2021
Align before Fuse: Vision and Language Representation Learning with
  Momentum Distillation
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
444
1,991
0
16 Jul 2021
Mind Your Outliers! Investigating the Negative Impact of Outliers on
  Active Learning for Visual Question Answering
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
Siddharth Karamcheti
Ranjay Krishna
Li Fei-Fei
Christopher D. Manning
108
92
0
06 Jul 2021
Building a Video-and-Language Dataset with Human Actions for Multimodal
  Logical Inference
Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
Riko Suzuki
Hitomi Yanaka
K. Mineshima
D. Bekki
VGenMLLM
58
1
0
27 Jun 2021
Probing Inter-modality: Visual Parsing with Self-Attention for
  Vision-Language Pre-training
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
122
89
0
25 Jun 2021
Grounding 'Grounding' in NLP
Grounding 'Grounding' in NLP
Khyathi Chandu
Yonatan Bisk
A. Black
103
54
0
04 Jun 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual
  Learning
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
139
119
0
03 Jun 2021
Volta at SemEval-2021 Task 9: Statement Verification and Evidence
  Finding with Tables using TAPAS and Transfer Learning
Volta at SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables using TAPAS and Transfer Learning
Devansh Gautam
Kshitij Gupta
Manish Shrivastava
LMTD
58
6
0
01 Jun 2021
Premise-based Multimodal Reasoning: Conditional Inference on Joint
  Textual and Visual Clues
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues
Qingxiu Dong
Ziwei Qin
Heming Xia
Tian Feng
Shoujie Tong
...
Weidong Zhan
Sujian Li
Zhongyu Wei
Tianyu Liu
Zuifang Sui
LRM
66
11
0
15 May 2021
Playing Lottery Tickets with Vision and Language
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
156
56
0
23 Apr 2021
Sattiy at SemEval-2021 Task 9: An Ensemble Solution for Statement
  Verification and Evidence Finding with Tables
Sattiy at SemEval-2021 Task 9: An Ensemble Solution for Statement Verification and Evidence Finding with Tables
Xiaoyi Ruan
Meizhi Jin
Jian Ma
Haiqing Yang
Lian-Xin Jiang
Yang Mo
Mengyuan Zhou
LMTD
77
2
0
21 Apr 2021
Constrained Language Models Yield Few-Shot Semantic Parsers
Constrained Language Models Yield Few-Shot Semantic Parsers
Richard Shin
C. H. Lin
Sam Thomson
Charles C. Chen
Subhro Roy
Emmanouil Antonios Platanios
Adam Pauls
Dan Klein
J. Eisner
Benjamin Van Durme
408
206
0
18 Apr 2021
Question Decomposition with Dependency Graphs
Question Decomposition with Dependency Graphs
Matan Hasson
Jonathan Berant
GNN
100
10
0
17 Apr 2021
Effect of Visual Extensions on Natural Language Understanding in
  Vision-and-Language Models
Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
Taichi Iki
Akiko Aizawa
VLM
80
20
0
16 Apr 2021
SpartQA: : A Textual Question Answering Benchmark for Spatial Reasoning
SpartQA: : A Textual Question Answering Benchmark for Spatial Reasoning
Roshanak Mirzaee
Hossein Rajaby Faghihi
Qiang Ning
Parisa Kordjmashidi
63
83
0
12 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language
  Representation Learning
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLMViT
199
274
0
07 Apr 2021
FixMyPose: Pose Correctional Captioning and Retrieval
FixMyPose: Pose Correctional Captioning and Retrieval
Hyounghun Kim
Abhaysinh Zala
Graham Burri
Joey Tianyi Zhou
74
16
0
04 Apr 2021
TAPAS at SemEval-2021 Task 9: Reasoning over tables with intermediate
  pre-training
TAPAS at SemEval-2021 Task 9: Reasoning over tables with intermediate pre-training
Thomas Müller
Julian Martin Eisenschlos
Syrine Krichene
LMTD
100
15
0
02 Apr 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
105
121
0
30 Mar 2021
SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple
  Levels
SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels
Chenliang Li
Ming Yan
Haiyang Xu
Fuli Luo
Wei Wang
Bin Bi
Songfang Huang
VLM
79
36
0
14 Mar 2021
Previous
123456789
Next