Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala, M. Kalimuthu, Dietrich Klakow
22 July 2019 · arXiv:1907.09358 · VLM
Papers citing "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods" (34 papers)
VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models
Harshit, Tolga Tasdizen · CoGe, VLM · 28 / 1 / 0 · 06 Oct 2024
CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting
Runze Chen, Mingyu Xiao, Haiyong Luo, Fang Zhao, Fan Wu, Hao Xiong, Qi Liu, Meng Song · 3DGS · 39 / 0 / 0 · 13 Sep 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng, Baoyu Jing, Zihao Li, Hanghang Tong, Jingrui He · VLM · 36 / 19 / 0 · 30 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song, Simran Khanuja, Graham Neubig · VLM, LRM · 89 / 6 / 0 · 03 Mar 2024
Rethinking the Evaluating Framework for Natural Language Understanding in AI Systems: Language Acquisition as a Core for Future Metrics
Patricio Vera, Pedro Moya, Lisa Barraza · ELM · 18 / 1 / 0 · 21 Sep 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang, Li Liu, Yawen Cui, Guanjie Huang, Weilin Lin, Yiqian Yang, Yuehong Hu · VLM · 34 / 90 / 0 · 14 May 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou, Maria Lymperaiou, Giorgos Stamou · AAML · 26 / 1 / 0 · 05 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou, Giorgos Stamou · VLM · 29 / 4 / 0 · 04 Mar 2023
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara · VLM · 38 / 0 / 0 · 23 Aug 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang, Bin Guo, Y. Zeng, Yasan Ding, Chen Qiu, Ying Zhang, Li Yao, Zhiwen Yu · 30 / 2 / 0 · 02 Jul 2022
2D Human Pose Estimation: A Survey
Haoming Chen, Runyang Feng, Sifan Wu, Hao Xu, F. Zhou, Zhenguang Liu · 3DH · 25 / 55 / 0 · 15 Apr 2022
Vision-and-Language Pretrained Models: A Survey
Siqu Long, Feiqi Cao, S. Han, Haiqing Yang · VLM · 24 / 63 / 0 · 15 Apr 2022
A Survey of Vision-Language Pre-Trained Models
Yifan Du, Zikang Liu, Junyi Li, Wayne Xin Zhao · VLM · 28 / 179 / 0 · 18 Feb 2022
A Survey of Natural Language Generation
Chenhe Dong, Yinghui Li, Haifan Gong, M. Chen, Junxin Li, Ying Shen, Min Yang · 3DV · 24 / 43 / 0 · 22 Dec 2021
Levels of explainable artificial intelligence for human-aligned conversational explanations
Richard Dazeley, Peter Vamplew, Cameron Foale, Charlotte Young, Sunil Aryal, F. Cruz · 30 / 89 / 0 · 07 Jul 2021
Core Challenges in Embodied Vision-Language Planning
Jonathan M Francis, Nariaki Kitamura, Felix Labelle, Xiaopeng Lu, Ingrid Navarro, Jean Oh · LM&Ro · 44 / 45 / 0 · 26 Jun 2021
Assessing Multilingual Fairness in Pre-trained Multimodal Representations
Jialu Wang, Yang Liu, X. Wang · EGVM · 26 / 35 / 0 · 12 Jun 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi, Rahee Walambe, K. Kotecha · 23 / 138 / 0 · 17 May 2021
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
Zhihao Yuan, Xu Yan, Yinghong Liao, Ruimao Zhang, Sheng Wang, Zhen Li, Shuguang Cui · 68 / 128 / 0 · 01 Mar 2021
Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel · EGVM · 22 / 175 / 0 · 25 Jan 2021
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral · 19 / 34 / 0 · 04 Dec 2020
Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach
Yahui Liu, Marco De Nadai, Deng Cai, Huayang Li, Xavier Alameda-Pineda, N. Sebe, Bruno Lepri · 38 / 59 / 0 · 10 Aug 2020
A Competitive Deep Neural Network Approach for the ImageCLEFmed Caption 2020 Task
M. Kalimuthu, Fabrizio Nunnari, Daniel Sonntag · MedIm · 14 / 7 / 0 · 11 Jul 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine · 23 / 577 / 0 · 10 May 2020
Multimodal Machine Translation through Visuals and Speech
U. Sulubacak, Ozan Caglayan, Stig-Arne Gronroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann · 46 / 72 / 0 · 28 Nov 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao · MLLM, VLM · 252 / 927 / 0 · 24 Sep 2019
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi, Hanwang Zhang, Juan-Zi Li · OCL · 160 / 230 / 0 · 05 Dec 2018
Neural Modular Control for Embodied Question Answering
Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra · LM&Ro · 132 / 127 / 0 · 26 Oct 2018
Speaker-Follower Models for Vision-and-Language Navigation
Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell · LM&Ro, LRM · 257 / 496 / 0 · 07 Jun 2018
Imagine This! Scripts to Compositions to Videos
Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi · CoGe, VGen · 111 / 87 / 0 · 10 Apr 2018
Image Generation from Scene Graphs
Justin Johnson, Agrim Gupta, Li Fei-Fei · GNN · 223 / 815 / 0 · 04 Apr 2018
Neural Baby Talk
Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh · VLM · 194 / 434 / 0 · 27 Mar 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach · 152 / 1,465 / 0 · 06 Jun 2016
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan Jurafsky · 214 / 1,326 / 0 · 05 Jun 2016