ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1907.09358
  4. Cited By
Trends in Integration of Vision and Language Research: A Survey of
  Tasks, Datasets, and Methods

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

22 July 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
    VLM
ArXivPDFHTML

Papers citing "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods"

34 / 34 papers shown
Title
VISTA: A Visual and Textual Attention Dataset for Interpreting
  Multimodal Models
VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models
Harshit
Tolga Tasdizen
CoGe
VLM
28
1
0
06 Oct 2024
CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian
  Splatting
CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting
Runze Chen
Mingyu Xiao
Haiyong Luo
Fang Zhao
Fan Wu
Hao Xiong
Qi Liu
Meng Song
3DGS
39
0
0
13 Sep 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng
Baoyu Jing
Zihao Li
Hanghang Tong
Jingrui He
VLM
36
19
0
30 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM
LRM
89
6
0
03 Mar 2024
Rethinking the Evaluating Framework for Natural Language Understanding
  in AI Systems: Language Acquisition as a Core for Future Metrics
Rethinking the Evaluating Framework for Natural Language Understanding in AI Systems: Language Acquisition as a Core for Future Metrics
Patricio Vera
Pedro Moya
Lisa Barraza
ELM
18
1
0
21 Sep 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
34
90
0
14 May 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou
Maria Lymperaiou
Giorgos Stamou
AAML
26
1
0
05 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on
  Tasks and Challenges
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
29
4
0
04 Mar 2023
Learning More May Not Be Better: Knowledge Transferability in Vision and
  Language Tasks
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Tianwei Chen
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Hajime Nagahara
VLM
38
0
0
23 Aug 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context
  Augmented Dialogue System: A Review
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
30
2
0
02 Jul 2022
2D Human Pose Estimation: A Survey
2D Human Pose Estimation: A Survey
Haoming Chen
Runyang Feng
Sifan Wu
Hao Xu
F. Zhou
Zhenguang Liu
3DH
25
55
0
15 Apr 2022
Vision-and-Language Pretrained Models: A Survey
Vision-and-Language Pretrained Models: A Survey
Siqu Long
Feiqi Cao
S. Han
Haiqing Yang
VLM
24
63
0
15 Apr 2022
A Survey of Vision-Language Pre-Trained Models
A Survey of Vision-Language Pre-Trained Models
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
VLM
28
179
0
18 Feb 2022
A Survey of Natural Language Generation
A Survey of Natural Language Generation
Chenhe Dong
Yinghui Li
Haifan Gong
M. Chen
Junxin Li
Ying Shen
Min Yang
3DV
24
43
0
22 Dec 2021
Levels of explainable artificial intelligence for human-aligned
  conversational explanations
Levels of explainable artificial intelligence for human-aligned conversational explanations
Richard Dazeley
Peter Vamplew
Cameron Foale
Charlotte Young
Sunil Aryal
F. Cruz
30
89
0
07 Jul 2021
Core Challenges in Embodied Vision-Language Planning
Core Challenges in Embodied Vision-Language Planning
Jonathan M Francis
Nariaki Kitamura
Felix Labelle
Xiaopeng Lu
Ingrid Navarro
Jean Oh
LM&Ro
44
45
0
26 Jun 2021
Assessing Multilingual Fairness in Pre-trained Multimodal
  Representations
Assessing Multilingual Fairness in Pre-trained Multimodal Representations
Jialu Wang
Yang Liu
X. Wang
EGVM
26
35
0
12 Jun 2021
A Review on Explainability in Multimodal Deep Neural Nets
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
23
138
0
17 May 2021
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding
  on Point Clouds through Instance Multi-level Contextual Referring
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
Zhihao Yuan
Xu Yan
Yinghong Liao
Ruimao Zhang
Sheng Wang
Zhen Li
Shuguang Cui
68
128
0
01 Mar 2021
Adversarial Text-to-Image Synthesis: A Review
Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov
Tobias Hinz
Federico Raue
Jörn Hees
Andreas Dengel
EGVM
22
175
0
25 Jan 2021
WeaQA: Weak Supervision via Captions for Visual Question Answering
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
19
34
0
04 Dec 2020
Describe What to Change: A Text-guided Unsupervised Image-to-Image
  Translation Approach
Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach
Yahui Liu
Marco De Nadai
Deng Cai
Huayang Li
Xavier Alameda-Pineda
N. Sebe
Bruno Lepri
38
59
0
10 Aug 2020
A Competitive Deep Neural Network Approach for the ImageCLEFmed Caption
  2020 Task
A Competitive Deep Neural Network Approach for the ImageCLEFmed Caption 2020 Task
M. Kalimuthu
Fabrizio Nunnari
Daniel Sonntag
MedIm
14
7
0
11 Jul 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
23
577
0
10 May 2020
Multimodal Machine Translation through Visuals and Speech
Multimodal Machine Translation through Visuals and Speech
U. Sulubacak
Ozan Caglayan
Stig-Arne Gronroos
Aku Rouhe
Desmond Elliott
Lucia Specia
Jörg Tiedemann
46
72
0
28 Nov 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Explainable and Explicit Visual Reasoning over Scene Graphs
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi
Hanwang Zhang
Juan-Zi Li
OCL
160
230
0
05 Dec 2018
Neural Modular Control for Embodied Question Answering
Neural Modular Control for Embodied Question Answering
Abhishek Das
Georgia Gkioxari
Stefan Lee
Devi Parikh
Dhruv Batra
LM&Ro
132
127
0
26 Oct 2018
Speaker-Follower Models for Vision-and-Language Navigation
Speaker-Follower Models for Vision-and-Language Navigation
Daniel Fried
Ronghang Hu
Volkan Cirik
Anna Rohrbach
Jacob Andreas
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
Kate Saenko
Dan Klein
Trevor Darrell
LM&Ro
LRM
257
496
0
07 Jun 2018
Imagine This! Scripts to Compositions to Videos
Imagine This! Scripts to Compositions to Videos
Tanmay Gupta
Dustin Schwenk
Ali Farhadi
Derek Hoiem
Aniruddha Kembhavi
CoGe
VGen
111
87
0
10 Apr 2018
Image Generation from Scene Graphs
Image Generation from Scene Graphs
Justin Johnson
Agrim Gupta
Li Fei-Fei
GNN
223
815
0
04 Apr 2018
Neural Baby Talk
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
194
434
0
27 Mar 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,465
0
06 Jun 2016
Deep Reinforcement Learning for Dialogue Generation
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
214
1,326
0
05 Jun 2016
1