1411.2539
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
10 November 2014
Ryan Kiros
Ruslan Salakhutdinov
R. Zemel
VLM
Papers citing
"Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models"
50 / 263 papers shown
Title
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen
Bei Liu
Jianlong Fu
Ruihua Song
Qin Jin
Pingping Lin
Xiaoyu Qi
Chunting Wang
Jin Zhou
DiffM
25
33
0
24 Nov 2019
HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs
Fangyu Liu
Rongtian Ye
Xun Wang
Shuaipeng Li
23
32
0
22 Nov 2019
Root Mean Square Layer Normalization
Biao Zhang
Rico Sennrich
25
672
0
16 Oct 2019
Target-Oriented Deformation of Visual-Semantic Embedding Space
Takashi Matsubara
26
7
0
15 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
36
208
0
11 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
30
25
0
30 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
30
37
0
22 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
25
299
0
12 Sep 2019
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
32
40
0
08 Sep 2019
TIGEr: Text-to-Image Grounding for Image Caption Evaluation
Ming Jiang
Qiuyuan Huang
Lei Zhang
Xin Eric Wang
Pengchuan Zhang
Zhe Gan
Jana Diesner
Jianfeng Gao
30
66
0
04 Sep 2019
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
J. Aneja
Harsh Agrawal
Dhruv Batra
Alex Schwing
BDL
VLM
26
66
0
22 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
36
387
0
31 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
25
133
0
22 Jul 2019
Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval
S. Balke
Matthias Dorfer
Luis Carvalho
A. Arzt
Gerhard Widmer
19
11
0
26 Jun 2019
Joint Visual-Textual Embedding for Multimodal Style Search
Gil Sadeh
L. Fritz
Gabi Shalev
Eduard Oks
37
8
0
15 Jun 2019
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
Hui Wu
Yupeng Gao
Xiaoxiao Guo
Ziad Al-Halah
Steven J. Rennie
Kristen Grauman
Rogerio Feris
EgoV
28
63
0
30 May 2019
3G structure for image caption generation
Aihong Yuan
Xuelong Li
Xiaoqiang Lu
21
34
0
21 Apr 2019
Multi-modal gated recurrent units for image description
Xuelong Li
Aihong Yuan
Xiaoqiang Lu
GAN
21
26
0
20 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
31
30
0
16 Apr 2019
PUNCH: Positive UNlabelled Classification based information retrieval in Hyperspectral images
Anirban Santara
Jayeeta Datta
S. Sarkar
Ankur Garg
K. Padia
Pabitra Mitra
27
1
0
09 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
30
193
0
05 Apr 2019
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
Minfeng Zhu
Pingbo Pan
Wei Chen
Yi Yang
GAN
17
574
0
02 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
22
103
0
27 Mar 2019
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Hexiang Hu
Ishan Misra
Laurens van der Maaten
24
22
0
19 Jan 2019
nocaps: novel object captioning at scale
Harsh Agrawal
Karan Desai
Yufei Wang
Xinlei Chen
Rishabh Jain
Mark Johnson
Dhruv Batra
Devi Parikh
Stefan Lee
Peter Anderson
VLM
21
470
0
20 Dec 2018
Data Augmentation using Random Image Cropping and Patching for Deep CNNs
Ryo Takahashi
Takashi Matsubara
K. Uehara
28
326
0
22 Nov 2018
A sequential guiding network with attention for image captioning
Daouda Sow
Zengchang Qin
Mouhamed Niasse
T. Wan
26
3
0
01 Nov 2018
Engaging Image Captioning Via Personality
Kurt Shuster
Samuel Humeau
Hexiang Hu
Antoine Bordes
Jason Weston
40
149
0
25 Oct 2018
Pre-gen metrics: Predicting caption quality metrics without generating captions
Marc Tanti
Albert Gatt
K. Camilleri
26
2
0
12 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning
Md Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
45
761
0
06 Oct 2018
CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas
Amanpreet Singh
Sharan Agrawal
DiffM
31
5
0
05 Oct 2018
A Visual Attention Grounding Neural Model for Multimodal Machine Translation
Mingyang Zhou
Runxiang Cheng
Yong Jae Lee
Zhou Yu
30
79
0
24 Aug 2018
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text
Ruotong Wang
R. Hwa
Adriana Kovashka
24
54
0
21 Jul 2018
Face-Cap: Image Captioning using Facial Expression Analysis
Omid Mohamad Nezami
Mark Dras
Peter Anderson
Len Hamey
CVBM
27
27
0
06 Jul 2018
Don't only Feel Read: Using Scene text to understand advertisements
Arka Ujjal Dey
Suman K. Ghosh
Ernest Valveny
DiffM
18
4
0
21 Jun 2018
Multimodal Grounding for Language Processing
Lisa Beinborn
Teresa Botschen
Iryna Gurevych
22
32
0
17 Jun 2018
Cross-modal Hallucination for Few-shot Fine-grained Recognition
Frederik Pahde
P. Jähnichen
T. Klein
Moin Nabi
44
21
0
13 Jun 2018
Like a Baby: Visually Situated Neural Language Acquisition
Alexander Ororbia
A. Mali
Mary Alexandria Kelly
David Reitter
31
4
0
29 May 2018
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text
A. Mathews
Lexing Xie
Xuming He
VLM
27
115
0
18 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
41
141
0
02 May 2018
Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings
Micael Carvalho
Rémi Cadène
David Picard
Laure Soulier
Nicolas Thome
Matthieu Cord
13
180
0
30 Apr 2018
Human Motion Modeling using DVGANs
Xiaoyu Lin
Mohamed R. Amer
24
75
0
27 Apr 2018
Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training
Bei Liu
Jianlong Fu
Makoto P. Kato
Masatoshi Yoshikawa
GAN
30
73
0
23 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
CVBM
24
220
0
01 Apr 2018
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Raymond A. Yeh
Jinjun Xiong
Wen-mei W. Hwu
Minh Do
Alex Schwing
30
57
0
29 Mar 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
Alex Schwing
22
40
0
29 Mar 2018
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
Kevin Chen
Chris Choy
Manolis Savva
Angel X. Chang
Thomas Funkhouser
Silvio Savarese
3DV
32
247
0
22 Mar 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
30
1,142
0
21 Mar 2018
IDEL: In-Database Entity Linking with Neural Embeddings
T. Kilias
Alexander Loser
Felix Alexander Gers
Richard Koopmanschap
Wenjie Qu
M. Kersten
21
12
0
13 Mar 2018
Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network
Zizhao Zhang
Yuanpu Xie
Ling Yang
EGVM
32
304
0
26 Feb 2018