Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

10 November 2014
Ryan Kiros
Ruslan Salakhutdinov
R. Zemel
    VLM

Papers citing "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models"

50 / 263 papers shown
Interactive Image Manipulation with Natural Language Instruction Commands
Seitaro Shinagawa
Koichiro Yoshino
S. Sakti
Yu Suzuki
Satoshi Nakamura
38
14
0
23 Feb 2018
Zero-Shot Question Generation from Knowledge Graphs for Unseen Predicates and Entity Types
Hady ElSahar
Christophe Gravier
F. Laforest
BDL
22
80
0
19 Feb 2018
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
Pelin Dogan
Boyang Albert Li
Leonid Sigal
Markus Gross
AI4TS
30
19
0
19 Feb 2018
Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli
Eri Matsuo
Ichiro Kobayashi
Shinji Nishimoto
S. Nishida
H. Asoh
21
14
0
19 Jan 2018
DeepStyle: Multimodal Search Engine for Fashion and Interior Design
Ivona Tautkute
Tomasz Trzciński
Aleksander P. Skorupa
Łukasz Brocki
K. Marasek
27
55
0
08 Jan 2018
Cross-modal Embeddings for Video and Audio Retrieval
Dídac Surís
A. Duarte
Amaia Salvador
Jordi Torres
Xavier Giró-i-Nieto
SSL
21
69
0
07 Jan 2018
Video Object Detection with an Aligned Spatial-Temporal Memory
Fanyi Xiao
Yong Jae Lee
49
189
0
18 Dec 2017
HP-GAN: Probabilistic 3D human motion prediction via GAN
Emad Barsoum
J. Kender
Zicheng Liu
3DH
56
330
0
27 Nov 2017
A Neural-Symbolic Approach to Design of CAPTCHA
Qiuyuan Huang
P. Smolensky
Xiaodong He
Li Deng
D. Wu
AAML
36
1
0
29 Oct 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning
Yang Xian
Yingli Tian
VLM
30
22
0
15 Sep 2017
Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision
Mohamed Elhoseiny
Yizhe Zhu
Han Zhang
Ahmed Elgammal
VLM
38
132
0
04 Sep 2017
Reasoning about Fine-grained Attribute Phrases using Reference Games
Jong-Chyi Su
Chenyun Wu
Huaizu Jiang
Subhransu Maji
34
16
0
29 Aug 2017
Open-World Visual Recognition Using Knowledge Graphs
V. Lonij
Ambrish Rawat
Maria-Irina Nicolae
37
15
0
28 Aug 2017
Fluency-Guided Cross-Lingual Image Captioning
Weiyu Lan
Xirong Li
Jianfeng Dong
19
93
0
15 Aug 2017
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?
Marc Tanti
Albert Gatt
K. Camilleri
24
56
0
07 Aug 2017
Automatic Spatially-aware Fashion Concept Discovery
Xintong Han
Zuxuan Wu
Phoenix X. Huang
Xiao Zhang
Menglong Zhu
Yuan Li
Yang Zhao
L. Davis
47
267
0
03 Aug 2017
Learning Audio - Sheet Music Correspondences for Score Identification and Offline Alignment
Matthias Dorfer
A. Arzt
Gerhard Widmer
41
43
0
31 Jul 2017
Deep Interactive Region Segmentation and Captioning
Ali Sharifi Boroujerdi
M. Khanian
M. Breuß
24
7
0
26 Jul 2017
Semantic Image Synthesis via Adversarial Learning
Hao Dong
Simiao Yu
Chao Wu
Yike Guo
GAN
20
265
0
21 Jul 2017
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
Fartash Faghri
David J. Fleet
J. Kiros
Sanja Fidler
VLM
11
181
0
18 Jul 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim
Min-Oh Heo
Seongho Choi
Byoung-Tak Zhang
26
174
0
04 Jul 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
15
2,868
0
26 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
35
139
0
03 May 2017
Query-adaptive Video Summarization via Quality-aware Relevance Estimation
A. Vasudevan
Michael Gygli
Anna Volokitin
Luc Van Gool
40
93
0
01 May 2017
Deep Reinforcement Learning-based Image Captioning with Embedding Reward
Zhou Ren
Xiaoyu Wang
Ning Zhang
Xutao Lv
Li Li
34
324
0
12 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
27
494
0
11 Apr 2017
AMC: Attention guided Multi-modal Correlation Learning for Image Search
Kan Chen
Trung Bui
Chen Fang
Zhaowen Wang
Ram Nevatia
37
38
0
03 Apr 2017
Where to put the Image in an Image Caption Generator
Marc Tanti
Albert Gatt
K. Camilleri
47
96
0
27 Mar 2017
I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation
Hao Dong
Jingqing Zhang
Douglas McIlwraith
Yike Guo
35
58
0
20 Mar 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
33
373
0
07 Feb 2017
Multilingual Multi-modal Embeddings for Natural Language Processing
Iacer Calixto
Qun Liu
N. Campbell
24
19
0
03 Feb 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation
Iacer Calixto
Qun Liu
Nick Campbell
32
154
0
23 Jan 2017
Learning Visual N-Grams from Web Data
Ang Li
Allan Jabri
Armand Joulin
Laurens van der Maaten
VLM
20
136
0
29 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
164
3,136
0
02 Dec 2016
Video Captioning with Multi-Faceted Attention
Xiang Long
Chuang Gan
Gerard de Melo
30
88
0
01 Dec 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li Li
VLM
30
169
0
21 Nov 2016
Recurrent Memory Addressing for describing videos
A. Jain
Abhinav Agarwalla
Kumar Krishna Agrawal
Pabitra Mitra
38
10
0
20 Nov 2016
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM
Yan Huang
Wei Wang
Liang Wang
26
222
0
17 Nov 2016
Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot
Hideki Nakayama
Noriki Nishida
32
62
0
14 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
45
664
0
02 Nov 2016
Learning What and Where to Draw
Scott E. Reed
Zeynep Akata
S. Mohan
Samuel Tenka
Bernt Schiele
Honglak Lee
DRL
GAN
30
618
0
08 Oct 2016
A Survey of Multi-View Representation Learning
Yingming Li
Ming Yang
Zhongfei Zhang
AI4TS
3DV
37
509
0
03 Oct 2016
Learning Language-Visual Embedding for Movie Understanding with Natural-Language
Atousa Torabi
Niket Tandon
Leonid Sigal
22
97
0
26 Sep 2016
Image-embodied Knowledge Representation Learning
Ruobing Xie
Zhiyuan Liu
Huanbo Luan
Maosong Sun
122
211
0
22 Sep 2016
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
30
851
0
21 Sep 2016
Measuring Machine Intelligence Through Visual Question Answering
C. L. Zitnick
Aishwarya Agrawal
Stanislaw Antol
Margaret Mitchell
Dhruv Batra
Devi Parikh
27
37
0
31 Aug 2016
Linking Image and Text with 2-Way Nets
Aviv Eisenschtat
Lior Wolf
27
176
0
29 Aug 2016
Learning to generalize to new compositions in image understanding
Yuval Atzmon
Jonathan Berant
Vahid Kazemi
Amir Globerson
Gal Chechik
26
67
0
27 Aug 2016
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning
Y. Tan
Chee Seng Chan
VLM
22
29
0
20 Aug 2016
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
53
1,233
0
31 Jul 2016