Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

10 November 2014
Ryan Kiros
Ruslan Salakhutdinov
R. Zemel
    VLM

Papers citing "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models"

50 / 263 papers shown
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen
Bei Liu
Jianlong Fu
Ruihua Song
Qin Jin
Pingping Lin
Xiaoyu Qi
Chunting Wang
Jin Zhou
DiffM
25
33
0
24 Nov 2019
HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs
Fangyu Liu
Rongtian Ye
Xun Wang
Shuaipeng Li
23
32
0
22 Nov 2019
Root Mean Square Layer Normalization
Biao Zhang
Rico Sennrich
25
672
0
16 Oct 2019
Target-Oriented Deformation of Visual-Semantic Embedding Space
Takashi Matsubara
26
7
0
15 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
36
208
0
11 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
30
25
0
30 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
30
37
0
22 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
25
299
0
12 Sep 2019
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
32
40
0
08 Sep 2019
TIGEr: Text-to-Image Grounding for Image Caption Evaluation
Ming Jiang
Qiuyuan Huang
Lei Zhang
Xin Eric Wang
Pengchuan Zhang
Zhe Gan
Jana Diesner
Jianfeng Gao
30
66
0
04 Sep 2019
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
J. Aneja
Harsh Agrawal
Dhruv Batra
Alex Schwing
BDL
VLM
26
66
0
22 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
36
387
0
31 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
25
133
0
22 Jul 2019
Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval
S. Balke
Matthias Dorfer
Luis Carvalho
A. Arzt
Gerhard Widmer
19
11
0
26 Jun 2019
Joint Visual-Textual Embedding for Multimodal Style Search
Gil Sadeh
L. Fritz
Gabi Shalev
Eduard Oks
37
8
0
15 Jun 2019
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
Hui Wu
Yupeng Gao
Xiaoxiao Guo
Ziad Al-Halah
Steven J. Rennie
Kristen Grauman
Rogerio Feris
EgoV
28
63
0
30 May 2019
3G structure for image caption generation
Aihong Yuan
Xuelong Li
Xiaoqiang Lu
21
34
0
21 Apr 2019
Multi-modal gated recurrent units for image description
Xuelong Li
Aihong Yuan
Xiaoqiang Lu
GAN
21
26
0
20 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
31
30
0
16 Apr 2019
PUNCH: Positive UNlabelled Classification based information retrieval in Hyperspectral images
Anirban Santara
Jayeeta Datta
S. Sarkar
Ankur Garg
K. Padia
Pabitra Mitra
27
1
0
09 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
30
193
0
05 Apr 2019
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
Minfeng Zhu
Pingbo Pan
Wei Chen
Yi Yang
GAN
17
574
0
02 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
22
103
0
27 Mar 2019
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Hexiang Hu
Ishan Misra
Laurens van der Maaten
24
22
0
19 Jan 2019
nocaps: novel object captioning at scale
Harsh Agrawal
Karan Desai
Yufei Wang
Xinlei Chen
Rishabh Jain
Mark Johnson
Dhruv Batra
Devi Parikh
Stefan Lee
Peter Anderson
VLM
21
470
0
20 Dec 2018
Data Augmentation using Random Image Cropping and Patching for Deep CNNs
Ryo Takahashi
Takashi Matsubara
K. Uehara
28
326
0
22 Nov 2018
A sequential guiding network with attention for image captioning
Daouda Sow
Zengchang Qin
Mouhamed Niasse
T. Wan
26
3
0
01 Nov 2018
Engaging Image Captioning Via Personality
Kurt Shuster
Samuel Humeau
Hexiang Hu
Antoine Bordes
Jason Weston
40
149
0
25 Oct 2018
Pre-gen metrics: Predicting caption quality metrics without generating captions
Marc Tanti
Albert Gatt
K. Camilleri
26
2
0
12 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning
Md Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
45
761
0
06 Oct 2018
CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas
Amanpreet Singh
Sharan Agrawal
DiffM
31
5
0
05 Oct 2018
A Visual Attention Grounding Neural Model for Multimodal Machine Translation
Mingyang Zhou
Runxiang Cheng
Yong Jae Lee
Zhou Yu
30
79
0
24 Aug 2018
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text
Ruotong Wang
R. Hwa
Adriana Kovashka
24
54
0
21 Jul 2018
Face-Cap: Image Captioning using Facial Expression Analysis
Omid Mohamad Nezami
Mark Dras
Peter Anderson
Len Hamey
CVBM
27
27
0
06 Jul 2018
Don't only Feel Read: Using Scene text to understand advertisements
Arka Ujjal Dey
Suman K. Ghosh
Ernest Valveny
DiffM
18
4
0
21 Jun 2018
Multimodal Grounding for Language Processing
Lisa Beinborn
Teresa Botschen
Iryna Gurevych
22
32
0
17 Jun 2018
Cross-modal Hallucination for Few-shot Fine-grained Recognition
Frederik Pahde
P. Jähnichen
T. Klein
Moin Nabi
44
21
0
13 Jun 2018
Like a Baby: Visually Situated Neural Language Acquisition
Alexander Ororbia
A. Mali
Mary Alexandria Kelly
David Reitter
31
4
0
29 May 2018
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text
A. Mathews
Lexing Xie
Xuming He
VLM
27
115
0
18 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
41
141
0
02 May 2018
Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings
Micael Carvalho
Rémi Cadène
David Picard
Laure Soulier
Nicolas Thome
Matthieu Cord
13
180
0
30 Apr 2018
Human Motion Modeling using DVGANs
Xiaoyu Lin
Mohamed R. Amer
24
75
0
27 Apr 2018
Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training
Bei Liu
Jianlong Fu
Makoto P. Kato
Masatoshi Yoshikawa
GAN
30
73
0
23 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
CVBM
24
220
0
01 Apr 2018
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Raymond A. Yeh
Jinjun Xiong
Wen-mei W. Hwu
Minh Do
Alex Schwing
30
57
0
29 Mar 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
Alex Schwing
22
40
0
29 Mar 2018
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
Kevin Chen
Chris Choy
Manolis Savva
Angel X. Chang
Thomas Funkhouser
Silvio Savarese
3DV
32
247
0
22 Mar 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
30
1,142
0
21 Mar 2018
IDEL: In-Database Entity Linking with Neural Embeddings
T. Kilias
Alexander Loser
Felix Alexander Gers
Richard Koopmanschap
Wenjie Qu
M. Kersten
21
12
0
13 Mar 2018
Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network
Zizhao Zhang
Yuanpu Xie
Ling Yang
EGVM
32
304
0
26 Feb 2018