Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

10 November 2014

Papers citing "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models"

50 / 263 papers shown

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences Shizhe Chen Bei Liu Jianlong Fu Ruihua Song Qin Jin Pingping Lin Xiaoyu Qi Chunting Wang Jin Zhou DiffM 25 33 0 24 Nov 2019
HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs Fangyu Liu Rongtian Ye Xun Wang Shuaipeng Li 23 32 0 22 Nov 2019
Root Mean Square Layer Normalization Biao Zhang Rico Sennrich 25 672 0 16 Oct 2019
Target-Oriented Deformation of Visual-Semantic Embedding Space Takashi Matsubara 26 7 0 15 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval Sijin Wang Ruiping Wang Ziwei Yao Shiguang Shan Xilin Chen 3DV 36 208 0 11 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations Po-Yao (Bernie) Huang Xiaojun Chang Alexander G. Hauptmann 30 25 0 30 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators Kuang-Huei Lee Hamid Palangi Xi Chen Houdong Hu Jianfeng Gao VLM 30 37 0 22 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval Zihao Wang Xihui Liu Hongsheng Li Lu Sheng Junjie Yan Xiaogang Wang Jing Shao VLM 25 299 0 12 Sep 2019
MULE: Multimodal Universal Language Embedding Donghyun Kim Kuniaki Saito Kate Saenko Stan Sclaroff Bryan A. Plummer VLM 32 40 0 08 Sep 2019
TIGEr: Text-to-Image Grounding for Image Caption Evaluation Ming Jiang Qiuyuan Huang Lei Zhang Xin Eric Wang Pengchuan Zhang Zhe Gan Jana Diesner Jianfeng Gao 30 66 0 04 Sep 2019
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning J. Aneja Harsh Agrawal Dhruv Batra Alex Schwing BDL VLM 26 66 0 22 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts Yang Liu Samuel Albanie Arsha Nagrani Andrew Zisserman 36 387 0 31 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods Aditya Mogadala M. Kalimuthu Dietrich Klakow VLM 25 133 0 22 Jul 2019
Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval S. Balke Matthias Dorfer Luis Carvalho A. Arzt Gerhard Widmer 19 11 0 26 Jun 2019
Joint Visual-Textual Embedding for Multimodal Style Search Gil Sadeh L. Fritz Gabi Shalev Eduard Oks 37 8 0 15 Jun 2019
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback Hui Wu Yupeng Gao Xiaoxiao Guo Ziad Al-Halah Steven J. Rennie Kristen Grauman Rogerio Feris EgoV 28 63 0 30 May 2019
3G structure for image caption generation Aihong Yuan Xuelong Li Xiaoqiang Lu 21 34 0 21 Apr 2019
Multi-modal gated recurrent units for image description Xuelong Li Aihong Yuan Xiaoqiang Lu GAN 21 26 0 20 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents Jack Hessel Lillian Lee David M. Mimno 31 30 0 16 Apr 2019
PUNCH: Positive UNlabelled Classification based information retrieval in Hyperspectral images Anirban Santara Jayeeta Datta S. Sarkar Ankur Garg K. Padia Pabitra Mitra 27 1 0 09 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries Niluthpol Chowdhury Mithun S. Paul Amit K. Roy-Chowdhury 30 193 0 05 Apr 2019
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis Minfeng Zhu Pingbo Pan Wei Chen Yi Yang GAN 17 574 0 02 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment Samyak Datta Karan Sikka Anirban Roy Karuna Ahuja Devi Parikh Ajay Divakaran 22 103 0 27 Mar 2019
Evaluating Text-to-Image Matching using Binary Image Selection (BISON) Hexiang Hu Ishan Misra Laurens van der Maaten 24 22 0 19 Jan 2019
nocaps: novel object captioning at scale Harsh Agrawal Karan Desai Yufei Wang Xinlei Chen Rishabh Jain Mark Johnson Dhruv Batra Devi Parikh Stefan Lee Peter Anderson VLM 21 470 0 20 Dec 2018
Data Augmentation using Random Image Cropping and Patching for Deep CNNs Ryo Takahashi Takashi Matsubara K. Uehara 28 326 0 22 Nov 2018
A sequential guiding network with attention for image captioning Daouda Sow Zengchang Qin Mouhamed Niasse T. Wan 26 3 0 01 Nov 2018
Engaging Image Captioning Via Personality Kurt Shuster Samuel Humeau Hexiang Hu Antoine Bordes Jason Weston 40 149 0 25 Oct 2018
Pre-gen metrics: Predicting caption quality metrics without generating captions Marc Tanti Albert Gatt K. Camilleri 26 2 0 12 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning Md Zakir Hossain Ferdous Sohel M. Shiratuddin Hamid Laga VLM 3DV 45 761 0 06 Oct 2018
CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas Amanpreet Singh Sharan Agrawal DiffM 31 5 0 05 Oct 2018
A Visual Attention Grounding Neural Model for Multimodal Machine Translation Mingyang Zhou Runxiang Cheng Yong Jae Lee Zhou Yu 30 79 0 24 Aug 2018
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text Ruotong Wang R. Hwa Adriana Kovashka 24 54 0 21 Jul 2018
Face-Cap: Image Captioning using Facial Expression Analysis Omid Mohamad Nezami Mark Dras Peter Anderson Len Hamey CVBM 27 27 0 06 Jul 2018
Don't only Feel Read: Using Scene text to understand advertisements Arka Ujjal Dey Suman K. Ghosh Ernest Valveny DiffM 18 4 0 21 Jun 2018
Multimodal Grounding for Language Processing Lisa Beinborn Teresa Botschen Iryna Gurevych 22 32 0 17 Jun 2018
Cross-modal Hallucination for Few-shot Fine-grained Recognition Frederik Pahde P. Jähnichen T. Klein Moin Nabi 44 21 0 13 Jun 2018
Like a Baby: Visually Situated Neural Language Acquisition Alexander Ororbia A. Mali Mary Alexandria Kelly David Reitter 31 4 0 29 May 2018
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text A. Mathews Lexing Xie Xuming He VLM 27 115 0 18 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity Arsha Nagrani Samuel Albanie Andrew Zisserman SSL 41 141 0 02 May 2018
Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings Micael Carvalho Rémi Cadène David Picard Laure Soulier Nicolas Thome Matthieu Cord 13 180 0 30 Apr 2018
Human Motion Modeling using DVGANs Xiaoyu Lin Mohamed R. Amer 24 75 0 27 Apr 2018
Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training Bei Liu Jianlong Fu Makoto P. Kato Masatoshi Yoshikawa GAN 30 73 0 23 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching Arsha Nagrani Samuel Albanie Andrew Zisserman CVBM 24 220 0 01 Apr 2018
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts Raymond A. Yeh Jinjun Xiong Wen-mei W. Hwu Minh Do Alex Schwing 30 57 0 29 Mar 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts Raymond A. Yeh Minh Do Alex Schwing 22 40 0 29 Mar 2018
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings Kevin Chen Chris Choy Manolis Savva Angel X. Chang Thomas Funkhouser Silvio Savarese 3DV 32 247 0 22 Mar 2018
Stacked Cross Attention for Image-Text Matching Kuang-Huei Lee Xi Chen G. Hua Houdong Hu Xiaodong He 30 1,142 0 21 Mar 2018
IDEL: In-Database Entity Linking with Neural Embeddings T. Kilias Alexander Loser Felix Alexander Gers Richard Koopmanschap Wenjie Qu M. Kersten 21 12 0 13 Mar 2018
Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network Zizhao Zhang Yuanpu Xie Ling Yang EGVM 32 304 0 26 Feb 2018