Towards General Purpose Vision Systems

1 April 2021

Aniruddha Kembhavi

Papers cited by "Towards General Purpose Vision Systems"

16 / 66 papers shown

Title
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data Lisa Anne Hendricks Subhashini Venugopalan Marcus Rohrbach Raymond J. Mooney Kate Saenko Trevor Darrell CoGe 69 284 0 17 Nov 2015
Grounding of Textual Phrases in Images by Reconstruction Anna Rohrbach Marcus Rohrbach Ronghang Hu Trevor Darrell Bernt Schiele 82 497 0 12 Nov 2015
Generation and Comprehension of Unambiguous Object Descriptions Junhua Mao Jonathan Huang Alexander Toshev Oana-Maria Camburu Alan Yuille Kevin Patrick Murphy ObjD 138 1,359 0 07 Nov 2015
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing A. Kumar Ozan Irsoy Peter Ondruska Mohit Iyyer James Bradbury Ishaan Gulrajani Victor Zhong Romain Paulus R. Socher 129 1,182 0 24 Jun 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books Yukun Zhu Ryan Kiros R. Zemel Ruslan Salakhutdinov R. Urtasun Antonio Torralba Sanja Fidler 142 2,555 0 22 Jun 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross B. Girshick Jian Sun AIMat ObjD 537 62,477 0 04 Jun 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models Bryan A. Plummer Liwei Wang Christopher M. Cervantes Juan C. Caicedo Julia Hockenmaier Svetlana Lazebnik 216 2,074 0 19 May 2015
VQA: Visual Question Answering Aishwarya Agrawal Jiasen Lu Stanislaw Antol Margaret Mitchell C. L. Zitnick Dhruv Batra Devi Parikh CoGe 238 5,512 0 03 May 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Ke Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov R. Zemel Yoshua Bengio DiffM 352 10,091 0 10 Feb 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions A. Karpathy Li Fei-Fei 154 5,595 0 07 Dec 2014
CIDEr: Consensus-based Image Description Evaluation Ramakrishna Vedantam C. L. Zitnick Devi Parikh 306 4,520 0 20 Nov 2014
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen Rob Fergus VLM MDE 214 2,684 0 18 Nov 2014
Show and Tell: A Neural Image Caption Generator Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 3DV 265 6,042 0 17 Nov 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models Ryan Kiros Ruslan Salakhutdinov R. Zemel VLM 133 1,401 0 10 Nov 2014
Microsoft COCO: Common Objects in Context Tsung-Yi Lin Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 437 43,875 0 01 May 2014
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics Yunchao Gong Qifa Ke Michael Isard Svetlana Lazebnik 3DV 154 585 0 18 Dec 2012