UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

20 May 2022

Papers citing "UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes"

49 / 49 papers shown

Title
Quantization-Free Autoregressive Action Transformer Ziyad Sheebaelhamd Michael Tschannen Michael Muehlebach Claire Vernade 79 0 0 18 Mar 2025
Restructuring Vector Quantization with the Rotation Trick Christopher Fifty Ronald G. Junkins Dennis Duan Aniketh Iger Jerry W. Liu Ehsan Amid Sebastian Thrun Christopher Ré LLMSV 114 13 0 08 Oct 2024
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation Zhenyu Li Xuyang Wang Xianming Liu Junjun Jiang MDE 64 194 0 03 Apr 2022
Transframer: Arbitrary Frame Prediction with Generative Models C. Nash João Carreira Jacob Walker Iain Barr Andrew Jaegle Mateusz Malinowski Peter W. Battaglia ViT 62 38 0 17 Mar 2022
Masked-attention Mask Transformer for Universal Image Segmentation Bowen Cheng Ishan Misra Alex Schwing Alexander Kirillov Rohit Girdhar ISeg 202 2,355 0 02 Dec 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion Chenfei Wu Jian Liang Lei Ji Fan Yang Yuejian Fang Daxin Jiang Nan Duan ViT VGen 61 294 0 24 Nov 2021
Palette: Image-to-Image Diffusion Models Chitwan Saharia William Chan Huiwen Chang Chris A. Lee Jonathan Ho Tim Salimans David J. Fleet Mohammad Norouzi DiffM VLM 469 1,630 0 10 Nov 2021
Vector-quantized Image Modeling with Improved VQGAN Jiahui Yu Xin Li Jing Yu Koh Han Zhang Ruoming Pang James Qin Alexander Ku Yuanzhong Xu Jason Baldridge Yonghui Wu ViT VLM DRL 93 514 0 09 Oct 2021
Pix2seq: A Language Modeling Framework for Object Detection Ting-Li Chen Saurabh Saxena Lala Li David J. Fleet Geoffrey E. Hinton MLLM ViT VLM 262 347 0 22 Sep 2021
Perceiver IO: A General Architecture for Structured Inputs & Outputs Andrew Jaegle Sebastian Borgeaud Jean-Baptiste Alayrac Carl Doersch Catalin Ionescu ... Olivier J. Hénaff M. Botvinick Andrew Zisserman Oriol Vinyals João Carreira MLLM VLM GNN 52 579 0 30 Jul 2021
Per-Pixel Classification is Not All You Need for Semantic Segmentation Bowen Cheng Alex Schwing Alexander Kirillov VLM ViT 193 1,532 0 13 Jul 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers Andreas Steiner Alexander Kolesnikov Xiaohua Zhai Ross Wightman Jakob Uszkoreit Lucas Beyer ViT 107 632 0 18 Jun 2021
Scaling Vision Transformers Xiaohua Zhai Alexander Kolesnikov N. Houlsby Lucas Beyer ViT 128 1,084 0 08 Jun 2021
Zero-Shot Text-to-Image Generation Aditya A. Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen Ilya Sutskever VLM 385 4,919 0 24 Feb 2021
Colorization Transformer Manoj Kumar Dirk Weissenborn Nal Kalchbrenner ViT 273 159 0 08 Feb 2021
Taming Transformers for High-Resolution Image Synthesis Patrick Esser Robin Rombach Bjorn Ommer ViT 112 2,947 0 17 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 555 40,739 0 22 Oct 2020
Denoising Diffusion Probabilistic Models Jonathan Ho Ajay Jain Pieter Abbeel DiffM 522 18,008 0 19 Jun 2020
End-to-End Object Detection with Transformers Nicolas Carion Francisco Massa Gabriel Synnaeve Nicolas Usunier Alexander Kirillov Sergey Zagoruyko ViT 3DV PINN 363 13,002 0 26 May 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Colin Raffel Noam M. Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li Peter J. Liu AIMat 385 20,114 0 23 Oct 2019
High Quality Monocular Depth Estimation via Transfer Learning Ibraheem Alhashim Peter Wonka MDE 3DV 55 458 0 31 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.6K 94,729 0 11 Oct 2018
Detecting Visual Relationships Using Box Attention Alexander Kolesnikov Alina Kuznetsova Christoph H. Lampert V. Ferrari 79 65 0 05 Jul 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost Noam M. Shazeer Mitchell Stern ODL 72 1,045 0 11 Apr 2018
Panoptic Segmentation Alexander Kirillov Kaiming He Ross B. Girshick Carsten Rother Piotr Dollár 101 1,434 0 03 Jan 2018
Neural Discrete Representation Learning Aaron van den Oord Oriol Vinyals Koray Kavukcuoglu BDL SSL OCL 210 4,989 0 02 Nov 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 656 131,414 0 12 Jun 2017
PixColor: Pixel Recursive Colorization S. Guadarrama Ryan Dahl David Bieber Mohammad Norouzi Jonathon Shlens Kevin Patrick Murphy 61 107 0 19 May 2017
Probabilistic Image Colorization Amelie Royer Alexander Kolesnikov Christoph H. Lampert 42 44 0 11 May 2017
Mask R-CNN Kaiming He Georgia Gkioxari Piotr Dollár Ross B. Girshick ObjD 344 27,174 0 20 Mar 2017
PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications Tim Salimans A. Karpathy Xi Chen Diederik P. Kingma 95 941 0 19 Jan 2017
Image-to-Image Translation with Conditional Adversarial Networks Phillip Isola Jun-Yan Zhu Tinghui Zhou Alexei A. Efros SSeg 311 19,640 0 21 Nov 2016
Conditional Image Generation with PixelCNN Decoders Aaron van den Oord Nal Kalchbrenner Oriol Vinyals L. Espeholt Alex Graves Koray Kavukcuoglu VLM 187 2,506 0 16 Jun 2016
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs Liang-Chieh Chen George Papandreou Iasonas Kokkinos Kevin Patrick Murphy Alan Yuille SSeg 233 18,195 0 02 Jun 2016
Chained Predictions Using Convolutional Neural Networks Georgia Gkioxari Alexander Toshev Navdeep Jaitly BDL 63 190 0 08 May 2016
Colorful Image Colorization Richard Y. Zhang Phillip Isola Alexei A. Efros 124 3,530 0 28 Mar 2016
Pixel Recurrent Neural Networks Aaron van den Oord Nal Kalchbrenner Koray Kavukcuoglu SSeg GAN 453 2,567 0 25 Jan 2016
Deep Residual Learning for Image Recognition Kaiming He Xinming Zhang Shaoqing Ren Jian Sun MedIm 2.1K 193,814 0 10 Dec 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross B. Girshick Jian Sun AIMat ObjD 482 62,122 0 04 Jun 2015
Deep Unsupervised Learning using Nonequilibrium Thermodynamics Jascha Narain Sohl-Dickstein Eric A. Weiss Niru Maheswaranathan Surya Ganguli SyDa DiffM 265 6,925 0 12 Mar 2015
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 1.6K 150,006 0 22 Dec 2014
Going Deeper with Convolutions Christian Szegedy Wei Liu Yangqing Jia P. Sermanet Scott E. Reed Dragomir Anguelov D. Erhan Vincent Vanhoucke Andrew Rabinovich 422 43,635 0 17 Sep 2014
Sequence to Sequence Learning with Neural Networks Ilya Sutskever Oriol Vinyals Quoc V. Le AIMat 397 20,528 0 10 Sep 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan Andrew Zisserman FAtt MDE 1.5K 100,330 0 04 Sep 2014
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches Kyunghyun Cho B. V. Merrienboer Dzmitry Bahdanau Yoshua Bengio AI4CE AIMat 231 6,772 0 03 Sep 2014
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 1.6K 39,472 0 01 Sep 2014
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network David Eigen Christian Puhrsch Rob Fergus MDE 3DPC 3DV 210 4,052 0 09 Jun 2014
Microsoft COCO: Common Objects in Context Nayeon Lee Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 387 43,619 0 01 May 2014
Auto-Encoding Variational Bayes Diederik P. Kingma Max Welling BDL 433 16,944 0 20 Dec 2013