Locality Alignment Improves Vision-Language Models

International Conference on Learning Representations (ICLR), 2024

14 October 2024

Papers citing "Locality Alignment Improves Vision-Language Models"

Title
TextCaps: a Dataset for Image Captioning with Reading Comprehension. European Conference on Computer Vision (ECCV), 2020 Oleksii Sidorov Ronghang Hu Marcus Rohrbach Amanpreet Singh 307 493 0 24 Mar 2020
A Simple Framework for Contrastive Learning of Visual Representations. International Conference on Machine Learning (ICML), 2020 Ting Chen Simon Kornblith Mohammad Norouzi Geoffrey E. Hinton SSL 1.0K 21,811 0 13 Feb 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Victor Sanh Lysandre Debut Julien Chaumond Thomas Wolf 1.5K 8,651 0 02 Oct 2019
LVIS: A Dataset for Large Vocabulary Instance Segmentation. Computer Vision and Pattern Recognition (CVPR), 2019 Agrim Gupta Piotr Dollár Ross B. Girshick ISeg VLM 471 1,576 0 08 Aug 2019
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge. Computer Vision and Pattern Recognition (CVPR), 2019 Kenneth Marino Mohammad Rastegari Ali Farhadi Roozbeh Mottaghi 490 1,336 0 31 May 2019
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. IEEE International Conference on Computer Vision (ICCV), 2019 Sangdoo Yun Dongyoon Han Seong Joon Oh Sanghyuk Chun Junsuk Choe Y. Yoo OOD 1.4K 5,430 0 13 May 2019
Towards VQA Models That Can Read Amanpreet Singh Vivek Natarajan Meet Shah Yu Jiang Xinlei Chen Dhruv Batra Devi Parikh Marcus Rohrbach EgoV 505 1,632 0 18 Apr 2019
TallyQA: Answering Complex Counting Questions Manoj Acharya Kushal Kafle Christopher Kanan 188 158 0 29 Oct 2018
AutoAugment: Learning Augmentation Policies from Data E. D. Cubuk Barret Zoph Dandelion Mané Vijay Vasudevan Quoc V. Le 612 1,886 0 24 May 2018
Unsupervised Representation Learning by Predicting Image Rotations Spyros Gidaris Praveer Singh N. Komodakis OOD SSL DRL 731 3,481 0 21 Mar 2018
mixup: Beyond Empirical Risk Minimization. International Conference on Learning Representations (ICLR), 2017 Hongyi Zhang Moustapha Cissé Yann N. Dauphin David Lopez-Paz NoLa 544 10,936 0 25 Oct 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Yash Goyal Tejas Khot D. Summers-Stay Dhruv Batra Devi Parikh CoGe 904 3,723 0 02 Dec 2016
Semantic Understanding of Scenes through the ADE20K Dataset. International Journal of Computer Vision (IJCV), 2016 Bolei Zhou Hang Zhao Xavier Puig Tete Xiao Sanja Fidler Adela Barriuso Antonio Torralba SSeg 635 2,123 0 18 Aug 2016
Modeling Context in Referring Expressions Licheng Yu Patrick Poirson Shan Yang Alexander C. Berg Tamara L. Berg 422 1,483 0 31 Jul 2016
Fully Convolutional Networks for Semantic Segmentation Evan Shelhamer Jonathan Long Trevor Darrell VOS SSeg 1.1K 40,321 0 20 May 2016
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles M. Noroozi Paolo Favaro SSL 593 3,138 0 30 Mar 2016
A Diagram Is Worth A Dozen Images Aniruddha Kembhavi M. Salvato Eric Kolve Minjoon Seo Hannaneh Hajishirzi Ali Farhadi 3DV 196 724 0 24 Mar 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations Ranjay Krishna Yuke Zhu Oliver Groth Justin Johnson Kenji Hata ... Yannis Kalantidis Li Li David A. Shamma Michael S. Bernstein Fei-Fei Li 868 6,156 0 23 Feb 2016
VQA: Visual Question Answering Aishwarya Agrawal Jiasen Lu Stanislaw Antol Margaret Mitchell C. L. Zitnick Dhruv Batra Devi Parikh CoGe 890 6,014 0 03 May 2015
Distilling the Knowledge in a Neural Network Geoffrey E. Hinton Oriol Vinyals J. Dean FedML 721 21,991 0 09 Mar 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions. Computer Vision and Pattern Recognition (CVPR), 2014 A. Karpathy Li Fei-Fei 454 5,830 0 07 Dec 2014
Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (ECCV), 2014 Tsung-Yi Lin Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 8.1K 48,609 0 01 May 2014
Visualizing and Understanding Convolutional Networks. European Conference on Computer Vision (ECCV), 2013 Matthew D. Zeiler Rob Fergus FAtt SSL 922 16,538 0 12 Nov 2013