VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

28 September 2020

Xiaowei Hu

Zicheng Liu

Papers citing "VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning"

37 / 37 papers shown

EVC-MF: End-to-end Video Captioning Network with Multi-scale Features

192

22 Oct 2024

Figuring out Figures: Using Textual References to Caption Scientific Figures

Stanley Cao

Kevin Liu

199

25 Jun 2024

Cycle-Consistency Learning for Captioning and Grounding

239

23 Dec 2023

GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training

IEEE International Conference on Computer Vision (ICCV), 2023

Hang Xu

Jianhua Han

James T. Kwok

Shen Zhao

Wei Zhang

Xiaodan Liang

CLIP VLM

211

22 Aug 2023

R2H: Building Multimodal Navigation Helpers that Respond to Help Requests

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

258

23 May 2023

Efficient Image Captioning for Edge Devices

AAAI Conference on Artificial Intelligence (AAAI), 2022

Linlin Li

210

18 Dec 2022

Paraphrasing Is All You Need for Novel Object Captioning

Neural Information Processing Systems (NeurIPS), 2022

Louis-Philippe Morency

Yu-Chiang Frank Wang

185

25 Sep 2022

Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models

ACM Multimedia (ACM MM), 2022

Yi Zhang

Junyan Wang

Jitao Sang

283

03 Jul 2022

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

International Conference on Machine Learning (ICML), 2022

Ran Cheng

Ping Luo

209

17 Jun 2022

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Neural Information Processing Systems (NeurIPS), 2022

...

296

152

15 Jun 2022

GIT: A Generative Image-to-text Transformer for Vision and Language

Zicheng Liu

613

714

27 May 2022

Housekeep: Tidying Virtual Households using Commonsense Reasoning

European Conference on Computer Vision (ECCV), 2022

416

22 May 2022

Cross-modal Representation Learning for Zero-shot Action Recognition

Computer Vision and Pattern Recognition (CVPR), 2022

Zicheng Liu

152

03 May 2022

NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge

Computer Vision and Pattern Recognition (CVPR), 2022

244

28 Mar 2022

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

Lei Zhang

212

03 Mar 2022

Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Computer Vision and Pattern Recognition (CVPR), 2022

Amanpreet Singh

158

01 Mar 2022

CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval

Knowledge Discovery and Data Mining (KDD), 2022

263

15 Feb 2022

MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning

259

110

09 Dec 2021

Injecting Semantic Concepts into End-to-End Image Captioning

Xiaowei Hu

Yezhou Yang

Zicheng Liu

ViT VLM

243

122

09 Dec 2021

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

Computer Vision and Pattern Recognition (CVPR), 2021

Zicheng Liu

353

303

25 Nov 2021

Scaling Up Vision-Language Pre-training for Image Captioning

Xiaowei Hu

Zicheng Liu

423

300

24 Nov 2021

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

Xiaowei Hu

Zicheng Liu

182

19 Nov 2021

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

International Conference on Learning Representations (ICLR), 2021

877

921

24 Aug 2021

Is Object Detection Necessary for Human-Object Interaction Recognition?

Zicheng Liu

155

27 Jul 2021

Learning to Select: A Fully Attentive Approach for Novel Object Captioning

International Conference on Multimedia Retrieval (ICMR), 2021

Lorenzo Baraldi

165

02 Jun 2021

Maria: A Visual Experience Powered Conversational Agent

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

204

27 May 2021

Playing Lottery Tickets with Vision and Language

AAAI Conference on Artificial Intelligence (AAAI), 2021

Zicheng Liu

321

23 Apr 2021

Compressing Visual-linguistic Model via Knowledge Distillation

IEEE International Conference on Computer Vision (ICCV), 2021

Zhiyuan Fang

Jianfeng Wang

Xiaowei Hu

Lijuan Wang

Yezhou Yang

Zicheng Liu

VLM

283

116

05 Apr 2021

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

Computer Vision and Pattern Recognition (CVPR), 2021

1.2K

1,370

17 Feb 2021

VinVL: Revisiting Visual Representations in Vision-Language Models

Pengchuan Zhang

Xiujun Li

Xiaowei Hu

Jianwei Yang

Lei Zhang

Lijuan Wang

Yejin Choi

Jianfeng Gao

ObjD VLM

523

169

02 Jan 2021

A Closer Look at the Robustness of Vision-and-Language Pre-trained Models

265

15 Dec 2020

MiniVLM: A Smaller and Faster Vision-Language Model

Xiaowei Hu

Zicheng Liu

265

13 Dec 2020

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

Lei Zhang

266

159

08 Dec 2020

Using Text to Teach Image Retrieval

134

19 Nov 2020

Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions

206

24 Oct 2020

Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging

473

06 Oct 2020

Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models

Information Fusion (Inf. Fusion), 2020

645

04 Jan 2020