ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.13682
  4. Cited By
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
v1v2 (latest)

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

28 September 2020
Xiaowei Hu
Xi Yin
Kevin Qinghong Lin
Lijuan Wang
Guang Dai
Jianfeng Gao
Zicheng Liu
    VLM
ArXiv (abs)PDFHTML

Papers citing "VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning"

37 / 37 papers shown
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
192
0
0
22 Oct 2024
Figuring out Figures: Using Textual References to Caption Scientific
  Figures
Figuring out Figures: Using Textual References to Caption Scientific Figures
Stanley Cao
Kevin Liu
199
0
0
25 Jun 2024
Cycle-Consistency Learning for Captioning and Grounding
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
239
13
0
23 Dec 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive
  Language-Image Pre-training
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-trainingIEEE International Conference on Computer Vision (ICCV), 2023
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIPVLM
211
3
0
22 Aug 2023
R2H: Building Multimodal Navigation Helpers that Respond to Help
  Requests
R2H: Building Multimodal Navigation Helpers that Respond to Help RequestsConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yue Fan
Jing Gu
Kaizhi Zheng
Xin Eric Wang
258
6
0
23 May 2023
Efficient Image Captioning for Edge Devices
Efficient Image Captioning for Edge DevicesAAAI Conference on Artificial Intelligence (AAAI), 2022
Ning Wang
Jiangrong Xie
Hangzai Luo
Qinglin Cheng
Jihao Wu
Mingbo Jia
Linlin Li
VLMCLIP
210
39
0
18 Dec 2022
Paraphrasing Is All You Need for Novel Object Captioning
Paraphrasing Is All You Need for Novel Object CaptioningNeural Information Processing Systems (NeurIPS), 2022
Cheng Yang
Yifan Hao
Wanshu Fan
Ruslan Salakhutdinov
Louis-Philippe Morency
Yu-Chiang Frank Wang
185
6
0
25 Sep 2022
Counterfactually Measuring and Eliminating Social Bias in
  Vision-Language Pre-training Models
Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training ModelsACM Multimedia (ACM MM), 2022
Yi Zhang
Junyan Wang
Jitao Sang
283
33
0
03 Jul 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMixInternational Conference on Machine Learning (ICML), 2022
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
209
54
0
17 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Coarse-to-Fine Vision-Language Pre-training with Fusion in the BackboneNeural Information Processing Systems (NeurIPS), 2022
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLMObjD
296
152
0
15 Jun 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
613
714
0
27 May 2022
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Housekeep: Tidying Virtual Households using Commonsense ReasoningEuropean Conference on Computer Vision (ECCV), 2022
Yash Kant
Arun Ramachandran
Sriram Yenamandra
Igor Gilitschenski
Dhruv Batra
Andrew Szot
Harsh Agrawal
LM&RoLRM
416
85
0
22 May 2022
Cross-modal Representation Learning for Zero-shot Action Recognition
Cross-modal Representation Learning for Zero-shot Action RecognitionComputer Vision and Pattern Recognition (CVPR), 2022
Chung-Ching Lin
Kevin Qinghong Lin
Linjie Li
Lijuan Wang
Zicheng Liu
ViT
152
30
0
03 May 2022
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External
  Knowledge
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External KnowledgeComputer Vision and Pattern Recognition (CVPR), 2022
D. Vo
Hong Chen
Akihiro Sugimoto
Hideki Nakayama
244
19
0
28 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large
  Models
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TSVLM
212
41
0
03 Mar 2022
Unsupervised Vision-and-Language Pre-training via Retrieval-based
  Multi-Granular Alignment
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular AlignmentComputer Vision and Pattern Recognition (CVPR), 2022
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
VLM
158
35
0
01 Mar 2022
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with
  Omni Retrieval
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni RetrievalKnowledge Discovery and Data Mining (KDD), 2022
Licheng Yu
Jun Chen
Animesh Sinha
Mengjiao MJ Wang
Hugo Chen
Tamara L. Berg
Ning Zhang
VLM
263
44
0
15 Feb 2022
MAGMA -- Multimodal Augmentation of Generative Models through
  Adapter-based Finetuning
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
C. Eichenberg
Sid Black
Samuel Weinbach
Letitia Parcalabescu
Anette Frank
MLLMVLM
259
110
0
09 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViTVLM
243
122
0
09 Dec 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video
  Captioning
SwinBERT: End-to-End Transformers with Sparse Attention for Video CaptioningComputer Vision and Pattern Recognition (CVPR), 2021
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
353
303
0
25 Nov 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLMVLM
423
300
0
24 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
182
63
0
19 Nov 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak SupervisionInternational Conference on Learning Representations (ICLR), 2021
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLMMLLM
877
921
0
24 Aug 2021
Is Object Detection Necessary for Human-Object Interaction Recognition?
Is Object Detection Necessary for Human-Object Interaction Recognition?
Ying Jin
Yinpeng Chen
Lijuan Wang
Jianfeng Wang
Pei Yu
Zicheng Liu
Lei Li
155
7
0
27 Jul 2021
Learning to Select: A Fully Attentive Approach for Novel Object
  Captioning
Learning to Select: A Fully Attentive Approach for Novel Object CaptioningInternational Conference on Multimedia Retrieval (ICMR), 2021
Marco Cagrandi
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
165
9
0
02 Jun 2021
Maria: A Visual Experience Powered Conversational Agent
Maria: A Visual Experience Powered Conversational AgentAnnual Meeting of the Association for Computational Linguistics (ACL), 2021
Zujie Liang
Huang Hu
Can Xu
Chongyang Tao
Xiubo Geng
Yining Chen
Fan Liang
Daxin Jiang
204
33
0
27 May 2021
Playing Lottery Tickets with Vision and Language
Playing Lottery Tickets with Vision and LanguageAAAI Conference on Artificial Intelligence (AAAI), 2021
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
321
62
0
23 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Compressing Visual-linguistic Model via Knowledge DistillationIEEE International Conference on Computer Vision (ICCV), 2021
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
283
116
0
05 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual ConceptsComputer Vision and Pattern Recognition (CVPR), 2021
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
1.2K
1,370
0
17 Feb 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjDVLM
523
169
0
02 Jan 2021
A Closer Look at the Robustness of Vision-and-Language Pre-trained
  Models
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Linjie Li
Zhe Gan
Jingjing Liu
VLM
265
50
0
15 Dec 2020
MiniVLM: A Smaller and Faster Vision-Language Model
MiniVLM: A Smaller and Faster Vision-Language Model
Jianfeng Wang
Xiaowei Hu
Pengchuan Zhang
Xiujun Li
Lijuan Wang
Guang Dai
Jianfeng Gao
Zicheng Liu
VLMMLLM
265
71
0
13 Dec 2020
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
D. Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
VLM
266
159
0
08 Dec 2020
Using Text to Teach Image Retrieval
Using Text to Teach Image Retrieval
Haoyu Dong
Ze Wang
Qiang Qiu
Guillermo Sapiro
3DV
134
5
0
19 Nov 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images
  and Captions
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSLVLM
206
12
0
24 Oct 2020
Contrastive Cross-Modal Pre-Training: A General Strategy for Small
  Sample Medical Imaging
Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging
G. Liang
Connor Greenwell
Yu Zhang
Xiaoqin Wang
Ramakanth Kavuluru
Nathan Jacobs
473
26
0
06 Oct 2020
Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning
  Models
Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning ModelsInformation Fusion (Inf. Fusion), 2020
Jiamei Sun
Sebastian Lapuschkin
Wojciech Samek
Alexander Binder
FAtt
645
36
0
04 Jan 2020
1
Page 1 of 1