ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2102.10407
  4. Cited By
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for
  Image Captioning

VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

20 February 2021
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
    VLM
ArXivPDFHTML

Papers citing "VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning"

50 / 76 papers shown
Title
KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning
KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning
Zhendong Mi
Qitao Tan
Xiaodong Yu
Zining Zhu
Geng Yuan
Shaoyi Huang
167
0
0
24 May 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
152
0
0
03 Apr 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Min Zhang
LM&MA
AILaw
190
171
0
28 Jan 2025
Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach
Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach
Yuxuan Wan
Chaozheng Wang
Yi Dong
Wenxuan Wang
Shuqing Li
Yintong Huo
Michael R. Lyu
3DV
91
11
0
24 Jun 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
145
302
0
16 May 2024
StoryGPT-V: Large Language Models as Consistent Story Visualizers
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen
Mohamed Elhoseiny
VLM
147
11
0
04 Dec 2023
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
102
36
0
01 Jun 2022
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
56
121
0
30 Mar 2021
Generating Radiology Reports via Memory-driven Transformer
Generating Radiology Reports via Memory-driven Transformer
Zhihong Chen
Yan Song
Tsung-Hui Chang
Xiang Wan
MedIm
58
477
0
30 Oct 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
604
41,736
0
28 May 2020
Show, Describe and Conclude: On Exploiting the Structure Information of
  Chest X-Ray Reports
Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-Ray Reports
Baoyu Jing
Zeya Wang
Eric Xing
43
142
0
26 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
88
1,934
0
13 Apr 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
More Grounded Image Captioning by Distilling Image-Text Matching Model
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
59
126
0
01 Apr 2020
X-Linear Attention Networks for Image Captioning
X-Linear Attention Networks for Image Captioning
Yingwei Pan
Ting Yao
Yehao Li
Tao Mei
92
510
0
31 Mar 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with
  Abstract Scene Graphs
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen
Qin Jin
Peng Wang
Qi Wu
DiffM
83
216
0
01 Mar 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised
  Image-Text Data
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
76
261
0
22 Jan 2020
Meshed-Memory Transformer for Image Captioning
Meshed-Memory Transformer for Image Captioning
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
59
874
0
17 Dec 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language
  Generation, Translation, and Comprehension
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
209
10,792
0
29 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
367
20,053
0
23 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language
  Representations
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
318
6,441
0
26 Sep 2019
Hierarchy Parsing for Image Captioning
Hierarchy Parsing for Image Captioning
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
VLM
54
165
0
09 Sep 2019
Image Captioning with Very Scarce Supervised Data: Adversarial
  Semi-Supervised Learning Approach
Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach
Dong-Jin Kim
Jinsoo Choi
Tae-Hyun Oh
In So Kweon
SSL
VLM
47
56
0
05 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
142
1,661
0
22 Aug 2019
Improving Captioning for Low-Resource Languages by Cycle Consistency
Improving Captioning for Low-Resource Languages by Cycle Consistency
Yike Wu
Shiwan Zhao
Jia Chen
Ying Zhang
Xiaojie Yuan
Zhong Su
42
8
0
21 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
227
2,474
0
20 Aug 2019
Attention on Attention for Image Captioning
Attention on Attention for Image Captioning
Lun Huang
Wenmin Wang
Jie Chen
Xiao-Yong Wei
56
829
0
19 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
217
3,667
0
06 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
516
24,351
0
26 Jul 2019
Hello, It's GPT-2 -- How Can I Help You? Towards the Use of Pretrained
  Language Models for Task-Oriented Dialogue Systems
Hello, It's GPT-2 -- How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems
Paweł Budzianowski
Ivan Vulić
62
310
0
12 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
215
8,415
0
19 Jun 2019
Unified Language Model Pre-training for Natural Language Understanding
  and Generation
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu Wang
Jianfeng Gao
M. Zhou
H. Hon
ELM
AI4CE
186
1,554
0
08 May 2019
Pointing Novel Objects in Image Captioning
Pointing Novel Objects in Image Captioning
Yehao Li
Ting Yao
Yingwei Pan
Hongyang Chao
Tao Mei
66
69
0
25 Apr 2019
Linguistic Knowledge and Transferability of Contextual Representations
Linguistic Knowledge and Transferability of Contextual Representations
Nelson F. Liu
Matt Gardner
Yonatan Belinkov
Matthew E. Peters
Noah A. Smith
113
730
0
21 Mar 2019
PadChest: A large chest x-ray image dataset with multi-label annotated
  reports
PadChest: A large chest x-ray image dataset with multi-label annotated reports
A. Bustos
A. Pertusa
J. M. Salinas
M. Iglesia-Vayá
LM&MA
66
612
0
22 Jan 2019
nocaps: novel object captioning at scale
nocaps: novel object captioning at scale
Harsh Agrawal
Karan Desai
Yufei Wang
Xinlei Chen
Rishabh Jain
Mark Johnson
Dhruv Batra
Devi Parikh
Stefan Lee
Peter Anderson
VLM
102
476
0
20 Dec 2018
Auto-Encoding Scene Graphs for Image Captioning
Auto-Encoding Scene Graphs for Image Captioning
Xu Yang
Kaihua Tang
Hanwang Zhang
Jianfei Cai
140
698
0
06 Dec 2018
Unsupervised Image Captioning
Unsupervised Image Captioning
Yang Feng
Lin Ma
Wei Liu
Jiebo Luo
VLM
SSL
62
202
0
27 Nov 2018
Show, Control and Tell: A Framework for Generating Controllable and
  Grounded Captions
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
DiffM
65
175
0
26 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.4K
94,511
0
11 Oct 2018
Exploring Visual Relationship for Image Captioning
Exploring Visual Relationship for Image Captioning
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
74
831
0
19 Sep 2018
Object Hallucination in Image Captioning
Object Hallucination in Image Captioning
Anna Rohrbach
Lisa Anne Hendricks
Kaylee Burns
Trevor Darrell
Kate Saenko
151
424
0
06 Sep 2018
CNN+CNN: Convolutional Decoders for Image Captioning
CNN+CNN: Convolutional Decoders for Image Captioning
Qingzhong Wang
Antoni B. Chan
VLM
57
86
0
23 May 2018
Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report
  Generation
Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation
Yuan Li
Xiaodan Liang
Zhiting Hu
Eric Xing
MedIm
40
333
0
21 May 2018
Neural Baby Talk
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
228
434
0
27 Mar 2018
Stacked Cross Attention for Image-Text Matching
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
74
1,151
0
21 Mar 2018
Unpaired Image Captioning by Language Pivoting
Unpaired Image Captioning by Language Pivoting
Jiuxiang Gu
Shafiq Joty
Jianfei Cai
G. Wang
59
83
0
14 Mar 2018
Deep contextualized word representations
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
182
11,542
0
15 Feb 2018
On the Automatic Generation of Medical Imaging Reports
On the Automatic Generation of Medical Imaging Reports
Baoyu Jing
P. Xie
Eric Xing
MedIm
59
509
0
22 Nov 2017
Decoupled Weight Decay Regularization
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
126
2,132
0
14 Nov 2017
Evaluating Visual Conversational Agents via Cooperative Human-AI Games
Evaluating Visual Conversational Agents via Cooperative Human-AI Games
Prithvijit Chattopadhyay
Deshraj Yadav
Viraj Prabhu
Arjun Chandrasekaran
Abhishek Das
Stefan Lee
Dhruv Batra
Devi Parikh
51
79
0
17 Aug 2017
12
Next