ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.15323
  4. Cited By
SmallCap: Lightweight Image Captioning Prompted with Retrieval
  Augmentation
v1v2 (latest)

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation

30 September 2022
R. Ramos
Bruno Martins
Desmond Elliott
Yova Kementchedjhieva
    VLM
ArXiv (abs)PDFHTML

Papers citing "SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation"

31 / 31 papers shown
Title
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
180
0
0
03 Mar 2025
Altogether: Image Captioning via Re-aligning Alt-text
Altogether: Image Captioning via Re-aligning Alt-text
Hu Xu
Po-Yao (Bernie) Huang
Xiaoqing Ellen Tan
Ching-Feng Yeh
Jacob Kahn
...
Luke Zettlemoyer
Wen-tau Yih
Shang-Wen Li
Saining Xie
Christoph Feichtenhofer
DiffM
79
9
0
31 Dec 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming
Senthil Purushwalkam
Shrey Pandit
Zixuan Ke
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
HILM
223
24
0
30 Sep 2024
Linear Alignment of Vision-language Models for Image Captioning
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIPVLM
131
0
0
10 Jul 2023
Retrieval-augmented Image Captioning
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
78
29
0
16 Feb 2023
Retrieval-Augmented Transformer for Image Captioning
Retrieval-Augmented Transformer for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
75
58
0
26 Jul 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
137
557
0
27 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
418
3,602
0
29 Apr 2022
CaMEL: Mean Teacher Learning for Image Captioning
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViTVLM
65
28
0
21 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLMBDLVLMCLIP
555
4,409
0
28 Jan 2022
VL-Adapter: Parameter-Efficient Transfer Learning for
  Vision-and-Language Tasks
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLMVPVLM
112
356
0
13 Dec 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLMVLM
137
250
0
24 Nov 2021
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based
  Learning for Vision-Language Models
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
Woojeong Jin
Yu Cheng
Yelong Shen
Weizhu Chen
Xiang Ren
VLMVPVLMMLLM
106
137
0
16 Oct 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLMMLLM
136
799
0
24 Aug 2021
Multimodal Few-Shot Learning with Frozen Language Models
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
183
788
0
25 Jun 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
447
1,127
0
17 Feb 2021
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
135
1,944
0
13 Apr 2020
Captioning Images Taken by People Who Are Blind
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
77
183
0
20 Feb 2020
Connecting Vision and Language with Localized Narratives
Connecting Vision and Language with Localized Narratives
Jordi Pont-Tuset
J. Uijlings
Soravit Changpinyo
Radu Soricut
V. Ferrari
ObjD
96
251
0
06 Dec 2019
Clotho: An Audio Captioning Dataset
Clotho: An Audio Captioning Dataset
Konstantinos Drossos
Samuel Lipping
Tuomas Virtanen
101
394
0
21 Oct 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for
  Video-and-Language Research
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
101
555
0
06 Apr 2019
nocaps: novel object captioning at scale
nocaps: novel object captioning at scale
Harsh Agrawal
Karan Desai
Yufei Wang
Xinlei Chen
Rishabh Jain
Mark Johnson
Dhruv Batra
Devi Parikh
Stefan Lee
Peter Anderson
VLM
131
486
0
20 Dec 2018
Decoupled Weight Decay Regularization
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
151
2,151
0
14 Nov 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual
  Question Answering
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
123
4,221
0
25 Jul 2017
Billion-scale similarity search with GPUs
Billion-scale similarity search with GPUs
Jeff Johnson
Matthijs Douze
Hervé Jégou
257
3,737
0
28 Feb 2017
SPICE: Semantic Propositional Image Caption Evaluation
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
108
1,919
0
29 Jul 2016
TGIF: A New Dataset and Benchmark on Animated GIF Description
TGIF: A New Dataset and Benchmark on Animated GIF Description
Yuncheng Li
Yale Song
Liangliang Cao
Joel R. Tetreault
Larry Goldberg
A. Jaimes
Jiebo Luo
79
273
0
10 Apr 2016
Microsoft COCO Captions: Data Collection and Evaluation Server
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
221
2,493
0
01 Apr 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual
  Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
348
10,079
0
10 Feb 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
144
5,591
0
07 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
297
4,508
0
20 Nov 2014
1