ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.07750
  4. Cited By
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and
  Image Embeddings

Synth2^22: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings

12 March 2024
Sahand Sharifzadeh
Christos Kaplanis
Shreya Pathak
D. Kumaran
Anastasija Ilić
Jovana Mitrović
Charles Blundell
Andrea Banino
    VLM
ArXivPDFHTML

Papers citing "Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings"

27 / 27 papers shown
Title
Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data?
Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data?
Che Liu
Zhongwei Wan
Haozhe Wang
Yinda Chen
T. Qaiser
Chen Jin
Fariba Yousefi
Nikolay Burlutskiy
Rossella Arcucci
VLM
SyDa
LM&MA
MedIm
78
2
0
17 Oct 2024
Is Model Collapse Inevitable? Breaking the Curse of Recursion by
  Accumulating Real and Synthetic Data
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Matthias Gerstgrasser
Rylan Schaeffer
Apratim Dey
Rafael Rafailov
Henry Sleight
...
Andrey Gromov
Daniel A. Roberts
Diyi Yang
D. Donoho
Oluwasanmi Koyejo
62
61
0
01 Apr 2024
Synthetic Data from Diffusion Models Improves ImageNet Classification
Synthetic Data from Diffusion Models Improves ImageNet Classification
Shekoofeh Azizi
Simon Kornblith
Chitwan Saharia
Mohammad Norouzi
David J. Fleet
VLM
DiffM
69
304
0
17 Apr 2023
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
Paola Cascante-Bonilla
Khaled Shehada
James Smith
Sivan Doveh
Donghyun Kim
...
Gül Varol
A. Oliva
Vicente Ordonez
Rogerio Feris
Leonid Karlinsky
VLM
SyDa
56
41
0
30 Mar 2023
Pretrained Diffusion Models for Unified Human Motion Synthesis
Pretrained Diffusion Models for Unified Human Motion Synthesis
Jianxin Ma
Shuai Bai
Chang Zhou
DiffM
VGen
AI4CE
40
31
0
06 Dec 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
48
694
0
14 Sep 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
228
3,458
0
29 Apr 2022
Training Compute-Optimal Large Language Models
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
98
1,894
0
29 Mar 2022
Kubric: A scalable dataset generator
Kubric: A scalable dataset generator
Klaus Greff
Francois Belletti
Lucas Beyer
Carl Doersch
Yilun Du
...
Ziyu Wang
Tianhao Wu
K. M. Yi
Fangcheng Zhong
Andrea Tagliasacchi
60
255
0
07 Mar 2022
BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations
BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations
Daiqing Li
Huan Ling
Seung Wook Kim
Karsten Kreis
Adela Barriuso
Sanja Fidler
Antonio Torralba
84
106
0
12 Jan 2022
Label-Efficient Semantic Segmentation with Diffusion Models
Label-Efficient Semantic Segmentation with Diffusion Models
Dmitry Baranchuk
Ivan Rubachev
A. Voynov
Valentin Khrulkov
Artem Babenko
DiffM
VLM
201
526
0
06 Dec 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLM
VLM
85
247
0
24 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
145
1,398
0
03 Nov 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
91
789
0
24 Aug 2021
Multimodal Few-Shot Learning with Frozen Language Models
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
111
762
0
25 Jun 2021
ImageNet-21K Pretraining for the Masses
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
263
692
0
22 Apr 2021
Perceiver: General Perception with Iterative Attention
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
115
999
0
04 Mar 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
356
1,103
0
17 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
244
514
0
11 Feb 2021
Classification by Attention: Scene Graph Classification with Prior
  Knowledge
Classification by Attention: Scene Graph Classification with Prior Knowledge
Sahand Sharifzadeh
Sina Moayed Baharlou
Volker Tresp
OCL
30
50
0
19 Nov 2020
Bootstrap your own latent: A new approach to self-supervised Learning
Bootstrap your own latent: A new approach to self-supervised Learning
Jean-Bastien Grill
Florian Strub
Florent Altché
Corentin Tallec
Pierre Harvey Richemond
...
M. G. Azar
Bilal Piot
Koray Kavukcuoglu
Rémi Munos
Michal Valko
SSL
231
6,718
0
13 Jun 2020
Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
Jia Zheng
Junfei Zhang
Jing Li
Rui Tang
Shenghua Gao
Zihan Zhou
3DV
42
268
0
01 Aug 2019
Learning Semantic Segmentation from Synthetic Data: A Geometrically
  Guided Input-Output Adaptation Approach
Learning Semantic Segmentation from Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach
Yuhua Chen
Wen Li
Xiaoran Chen
Luc Van Gool
45
248
0
12 Dec 2018
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
276
3,187
0
02 Dec 2016
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for
  Richer Image-to-Sentence Models
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
142
2,033
0
19 May 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
132
2,461
0
01 Apr 2015
CIDEr: Consensus-based Image Description Evaluation
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
187
4,451
0
20 Nov 2014
1