ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.08402
  4. Cited By
LAION-5B: An open large-scale dataset for training next generation
  image-text models

LAION-5B: An open large-scale dataset for training next generation image-text models

16 October 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
Mehdi Cherti
Theo Coombes
Aarush Katta
Clayton Mullis
Mitchell Wortsman
P. Schramowski
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
    VLM
    MLLM
    CLIP
ArXivPDFHTML

Papers citing "LAION-5B: An open large-scale dataset for training next generation image-text models"

50 / 664 papers shown
Title
Sparkles: Unlocking Chats Across Multiple Images for Multimodal
  Instruction-Following Models
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
41
22
0
31 Aug 2023
Efficient Model Personalization in Federated Learning via
  Client-Specific Prompt Generation
Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation
Fu-En Yang
Chien-Yi Wang
Yu-Chiang Frank Wang
VLM
FedML
34
59
0
29 Aug 2023
Sparse3D: Distilling Multiview-Consistent Diffusion for Object
  Reconstruction from Sparse Views
Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views
Zi-Xin Zou
Weihao Cheng
Yan-Pei Cao
Shi-Sheng Huang
Ying Shan
Songiie Zhang
DiffM
36
23
0
27 Aug 2023
Manipulating Embeddings of Stable Diffusion Prompts
Manipulating Embeddings of Stable Diffusion Prompts
Niklas Deckers
Julia Peters
Martin Potthast
DiffM
40
9
0
23 Aug 2023
Backdooring Textual Inversion for Concept Censorship
Backdooring Textual Inversion for Concept Censorship
Yutong Wu
Jiehan Zhang
Florian Kerschbaum
Tianwei Zhang
DiffM
40
7
0
21 Aug 2023
UniDoc: A Universal Large Multimodal Model for Simultaneous Text
  Detection, Recognition, Spotting and Understanding
UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding
Hao Feng
Zijian Wang
Jingqun Tang
Jinghui Lu
Wen-gang Zhou
Houqiang Li
Can Huang
MLLM
VLM
45
47
0
19 Aug 2023
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
Fulong Ye
Guangyi Liu
Xinya Wu
Ledell Yu Wu
VLM
42
25
0
19 Aug 2023
Exploring Transfer Learning in Medical Image Segmentation using
  Vision-Language Models
Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models
K. Poudel
Manish Dhakal
Prasiddha Bhandari
Rabin Adhikari
Safal Thapaliya
Bishesh Khanal
VLM
30
17
0
15 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following
  Inspired by Real-World Use
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schimdt
VLM
31
77
0
12 Aug 2023
Few-shot medical image classification with simple shape and texture text
  descriptors using vision-language models
Few-shot medical image classification with simple shape and texture text descriptors using vision-language models
Michal Byra
M. F. Rachmadi
Henrik Skibbe
VLM
38
6
0
08 Aug 2023
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen
  Convolutional CLIP
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VLM
CLIP
45
136
0
04 Aug 2023
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for
  Complex Visual Reasoning Tasks
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks
Kousik Rajesh
Mrigank Raman
M. A. Karim
Pranit Chawla
VLM
25
2
0
31 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
38
118
0
25 Jul 2023
Towards a Visual-Language Foundation Model for Computational Pathology
Towards a Visual-Language Foundation Model for Computational Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Ivy Liang
...
Andrew Zhang
L. Le
Georg Gerber
Anil V. Parwani
Faisal Mahmood
VLM
MedIm
42
46
0
24 Jul 2023
Latent Code Augmentation Based on Stable Diffusion for Data-free
  Substitute Attacks
Latent Code Augmentation Based on Stable Diffusion for Data-free Substitute Attacks
Mingwen Shao
Lingzhuang Meng
Yuanjian Qiao
Lixu Zhang
W. Zuo
DiffM
29
0
0
24 Jul 2023
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
Michal Geyer
Omer Bar-Tal
Shai Bagon
Tali Dekel
VGen
DiffM
20
251
0
19 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
40
8
0
18 Jul 2023
Image Captions are Natural Prompts for Text-to-Image Models
Image Captions are Natural Prompts for Text-to-Image Models
Shiye Lei
Hao Chen
Senyang Zhang
Bo Zhao
Dacheng Tao
VLM
32
19
0
17 Jul 2023
Zero-Shot Image Harmonization with Generative Model Prior
Zero-Shot Image Harmonization with Generative Model Prior
Jianqi Chen
Yilan Zhang
Zhengxia Zou
Keyan Chen
Z. Shi
DiffM
28
5
0
17 Jul 2023
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Hiroki Naganuma
Ryuichiro Hataya
Kotaro Yoshida
Ioannis Mitliagkas
OODD
95
1
0
17 Jul 2023
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
Kate Sanders
David Etter
Reno Kriz
Benjamin Van Durme
VGen
42
7
0
06 Jul 2023
Advancing Zero-Shot Digital Human Quality Assessment through
  Text-Prompted Evaluation
Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation
Zicheng Zhang
Wei Sun
Yingjie Zhou
Haoning Wu
Chunyi Li
Xiongkuo Min
Xiaohong Liu
Guangtao Zhai
Weisi Lin
19
34
0
06 Jul 2023
Multi-Modal Prototypes for Open-World Semantic Segmentation
Multi-Modal Prototypes for Open-World Semantic Segmentation
Yu-Hao Yang
Chaofan Ma
Chen Ju
Fei Zhang
Jiangchao Yao
Ya Zhang
Yanfeng Wang
VLM
42
9
0
05 Jul 2023
Collaborative Score Distillation for Consistent Visual Synthesis
Collaborative Score Distillation for Consistent Visual Synthesis
Subin Kim
Kyungmin Lee
June Suk Choi
Jongheon Jeong
Kihyuk Sohn
Jinwoo Shin
DiffM
29
21
0
04 Jul 2023
Solving Linear Inverse Problems Provably via Posterior Sampling with
  Latent Diffusion Models
Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models
Litu Rout
Negin Raoof
Giannis Daras
C. Caramanis
A. Dimakis
Sanjay Shakkottai
DiffM
44
93
0
02 Jul 2023
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
Weiming Zhuang
Chen Chen
Lingjuan Lyu
Cheng Chen
Yaochu Jin
Lingjuan Lyu
AIFin
AI4CE
99
85
0
27 Jun 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun
  resolution
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
S. Hall
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
25
39
0
21 Jun 2023
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D
  Generation
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Yukun Huang
Jianan Wang
Yukai Shi
Zhengjun Zha
Xianbiao Qi
Lei Zhang
41
63
0
21 Jun 2023
Exploring the Effectiveness of Dataset Synthesis: An application of
  Apple Detection in Orchards
Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards
A. V. Meekeren
Maya Aghaei
K. Dijkstra
DiffM
23
1
0
20 Jun 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIP
VLM
141
114
0
20 Jun 2023
Aligning Synthetic Medical Images with Clinical Knowledge using Human
  Feedback
Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback
Shenghuan Sun
Gregory M. Goldgof
A. Butte
Ahmed Alaa
MedIm
24
12
0
16 Jun 2023
Human Preference Score v2: A Solid Benchmark for Evaluating Human
  Preferences of Text-to-Image Synthesis
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Xiaoshi Wu
Yiming Hao
Keqiang Sun
Yixiong Chen
Feng Zhu
Rui Zhao
Hongsheng Li
46
253
0
15 Jun 2023
Training-free Diffusion Model Adaptation for Variable-Sized
  Text-to-Image Synthesis
Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis
Zhiyu Jin
Xuli Shen
Bin Li
Xiangyang Xue
24
36
0
14 Jun 2023
GeneCIS: A Benchmark for General Conditional Image Similarity
GeneCIS: A Benchmark for General Conditional Image Similarity
S. Vaze
Nicolas Carion
Ishan Misra
VLM
DiffM
34
26
0
13 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Zhongang Qi
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
34
6
0
12 Jun 2023
Transferring Foundation Models for Generalizable Robotic Manipulation
Transferring Foundation Models for Generalizable Robotic Manipulation
Jiange Yang
Wenhui Tan
Chuhao Jin
Keling Yao
Bei Liu
Jianlong Fu
Ruihua Song
Gangshan Wu
Limin Wang
LM&Ro
47
6
0
09 Jun 2023
Improving neural network representations using human similarity
  judgments
Improving neural network representations using human similarity judgments
Lukas Muttenthaler
Lorenz Linhardt
Jonas Dippel
Robert A. Vandermeulen
Katherine L. Hermann
Andrew Kyle Lampinen
Simon Kornblith
42
31
0
07 Jun 2023
LRVS-Fashion: Extending Visual Search with Referring Instructions
LRVS-Fashion: Extending Visual Search with Referring Instructions
Simon Lepage
Jérémie Mary
David Picard
25
1
0
05 Jun 2023
Understanding and Mitigating Copying in Diffusion Models
Understanding and Mitigating Copying in Diffusion Models
Gowthami Somepalli
Vasu Singla
Micah Goldblum
Jonas Geiping
Tom Goldstein
DiffM
20
125
0
31 May 2023
Controllable Text-to-Image Generation with GPT-4
Controllable Text-to-Image Generation with GPT-4
Tianjun Zhang
Yi Zhang
Vibhav Vineet
Neel Joshi
Xin Wang
DiffM
30
42
0
29 May 2023
Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning
  and Diffusion Priors
Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
Paul S. Scotti
Atmadeep Banerjee
J. Goode
Stepan Shabalin
A. Nguyen
...
Nathalie Verlinde
Elad Yundler
David Weisberg
K. A. Norman
Tanishq Mathew Abraham
DiffM
45
108
0
29 May 2023
Gen-L-Video: Multi-Text to Long Video Generation via Temporal
  Co-Denoising
Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
Fu Lee Wang
Wenshuo Chen
Guanglu Song
Han-Jia Ye
Yu Liu
Hongsheng Li
VGen
DiffM
50
89
0
29 May 2023
Mitigating Inappropriateness in Image Generation: Can there be Value in
  Reflecting the World's Ugliness?
Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness?
Manuel Brack
Felix Friedrich
P. Schramowski
Kristian Kersting
EGVM
18
13
0
28 May 2023
UDPM: Upsampling Diffusion Probabilistic Models
UDPM: Upsampling Diffusion Probabilistic Models
Shady Abu Hussein
Raja Giryes
DiffM
38
1
0
25 May 2023
Training on Thin Air: Improve Image Classification with Generated Data
Training on Thin Air: Improve Image Classification with Generated Data
Yongchao Zhou
Hshmat Sahak
Jimmy Ba
DiffM
19
43
0
24 May 2023
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal
  Image Generation
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Marco Bellagente
Manuel Brack
H. Teufel
Felix Friedrich
Bjorn Deiseroth
...
Koen Oostermeijer
Andres Felipe Cruz Salinas
P. Schramowski
Kristian Kersting
Samuel Weinbach
45
16
0
24 May 2023
In-Context Impersonation Reveals Large Language Models' Strengths and
  Biases
In-Context Impersonation Reveals Large Language Models' Strengths and Biases
Leonard Salewski
Stephan Alaniz
Isabel Rio-Torto
Eric Schulz
Zeynep Akata
44
151
0
24 May 2023
Text encoders bottleneck compositionality in contrastive vision-language
  models
Text encoders bottleneck compositionality in contrastive vision-language models
Amita Kamath
Jack Hessel
Kai-Wei Chang
CoGe
CLIP
VLM
30
19
0
24 May 2023
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic
  Correspondence
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
Grace Luo
Lisa Dunlap
Dong Huk Park
Aleksander Holynski
Trevor Darrell
42
119
0
23 May 2023
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained
  Vision-Language Model
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
CLIP
VLM
23
25
0
23 May 2023
Previous
123...1011121314
Next