ResearchTrend.AI
LAION-5B: An open large-scale dataset for training next generation image-text models

16 October 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
Mehdi Cherti
Theo Coombes
Aarush Katta
Clayton Mullis
Mitchell Wortsman
P. Schramowski
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
    VLM
    MLLM
    CLIP

Papers citing "LAION-5B: An open large-scale dataset for training next generation image-text models"

50 / 665 papers shown
Title
Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
Hamed Damirchi
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Javen Qinfeng Shi
Stephen Gould
Anton Van Den Hengel
VLM
47
0
0
29 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
47
1
0
29 Nov 2023
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Xian Liu
Xiaohang Zhan
Jiaxiang Tang
Ying Shan
Gang Zeng
Dahua Lin
Xihui Liu
Ziwei Liu
3DGS
40
72
0
28 Nov 2023
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng
Biao Gong
Di Chen
Yujun Shen
Yu Liu
Jingren Zhou
DiffM
34
43
0
28 Nov 2023
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
Seungwoo Yoo
Kunho Kim
Vladimir G. Kim
Minhyuk Sung
DiffM
39
13
0
28 Nov 2023
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
Chenglin Yang
Siyuan Qiao
Yuan Cao
Yu Zhang
Tao Zhu
Alan Yuille
Jiahui Yu
VLM
18
3
0
27 Nov 2023
HawkI: Homography & Mutual Information Guidance for 3D-free Single Image to Aerial View
D. Kothandaraman
Ming C. Lin
Dinesh Manocha
DiffM
31
2
0
27 Nov 2023
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Weijia Wu
Zhuang Li
Yefei He
Mike Zheng Shou
Chunhua Shen
Lele Cheng
Yan Li
Tingting Gao
Di Zhang
VLM
141
24
0
24 Nov 2023
Posterior Distillation Sampling
Juil Koo
Chanho Park
Minhyuk Sung
DiffM
27
27
0
23 Nov 2023
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang
Jian Tao
Jiafei Lyu
Chunjiang Ge
Jiaxin Chen
Qimai Li
Weihan Shen
Xiaolong Zhu
Xiu Li
EGVM
23
89
0
22 Nov 2023
Boosting3D: High-Fidelity Image-to-3D by Boosting 2D Diffusion Prior to 3D Prior with Progressive Learning
Kai Yu
Jinlin Liu
Mengyang Feng
Miaomiao Cui
Xuansong Xie
43
6
0
22 Nov 2023
Nepotistically Trained Generative-AI Models Collapse
Matyáš Boháček
Hany Farid
54
18
0
20 Nov 2023
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Junke Wang
Lingchen Meng
Zejia Weng
Bo He
Zuxuan Wu
Yu-Gang Jiang
MLLM
VLM
38
94
0
13 Nov 2023
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
Jiahao Li
Hao Tan
Kai Zhang
Zexiang Xu
Fujun Luan
Yinghao Xu
Yicong Hong
Kalyan Sunkavalli
Greg Shakhnarovich
Sai Bi
59
254
0
10 Nov 2023
PolyMaX: General Dense Prediction with Mask Transformer
Xuan S. Yang
Liangzhe Yuan
Kimberly Wilber
Astuti Sharma
Xiuye Gu
...
Stephanie Debats
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Liang-Chieh Chen
31
14
0
09 Nov 2023
Improved DDIM Sampling with Moment Matching Gaussian Mixtures
Prasad Gabbur
DiffM
30
1
0
08 Nov 2023
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
Chenfeng Xu
Huan Ling
Sanja Fidler
Or Litany
20
14
0
07 Nov 2023
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLM
MLLM
LRM
75
22
0
02 Nov 2023
Are Natural Domain Foundation Models Useful for Medical Image Classification?
Joana Palés Huix
Adithya Raju Ganeshan
Johan Fredin Haslum
Magnus P Soderberg
Christos Matsoukas
Kevin Smith
OOD
MedIm
VLM
24
30
0
30 Oct 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Morris Alper
Hadar Averbuch-Elor
46
10
0
25 Oct 2023
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Yixin Wu
Ning Yu
Michael Backes
Yun Shen
Yang Zhang
DiffM
59
8
0
25 Oct 2023
Online Detection of AI-Generated Images
David C. Epstein
Ishan Jain
Oliver Wang
Richard Y. Zhang
35
53
0
23 Oct 2023
Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track
Shuhei Yokoo
Peifei Zhu
Yuchi Ishikawa
Mikihiro Tanaka
Masayoshi Kondo
Hirokatsu Kataoka
24
0
0
23 Oct 2023
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
61
3
0
22 Oct 2023
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
Xian Liu
Jian Ren
Aliaksandr Siarohin
Ivan Skorokhodov
Yanyu Li
Dahua Lin
Xihui Liu
Ziwei Liu
Sergey Tulyakov
32
57
0
12 Oct 2023
Bucks for Buckets (B4B): Active Defenses Against Stealing Encoders
Jan Dubiński
Stanislaw Pawlak
Franziska Boenisch
Tomasz Trzciński
Adam Dziedzic
AAML
29
3
0
12 Oct 2023
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
Rui Zhao
Yuchao Gu
Jay Zhangjie Wu
David Junhao Zhang
Jia-Wei Liu
Weijia Wu
Jussi Keppo
Mike Zheng Shou
DiffM
VGen
30
104
0
12 Oct 2023
SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network
Tianlong Li
Wenhao Liu
Changze Lv
Jianhan Xu
Cenyuan Zhang
Muling Wu
Xiaoqing Zheng
Xuanjing Huang
CLIP
VLM
28
2
0
10 Oct 2023
FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators
Haiping Wang
Yuan Liu
Bing Wang
Yujing Sun
Zhenchao Dong
Wenping Wang
Bisheng Yang
DiffM
38
11
0
05 Oct 2023
Delving into CLIP latent space for Video Anomaly Recognition
Luca Zanella
Benedetta Liberatori
Willi Menapace
Fabio Poiesi
Yiming Wang
Elisa Ricci
31
22
0
04 Oct 2023
AI-Generated Images as Data Source: The Dawn of Synthetic Era
Zuhao Yang
Fangneng Zhan
Kunhao Liu
Muyu Xu
Shijian Lu
EGVM
31
18
0
03 Oct 2023
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
33
4
0
02 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Mustafa Shukor
Alexandre Ramé
Corentin Dancette
Matthieu Cord
LRM
MLLM
46
20
0
01 Oct 2023
Data Filtering Networks
Alex Fang
Albin Madappally Jose
Amit Jain
Ludwig Schmidt
Alexander Toshev
Vaishaal Shankar
CLIP
46
125
0
29 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
34
15
0
28 Sep 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
34
93
0
27 Sep 2023
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing
Kai Wang
Fei Yang
Shiqi Yang
Muhammad Atif Butt
Joost van de Weijer
DiffM
39
51
0
27 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
80
225
0
26 Sep 2023
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
Hanzhuo Huang
Yufan Feng
Cheng Shi
Lan Xu
Jingyi Yu
Sibei Yang
DiffM
VGen
23
64
0
25 Sep 2023
Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition
W. He
Kai Han
Ying Nie
Chengcheng Wang
Yunhe Wang
VLM
48
6
0
25 Sep 2023
Zero-Shot Object Counting with Language-Vision Models
Jingyi Xu
Hieu M. Le
Dimitris Samaras
VLM
DiffM
35
4
0
22 Sep 2023
ContextRef: Evaluating Referenceless Metrics For Image Description Generation
Elisa Kreiss
E. Zelikman
Christopher Potts
Nick Haber
29
5
0
21 Sep 2023
Dataset Factory: A Toolchain For Generative Computer Vision Datasets
Daniel Kharitonov
Ryan Turner
16
1
0
20 Sep 2023
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack
P. Schramowski
Kristian Kersting
AAML
EGVM
32
7
0
20 Sep 2023
Text-to-Image Models for Counterfactual Explanations: a Black-Box Approach
Guillaume Jeanneret
Loïc Simon
Frédéric Jurie
DiffM
30
12
0
14 Sep 2023
ITI-GEN: Inclusive Text-to-Image Generation
Cheng Zhang
Xuanbai Chen
Siqi Chai
Chen Henry Wu
Dmitry Lagun
Thabo Beeler
Fernando de la Torre
VLM
38
52
0
11 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
49
117
0
07 Sep 2023
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
Jiaxi Gu
Shicong Wang
Haoyu Zhao
Tianyi Lu
Xing Zhang
Zuxuan Wu
Songcen Xu
Wei Zhang
Yu-Gang Jiang
Hang Xu
DiffM
VGen
39
44
0
07 Sep 2023
Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations
Nikolaos-Antonios Ypsilantis
Kaifeng Chen
Bingyi Cao
Mário Lipovský
Pelin Dogan-Schönberger
Grzegorz Makosa
Boris Bluntschli
Mojtaba Seyedhosseini
Ondrej Chum
André Araujo
SSL
26
13
0
04 Sep 2023
VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
Xuyang Liu
Siteng Huang
Yachen Kang
Honggang Chen
Donglin Wang
ObjD
35
12
0
03 Sep 2023