LAION-5B: An open large-scale dataset for training next generation image-text models

16 October 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
Mehdi Cherti
Theo Coombes
Aarush Katta
Clayton Mullis
Mitchell Wortsman
P. Schramowski
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
    VLM
    MLLM
    CLIP
arXiv: 2210.08402

Papers citing "LAION-5B: An open large-scale dataset for training next generation image-text models"

50 / 621 papers shown
Title
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
Zehan Wang
Ke Lei
Chen Zhu
Jiawei Huang
Sashuai Zhou
...
Xize Cheng
Shengpeng Ji
Zhenhui Ye
Tao Jin
Zhou Zhao
29
0
0
15 May 2025
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
38
0
0
13 May 2025
Generative Pre-trained Autoregressive Diffusion Transformer
Yuan Zhang
Jiacheng Jiang
Guoqing Ma
Zhiying Lu
Haoyang Huang
Jianlong Yuan
Nan Duan
VGen
43
1
0
12 May 2025
DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models
Junhao Xia
Chaoyang Zhang
Yecheng Zhang
Chengyang Zhou
Zhichang Wang
Bochun Liu
Dongshuo Yin
DiffM
VGen
31
0
0
11 May 2025
Learning Graph Representation of Agent Diffusers
Youcef Djenouri
Nassim Belmecheri
Tomasz Michalak
Jan Dubiński
Ahmed Nabil Belbachir
Anis Yazidi
AI4CE
31
0
0
10 May 2025
Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA
Karthik Reddy Kanjula
Surya Guthikonda
Nahid Alam
Shayekh Bin Islam
26
0
0
09 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
53
0
0
08 May 2025
FLAM: Frame-Wise Language-Audio Modeling
Yusong Wu
Christos Tsirigotis
Ke Chen
Cheng-Zhi Anna Huang
Aaron C. Courville
Oriol Nieto
Prem Seetharaman
Justin Salamon
50
0
0
08 May 2025
Diffusion Model Quantization: A Review
Qian Zeng
Chenggong Hu
Mingli Song
Jie Song
MQ
45
0
0
08 May 2025
VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models
F. Khan
Jun Chen
Youssef Mohamed
Chun-Mei Feng
Mohamed Elhoseiny
VLM
33
0
0
08 May 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
44
0
0
08 May 2025
CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation
Viacheslav Vasilev
V. Arkhipkin
Julia Agafonova
Tatiana Nikulina
Evelina Mironova
Alisa Shichanina
Nikolai Gerasimenko
Mikhail Shoytov
Denis Dimitrov
46
0
0
07 May 2025
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
L. Wang
Senmao Li
Fei Yang
Jianye Wang
Ziheng Zhang
Yong-Jin Liu
Y. Wang
Jian Yang
DiffM
61
0
0
06 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
68
0
0
06 May 2025
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu
Yuxin Peng
26
0
0
06 May 2025
Panoramic Out-of-Distribution Segmentation
Mengfei Duan
Kailun Yang
Y. Zhang
Yihong Cao
Fei Teng
Kai Luo
Jiaming Zhang
Zhiyong Li
Shutao Li
59
0
0
06 May 2025
Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation
Hengyuan Hu
Aniket Das
Dorsa Sadigh
Nima Anari
DiffM
26
0
0
06 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
79
1
0
05 May 2025
Incentivizing Inclusive Contributions in Model Sharing Markets
Enpei Zhang
Jingyi Chai
Rui Ye
Yanfeng Wang
Siheng Chen
TDI
FedML
144
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization
Chenghong Li
Hongjie Liao
Yihao Zhi
Xihe Yang
Zhengwentai Sun
Jiahao Chang
Shuguang Cui
Xiaoguang Han
3DH
57
0
0
03 May 2025
Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content
Haoyue Bai
Yiyou Sun
Wei Cheng
Haifeng Chen
AAML
51
0
0
02 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
65
1
0
01 May 2025
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
Yunhao Li
Sijing Wu
Wei Sun
Zhichao Zhang
Yucheng Zhu
Zicheng Zhang
Huiyu Duan
Xiongkuo Min
Guangtao Zhai
EGVM
90
0
0
30 Apr 2025
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
Quentin Guimard
Moreno D'Incà
Massimiliano Mancini
Elisa Ricci
SSL
72
0
0
29 Apr 2025
YoChameleon: Personalized Vision and Language Generation
Thao Nguyen
Krishna Kumar Singh
Jing Shi
Trung H. Bui
Yong Jae Lee
Yuheng Li
MLLM
82
0
0
29 Apr 2025
Erased but Not Forgotten: How Backdoors Compromise Concept Erasure
Jonas Henry Grebe
Tobias Braun
Marcus Rohrbach
Anna Rohrbach
AAML
85
0
0
29 Apr 2025
Boosting 3D Liver Shape Datasets with Diffusion Models and Implicit Neural Representations
K. T. Nguyen
Francesca Tozzi
W. Willaert
J. Vankerschaver
Nikdokht Rashidian
W. D. Neve
DiffM
MedIm
118
0
0
28 Apr 2025
EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia
Valerie Zermatten
J. Castillo-Navarro
Pallavi Jain
D. Tuia
Diego Marcos
62
0
0
28 Apr 2025
SynergyAmodal: Deocclude Anything with Text Control
Xinyang Li
Chengjie Yi
Jiawei Lai
Mingbao Lin
Yansong Qu
Shengchuan Zhang
Liujuan Cao
DiffM
73
0
0
28 Apr 2025
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
Gal Almog
Ariel Shamir
Ohad Fried
DiffM
63
0
0
26 Apr 2025
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
Cheng Chen
Daochang Liu
M. Shah
Chang Xu
64
1
0
25 Apr 2025
Backdoor Defense in Diffusion Models via Spatial Attention Unlearning
Abha Jha
Ashwath Vaithinathan Aravindan
Matthew Salaway
Atharva Sandeep Bhide
Duygu Nur Yaldiz
AAML
70
0
0
21 Apr 2025
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
C. Kim
Jihwan Moon
Sangwoo Moon
Heeseung Yun
Sihaeng Lee
Aniruddha Kembhavi
Soonyoung Lee
Gunhee Kim
Sangho Lee
Christopher Clark
31
0
0
21 Apr 2025
VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation
Mingxia Zhan
Li Zhang
Xiaomeng Chu
Beibei Wang
MDE
64
0
0
21 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
49
2
0
20 Apr 2025
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
Rahul Thapa
Andrew Li
Qingyang Wu
B. He
Yuki Sahashi
...
Angela Zhang
Ben Athiwaratkun
Shuaiwen Leon Song
David Ouyang
James Zou
LM&MA
49
0
0
19 Apr 2025
PipeWeaver: Addressing Data Dynamicity in Large Multimodal Model Training with Dynamic Interleaved Pipeline
Zhenliang Xue
Hanpeng Hu
Xing Chen
Yimin Jiang
Yixin Song
Zeyu Mi
Yibo Zhu
Daxin Jiang
Yubin Xia
Haibo Chen
49
0
0
19 Apr 2025
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Lvmin Zhang
Maneesh Agrawala
DiffM
VGen
75
0
0
17 Apr 2025
NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results
Xin Li
Kun Yuan
B. Li
Fengbin Guan
Yizhen Shao
...
Guohua Zhang
Z. Huang
Y. Deng
Qingmiao Jiang
Lu Chen
55
7
0
17 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Understanding Attention Mechanism in Video Diffusion Models
Bingyan Liu
Chengyu Wang
Tongtong Su
Huan Ten
Jun Huang
K. Guo
Kui Jia
VGen
64
0
0
16 Apr 2025
Cobra: Efficient Line Art COlorization with BRoAder References
Junhao Zhuang
Lingen Li
Xuan Ju
Zhaoyang Zhang
C. Yuan
Ying Shan
DiffM
67
0
0
16 Apr 2025
On the Value of Cross-Modal Misalignment in Multimodal Representation Learning
Yichao Cai
Yuhang Liu
Erdun Gao
Tianjiao Jiang
Zhen Zhang
Anton van den Hengel
Javen Qinfeng Shi
62
0
0
14 Apr 2025
From Visual Explanations to Counterfactual Explanations with Latent Diffusion
Tung Luu
Nam Le
Duc Le
Bac Le
DiffM
AAML
FAtt
50
0
0
12 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Zhengzhang Chen
Zongyu Lin
MLLM
VLM
MoE
204
2
0
10 Apr 2025
MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data
Paul Borne-Pons
Mikolaj Czerkawski
Rosalie Martin
Romain Rouffet
DiffM
19
2
0
09 Apr 2025
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
Jiazi Bu
Pengyang Ling
Yujie Zhou
Pan Zhang
Tong Wu
Xiaoyi Dong
Yuhang Zang
Y. Cao
Dahua Lin
Jiaqi Wang
21
0
0
08 Apr 2025
D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition
Rupayan Mallick
Sibo Dong
Nataniel Ruiz
Sarah Adel Bargal
DiffM
49
0
0
08 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Justus Westerhoff
Erblina Purellku
Jakob Hackstein
Jonas Loos
Leo Pinetzki
Lorenz Hufe
AAML
28
0
0
07 Apr 2025