ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
    VLM
    MLLM
    CLIP

Papers citing "LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"

50 / 1,100 papers shown
MLLMs-Augmented Visual-Language Representation Learning
Yanqing Liu
Kai Wang
Wenqi Shao
Ping Luo
Yu Qiao
Mike Zheng Shou
Kaipeng Zhang
Yang You
VLM
29
11
0
30 Nov 2023
Merlin:Empowering Multimodal LLMs with Foresight Minds
En Yu
Liang Zhao
Yana Wei
Jinrong Yang
Dongming Wu
...
Haoran Wei
Tiancai Wang
Zheng Ge
Xiangyu Zhang
Wenbing Tao
LRM
18
25
0
30 Nov 2023
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng
Sicheng Xie
Zuyao You
Shiyi Lan
Zuxuan Wu
VLM
CoGe
MLLM
38
18
0
30 Nov 2023
Meta Co-Training: Two Views are Better than One
Jay C. Rothenberger
Dimitrios I. Diochnos
VLM
50
2
0
29 Nov 2023
Fair Text-to-Image Diffusion via Fair Mapping
Jia Li
Lijie Hu
Jingfeng Zhang
Tianhang Zheng
Hua Zhang
Di Wang
54
14
0
29 Nov 2023
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang
Yuehuai Liu
Yu-Wing Tai
Chi-Keung Tang
DiffM
38
5
0
29 Nov 2023
Explaining CLIP's performance disparities on data from blind/low vision users
Daniela Massiceti
Camilla Longden
Agnieszka Slowik
Samuel Wills
Martin Grayson
C. Morrison
VLM
29
9
0
29 Nov 2023
SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata
Mark Díaz
Sunipa Dev
Emily Reif
Remi Denton
Vinodkumar Prabhakaran
33
3
0
28 Nov 2023
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
40
44
0
28 Nov 2023
Large Language Models Meet Computer Vision: A Brief Survey
Raby Hamadi
LM&MA
29
4
0
28 Nov 2023
Text-Driven Image Editing via Learnable Regions
Yuanze Lin
Yi-Wen Chen
Yi-Hsuan Tsai
Lu Jiang
Ming-Hsuan Yang
DiffM
34
16
0
28 Nov 2023
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi
Tobia Poppi
Federico Cocchi
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
27
9
0
27 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
27
58
0
27 Nov 2023
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Zhongcong Xu
Jianfeng Zhang
Jun Hao Liew
Hanshu Yan
Jia-Wei Liu
Chenxu Zhang
Jiashi Feng
Mike Zheng Shou
VGen
DiffM
39
186
0
27 Nov 2023
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
Yunxin Li
Baotian Hu
Wei Wang
Xiaochun Cao
Min Zhang
29
4
0
27 Nov 2023
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
Chaofeng Chen
Annan Wang
Haoning Wu
Liang Liao
Wenxiu Sun
Qiong Yan
Weisi Lin
36
10
0
27 Nov 2023
Fully Authentic Visual Question Answering Dataset from Online Communities
Chongyan Chen
Mengchen Liu
Noel Codella
Yunsheng Li
Lu Yuan
Danna Gurari
49
5
0
27 Nov 2023
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja
M. S. Danish
Muzammal Naseer
Abhijit Das
Salman Khan
Fahad Shahbaz Khan
28
138
0
24 Nov 2023
Inferring Latent Class Statistics from Text for Robust Visual Few-Shot Learning
Yassir Bendou
Vincent Gripon
Bastien Pasdeloup
G. Lioi
Lukas Mauch
Fabien Cardinaux
G. B. Hacene
34
0
0
24 Nov 2023
SinSR: Diffusion-Based Image Super-Resolution in a Single Step
Yufei Wang
Wenhan Yang
Xinyuan Chen
Yaohui Wang
Lanqing Guo
Lap-Pui Chau
Ziwei Liu
Yu Qiao
Alex C. Kot
Bihan Wen
DiffM
86
100
0
23 Nov 2023
Posterior Distillation Sampling
Juil Koo
Chanho Park
Minhyuk Sung
DiffM
29
27
0
23 Nov 2023
Studying Artist Sentiments around AI-generated Artwork
Safinah Ali
C. Breazeal
47
4
0
22 Nov 2023
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Yonghui Wang
Wen-gang Zhou
Hao Feng
Keyi Zhou
Houqiang Li
66
19
0
22 Nov 2023
Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise
Yixin Liu
Kaidi Xu
Xun Chen
Lichao Sun
35
7
0
22 Nov 2023
DMLR: Data-centric Machine Learning Research -- Past, Present and Future
Luis Oala
M. Maskey
Lilith Bat-Leah
Alicia Parrish
Nezihe Merve Gürel
...
Lora Aroyo
Ce Zhang
Joaquin Vanschoren
Isabelle Guyon
Peter Mattson
AI4CE
46
11
0
21 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Jiaqi Wang
Feng Zhao
Dahua Lin
MLLM
VLM
81
590
0
21 Nov 2023
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Jiaxi Lv
Yi Huang
Mingfu Yan
Jiancheng Huang
Jianzhuang Liu
Yifan Liu
Yafei Wen
Xiaoxin Chen
Shifeng Chen
VGen
DiffM
32
23
0
21 Nov 2023
MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
Di Chang
Yichun Shi
Quankai Gao
Jessica Fu
Hongyi Xu
Guoxian Song
Qing Yan
Yizhe Zhu
Xiao Yang
Mohammad Soleymani
DiffM
VGen
22
50
0
18 Nov 2023
Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng
Guoqiang Wei
Jiani Zheng
Jiaxin Zou
Yang Wei
Yuchen Zhang
Hang Li
DiffM
VGen
21
93
0
18 Nov 2023
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
40
59
0
16 Nov 2023
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Li Xu
Jun Liu
MU
66
0
0
16 Nov 2023
Instant3D: Instant Text-to-3D Generation
Ming Li
Pan Zhou
Jia-Wei Liu
Jussi Keppo
Min Lin
Shuicheng Yan
Xiangyu Xu
41
30
0
14 Nov 2023
Unlock the Power: Competitive Distillation for Multi-Modal Large Language Models
Xinwei Li
Li Lin
Shuai Wang
Chen Qian
25
3
0
14 Nov 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Peng Gao
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Hongsheng Li
Yu Qiao
MLLM
VLM
33
212
0
13 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
50
144
0
10 Nov 2023
Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models
Joan Nwatu
Oana Ignat
Rada Mihalcea
26
10
0
09 Nov 2023
GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning
Guangyue Xu
Joyce Chai
Parisa Kordjamshidi
VLM
23
16
0
09 Nov 2023
ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image
Senthil Purushwalkam
Nikhil Naik
32
5
0
09 Nov 2023
A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization
Xingzhe He
Zhiwen Cao
Nicholas I. Kolkin
Lantao Yu
Kun Wan
Helge Rhodin
Ratheesh Kalarot
45
13
0
07 Nov 2023
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
Shiwei Zhang
Jiayu Wang
Yingya Zhang
Kang Zhao
Hangjie Yuan
Zhanyue Qin
Xiang Wang
Deli Zhao
Jingren Zhou
DiffM
VGen
57
201
0
07 Nov 2023
Exploring Dataset-Scale Indicators of Data Quality
Ben Feuer
Chinmay Hegde
29
1
0
07 Nov 2023
Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining
U. Sahin
Hang Li
Qadeer Ahmad Khan
Daniel Cremers
Volker Tresp
VLM
CoGe
28
12
0
07 Nov 2023
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
Junyan Li
Delin Chen
Yining Hong
Zhenfang Chen
Peihao Chen
Yikang Shen
Chuang Gan
MLLM
33
15
0
06 Nov 2023
LDM3D-VR: Latent Diffusion Model for 3D VR
Gabriela Ben-Melech Stan
Diana Wofk
Estelle Aflalo
Shao-Yen Tseng
Z. Cai
Michael Paulitsch
Vasudev Lal
62
6
0
06 Nov 2023
AnyText: Multilingual Visual Text Generation And Editing
Yuxiang Tuo
Wangmeng Xiang
Jun-Yan He
Yifeng Geng
Xuansong Xie
DiffM
38
76
0
06 Nov 2023
Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion
Ethan Pronovost
Meghana Reddy Ganesina
Noureldin Hendy
Zeyu Wang
Andres Morales
Kai Wang
Nicholas Roy
35
31
0
05 Nov 2023
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
Yipeng Gao
Zeyu Wang
Wei-Shi Zheng
Cihang Xie
Yuyin Zhou
3DPC
34
8
0
03 Nov 2023
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
Jameel Hassan
Hanan Gani
Noor Hussein
Muhammad Uzair Khattak
Muzammal Naseer
Fahad Shahbaz Khan
Salman Khan
VLM
OOD
80
62
0
02 Nov 2023
Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection
Sungjune Park
Hyunjun Kim
Y. Ro
45
11
0
02 Nov 2023
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLM
MLLM
LRM
78
22
0
02 Nov 2023