ResearchTrend.AI
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs


3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
    VLM
    MLLM
    CLIP

Papers citing "LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"

50 / 1,100 papers shown
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Zhiwu Qing
Biao Gong
Yingya Zhang
Yujun Shen
Changxin Gao
Nong Sang
DiffM
VGen
38
26
0
25 Dec 2023
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
45
7
0
23 Dec 2023
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen
Tiancheng Zhao
Mingwei Zhu
Jianwei Yin
VLM
ObjD
99
11
0
22 Dec 2023
Emage: Non-Autoregressive Text-to-Image Generation
Zhangyin Feng
Runyi Hu
Liangxin Liu
Fan Zhang
Duyu Tang
Yong Dai
Xiaocheng Feng
Jiwei Li
Bing Qin
Shuming Shi
DiffM
VLM
28
0
0
22 Dec 2023
UniHuman: A Unified Model for Editing Human Images in the Wild
Nannan Li
Qing Liu
Krishna Kumar Singh
Yilin Wang
Jianming Zhang
Bryan A. Plummer
Zhe-nan Lin
23
9
0
22 Dec 2023
Leveraging Habitat Information for Fine-grained Bird Identification
Tin Nguyen
Peijie Chen
Anh Totti Nguyen
VLM
46
0
0
22 Dec 2023
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain
Jianwei Yang
Humphrey Shi
MLLM
29
24
0
21 Dec 2023
Parrot Captions Teach CLIP to Spot Text
Yiqi Lin
Conghui He
Alex Jinpeng Wang
Bin Wang
Weijia Li
Mike Zheng Shou
41
7
0
21 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
50
29
0
19 Dec 2023
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Hangjie Yuan
Shiwei Zhang
Xiang Wang
Yujie Wei
Tao Feng
Yining Pan
Yingya Zhang
Ziwei Liu
Samuel Albanie
Dong Ni
VGen
42
42
0
19 Dec 2023
Decoupled Textual Embeddings for Customized Image Generation
Yufei Cai
Yuxiang Wei
Zhilong Ji
Jinfeng Bai
Hu Han
Wangmeng Zuo
DiffM
36
14
0
19 Dec 2023
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
Enze Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
30
79
0
17 Dec 2023
WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data
Maurice Weber
Carlo Siebenschuh
Rory Butler
Anton Alexandrov
Valdemar Thanner
...
Haris Jabbar
Ian Foster
Bo-wen Li
Rick L. Stevens
Ce Zhang
21
4
0
15 Dec 2023
Weighted Ensemble Models Are Strong Continual Learners
Imad Eddine Marouf
Subhankar Roy
Enzo Tartaglione
Stéphane Lathuilière
CLL
55
17
0
14 Dec 2023
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Jack Urbanek
Florian Bordes
Pietro Astolfi
Mary Williamson
Vasu Sharma
Adriana Romero Soriano
CLIP
3DV
41
44
0
14 Dec 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
38
13
0
13 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
40
7
0
11 Dec 2023
Medical Vision Language Pretraining: A survey
Prashant Shrestha
Sanskar Amgain
Bidur Khanal
Cristian A. Linte
Binod Bhattarai
VLM
46
14
0
11 Dec 2023
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun
Dohoon Kim
Taesup Moon
VLM
18
4
0
11 Dec 2023
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
Jinrong Yang
Jian-Yuan Sun
Chunrui Han
Xiangyu Zhang
MLLM
VLM
66
74
0
11 Dec 2023
AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One
Michael Ranzinger
Greg Heinrich
Jan Kautz
Pavlo Molchanov
VLM
49
42
0
10 Dec 2023
AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model
Teng Hu
Jiangning Zhang
Ran Yi
Yuzhen Du
Xu Chen
Liang Liu
Yabiao Wang
Chengjie Wang
84
69
0
10 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
26
16
0
08 Dec 2023
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Jonas Schult
Sam S. Tsai
Lukas Höllein
Bichen Wu
Jialiang Wang
...
Zijian He
Peizhao Zhang
Bastian Leibe
Peter Vajda
Ji Hou
45
32
0
08 Dec 2023
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh
Jianming Zhang
Qing Liu
Cameron Smith
Zhe-nan Lin
Liang Zheng
DiffM
34
11
0
08 Dec 2023
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
Yiming Zhao
Zhouhui Lian
79
27
0
08 Dec 2023
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
Zhiwu Qing
Shiwei Zhang
Jiayu Wang
Xiang Wang
Yujie Wei
Yingya Zhang
Changxin Gao
Nong Sang
VGen
DiffM
32
37
0
07 Dec 2023
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li
Mingdeng Cao
Xintao Wang
Zhongang Qi
Ming-Ming Cheng
Ying Shan
DiffM
62
190
0
07 Dec 2023
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen
A. Bhunia
Subhadeep Koley
Aneeshan Sain
Pinaki Nath Chowdhury
Yi-Zhe Song
29
8
0
07 Dec 2023
iDesigner: A High-Resolution and Complex-Prompt Following Text-to-Image Diffusion Model for Interior Design
Ruyi Gan
Xiaojun Wu
Junyu Lu
Yuanhe Tian
Di Zhang
...
Renliang Sun
Chang Liu
Jiaxing Zhang
Pingjian Zhang
Yan Song
108
4
0
07 Dec 2023
Understanding (Un)Intended Memorization in Text-to-Image Generative Models
Ali Naseh
Jaechul Roh
Amir Houmansadr
DiffM
36
6
0
06 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
51
83
0
06 Dec 2023
AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang
Bichen Wu
Xiaoyan Wang
Yaqiao Luo
Luxin Zhang
Yinan Zhao
Peter Vajda
Dimitris N. Metaxas
Licheng Yu
VGen
DiffM
44
33
0
06 Dec 2023
Memory Triggers: Unveiling Memorization in Text-To-Image Generative Models through Word-Level Duplication
Ali Naseh
Jaechul Roh
Amir Houmansadr
40
6
0
06 Dec 2023
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Jiaqi Wang
Dahua Lin
Hengshuang Zhao
MLLM
74
36
0
05 Dec 2023
SEVA: Leveraging sketches to evaluate alignment between human and machine visual abstraction
Kushin Mukherjee
Holly Huey
Xuanchen Lu
Yael Vinker
Rio Aguina-Kang
Ariel Shamir
Judith E. Fan
35
11
0
05 Dec 2023
Retrieving Conditions from Reference Images for Diffusion Models
Haoran Tang
Xin Zhou
Jieren Deng
Zhihong Pan
Hao Tian
Pratik Chaudhari
42
2
0
05 Dec 2023
Rejuvenating image-GPT as Strong Visual Representation Learners
Sucheng Ren
Zeyu Wang
Hongru Zhu
Junfei Xiao
Alan Yuille
Cihang Xie
VLM
57
7
0
04 Dec 2023
Object Recognition as Next Token Prediction
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
40
9
0
04 Dec 2023
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
Zhuoran Yu
Chenchen Zhu
Sean Culatana
Raghuraman Krishnamoorthi
Fanyi Xiao
Yong Jae Lee
117
15
0
04 Dec 2023
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection
Sunghun Kang
Junbum Cha
Jonghwan Mun
Byungseok Roh
Chang D. Yoo
VLM
ObjD
53
1
0
04 Dec 2023
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao
Zhan Tong
K. Lin
Joya Chen
Mike Zheng Shou
41
0
0
04 Dec 2023
IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
Jiarui Xu
Yossi Gandelsman
Amir Bar
Jianwei Yang
Jianfeng Gao
Trevor Darrell
Xiaolong Wang
VLM
28
3
0
04 Dec 2023
CLAMP: Contrastive LAnguage Model Prompt-tuning
Piotr Teterwak
Ximeng Sun
Bryan A. Plummer
Kate Saenko
Ser-Nam Lim
MLLM
VLM
40
1
0
04 Dec 2023
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai
Xinyang Geng
K. Mangalam
Amir Bar
Alan Yuille
Trevor Darrell
Jitendra Malik
Alexei A. Efros
MLLM
VLM
32
158
0
01 Dec 2023
Text-Guided 3D Face Synthesis -- From Generation to Editing
Yunjie Wu
Yapeng Meng
Zhipeng Hu
Lincheng Li
Haoqian Wu
Kun Zhou
Weiwei Xu
Xin Yu
DiffM
58
9
0
01 Dec 2023
Raising the Bar of AI-generated Image Detection with CLIP
D. Cozzolino
Giovanni Poggi
Riccardo Corvi
Matthias Nießner
L. Verdoliva
VLM
35
75
0
30 Nov 2023
ART⋅V: Auto-Regressive Text-to-Video Generation with Diffusion Models
Wenming Weng
Ruoyu Feng
Yanhui Wang
Qi Dai
Chunyu Wang
...
Jianmin Bao
Yuhui Yuan
Chong Luo
Yueyi Zhang
Zhiwei Xiong
VGen
33
32
0
30 Nov 2023
BioCLIP: A Vision Foundation Model for the Tree of Life
Samuel Stevens
Jiaman Wu
Matthew J Thompson
Elizabeth G Campolongo
Chan Hee Song
...
Wasila M Dahdul
Charles V. Stewart
Tanya Berger-Wolf
Wei-Lun Chao
Yu-Chuan Su
49
64
0
30 Nov 2023
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Zineng Tang
Ziyi Yang
Mahmoud Khademi
Yang Liu
Chenguang Zhu
Mohit Bansal
LRM
MLLM
AuLLM
56
45
0
30 Nov 2023