LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs


3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
    VLM
    MLLM
    CLIP

Papers citing "LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"

50 / 1,098 papers shown
Title
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
35
29
0
31 May 2024
GenMix: Combining Generative and Mixture Data Augmentation for Medical Image Classification
Han S. Lee
Haeil Lee
Helen Hong
MedIm
24
1
0
31 May 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
43
24
0
30 May 2024
Slight Corruption in Pre-training Data Makes Better Diffusion Models
Hao Chen
Yujin Han
Diganta Misra
Xiang Li
Kai Hu
Difan Zou
Masashi Sugiyama
Jindong Wang
Bhiksha Raj
DiffM
47
5
0
30 May 2024
CoSy: Evaluating Textual Explanations of Neurons
Laura Kopf
P. Bommer
Anna Hedström
Sebastian Lapuschkin
Marina M.-C. Höhne
Kirill Bykov
44
7
0
30 May 2024
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Andreas Koukounas
Georgios Mastrapas
Michael Gunther
Bo Wang
Scott Martens
...
Saahil Ognawala
Susana Guzman
Maximilian Werk
Nan Wang
Han Xiao
VLM
27
16
0
30 May 2024
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
49
0
0
29 May 2024
Multi-Modal Generative Embedding Model
Feipeng Ma
Hongwei Xue
Guangting Wang
Yizhou Zhou
Fengyun Rao
Shilin Yan
Yueyi Zhang
Siying Wu
Mike Zheng Shou
Xiaoyan Sun
VLM
39
3
0
29 May 2024
Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models
Tianrun Chen
Chunan Yu
Jing Li
Jianqi Zhang
Lanyun Zhu
Deyi Ji
Yong Zhang
Ying Zang
Zejian Li
Lingyun Sun
LRM
53
9
0
29 May 2024
Benchmarking and Improving Detail Image Caption
Hongyuan Dong
Jiawen Li
Bohong Wu
Jiacong Wang
Yuan Zhang
Haoyuan Guo
VLM
MLLM
35
16
0
29 May 2024
Topological Perspectives on Optimal Multimodal Embedding Spaces
Abdul Aziz
Abdul Rahim
BDL
45
0
0
29 May 2024
Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
Mengyi Shan
Lu Dong
Yutao Han
Yuanyuan Yao
Tao Liu
Ifeoma Nwogu
Guo-Jun Qi
Mitch Hill
VGen
DiffM
38
9
0
28 May 2024
Why are Visually-Grounded Language Models Bad at Image Classification?
Yuhui Zhang
Alyssa Unell
Xiaohan Wang
Dhruba Ghosh
Yuchang Su
Ludwig Schmidt
Serena Yeung-Levy
VLM
35
27
0
28 May 2024
ToonCrafter: Generative Cartoon Interpolation
Jinbo Xing
Hanyuan Liu
Menghan Xia
Yong Zhang
Xintao Wang
Ying Shan
Tien-Tsin Wong
55
28
0
28 May 2024
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
Junjie Wang
Bin Chen
Bin Kang
Yulin Li
Yichi Chen
Weizhi Xian
Huifeng Chang
VLM
ObjD
36
7
0
28 May 2024
The SkatingVerse Workshop & Challenge: Methods and Results
Jian Zhao
Lei Jin
Jianshu Li
Zheng Zhu
Yinglei Teng
...
Shin'ichi Satoh
Yandong Guo
Cewu Lu
Junliang Xing
Jane Shengmei Shen
AI4TS
38
0
0
27 May 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
58
36
0
26 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
53
3
0
24 May 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Byung-Kwan Lee
Chae Won Kim
Beomchan Park
Yonghyun Ro
MLLM
LRM
41
18
0
24 May 2024
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
...
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
VLM
DiffM
75
15
0
24 May 2024
Focus Anywhere for Fine-grained Multi-page Document Understanding
Chenglong Liu
Haoran Wei
Jinyue Chen
Lingyu Kong
Zheng Ge
Zining Zhu
Liang Zhao
Jian-Yuan Sun
Chunrui Han
Xiangyu Zhang
46
21
0
23 May 2024
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Fei Zhao
Taotian Pang
Chunhui Li
Zhen Wu
Junjie Guo
Shangyu Xing
Xinyu Dai
55
7
0
23 May 2024
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham
Matthew Fisher
James Hays
Nicholas I. Kolkin
Yuchen Liu
Richard Y. Zhang
Tobias Hinz
DiffM
50
7
0
21 May 2024
Customize Your Own Paired Data via Few-shot Way
Jinshu Chen
Bingchuan Li
Miao Hua
Panpan Xu
Qian He
DiffM
42
0
0
21 May 2024
Enhancing Understanding Through Wildlife Re-Identification
J. Buitenhuis
45
0
0
17 May 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
47
45
0
17 May 2024
FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
Adrian Bulat
Yassine Ouali
Georgios Tzimiropoulos
VLM
47
4
0
16 May 2024
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Wanting Xu
Yang Liu
Langping He
Xucheng Huang
Ling Jiang
VLM
MLLM
43
2
0
15 May 2024
Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp
Rachel Hong
William Agnew
Tadayoshi Kohno
Jamie Morgenstern
27
9
0
13 May 2024
Non-confusing Generation of Customized Concepts in Diffusion Models
Wang Lin
Jingyuan Chen
Jiaxin Shi
Yichen Zhu
Chen Liang
...
Tao Jin
Zhou Zhao
Fei Wu
Shuicheng Yan
Hanwang Zhang
DiffM
48
12
0
11 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
37
84
0
09 May 2024
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
Y. Qu
Xinyue Shen
Yixin Wu
Michael Backes
Savvas Zannettou
Yang Zhang
EGVM
40
12
0
06 May 2024
Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval
Jiacheng Cheng
Hijung Valentina Shin
Nuno Vasconcelos
Bryan C. Russell
Fabian Caba Heilbron
VLM
31
1
0
06 May 2024
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
43
157
0
03 May 2024
Customizing Text-to-Image Models with a Single Image Pair
Maxwell Jones
Sheng-Yu Wang
Nupur Kumari
David Bau
Jun-Yan Zhu
DiffM
25
19
0
02 May 2024
Guided Conditional Diffusion Classifier (ConDiff) for Enhanced Prediction of Infection in Diabetic Foot Ulcers
Palawat Busaranuvong
Emmanuel O. Agu
Deepak Kumar
Shefalika Gautam
Reza Saadati Fard
B. Tulu
Diane Strong
MedIm
31
0
0
01 May 2024
At the edge of a generative cultural precipice
Diego Porres
Alex Gomez-Villa
34
0
0
30 Apr 2024
Synthetic Image Verification in the Era of Generative AI: What Works and What Isn't There Yet
D. Tariang
Riccardo Corvi
D. Cozzolino
Giovanni Poggi
Koki Nagano
L. Verdoliva
53
8
0
30 Apr 2024
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim
Sanghyuk Chun
Taekyung Kim
Dongyoon Han
Sangdoo Yun
47
7
0
26 Apr 2024
Learning text-to-video retrieval from image captioning
Lucas Ventura
Cordelia Schmid
Gül Varol
3DV
44
3
0
26 Apr 2024
Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data
Niclas Popp
J. H. Metzen
Matthias Hein
VLM
42
1
0
25 Apr 2024
Interactive3D: Create What You Want by Interactive 3D Generation
Shaocong Dong
Lihe Ding
Zhanpeng Huang
Zibin Wang
Tianfan Xue
Dan Xu
35
10
0
25 Apr 2024
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni
Bernhard Egger
Suhas Lohit
A. Cherian
Ye Wang
T. Koike-Akino
S. X. Huang
Tim K. Marks
DiffM
45
12
0
25 Apr 2024
An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape
Sifat Muhammad Abdullah
Aravind Cheruvu
Shravya Kanchi
Taejoong Chung
Peng Gao
Murtuza Jadliwala
Bimal Viswanath
AAML
29
11
0
24 Apr 2024
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman
Stefan Lee
Scott D. Cohen
Kushal Kafle
VLM
41
5
0
24 Apr 2024
MoDE: CLIP Data Experts via Clustering
Jiawei Ma
Po-Yao Huang
Saining Xie
Shang-Wen Li
Luke Zettlemoyer
Shih-Fu Chang
Wen-tau Yih
Hu Xu
MoE
CLIP
VLM
31
11
0
24 Apr 2024
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani
Bac Nguyen
Samuel Lavoie
Ranjay Krishna
Aaron Courville
39
1
0
24 Apr 2024
CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models
Qinghe Wang
Baolu Li
Xiaomin Li
Bing Cao
Liqian Ma
Huchuan Lu
Xu Jia
DiffM
42
6
0
24 Apr 2024
1st Place Solution to the 1st SkatingVerse Challenge
Tao Sun
Yuanzi Fu
Kaicheng Yang
Jian Wu
Ziyong Feng
VGen
16
0
0
22 Apr 2024
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Zhuofan Zong
Bingqi Ma
Dazhong Shen
Guanglu Song
Hao Shao
Dongzhi Jiang
Hongsheng Li
Yu Liu
MoE
45
41
0
19 Apr 2024