ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.02114
  4. Cited By
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
    VLM
    MLLM
    CLIP
ArXivPDFHTML

Papers citing "LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"

50 / 1,097 papers shown
Title
E.T. the Exceptional Trajectories: Text-to-camera-trajectory generation
  with character awareness
E.T. the Exceptional Trajectories: Text-to-camera-trajectory generation with character awareness
Robin Courant
Nicolas Dufour
Xi Wang
Marc Christie
Vicky Kalogeiton
VGen
46
4
0
01 Jul 2024
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training
  with Limited Resources
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Xiyuan Wei
Fanjiang Ye
Ori Yonay
Xingyu Chen
Baixi Sun
Dingwen Tao
Tianbao Yang
VLM
CLIP
59
2
0
01 Jul 2024
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image
  Generation
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Mushui Liu
Yuhang Ma
Yang Zhen
Jun Dan
Yunlong Yu
Zeng Zhao
Zhipeng Hu
Bai Liu
Changjie Fan
VLM
DiffM
68
13
0
30 Jun 2024
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework
  for Multimodal LLMs
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Sukmin Yun
Haokun Lin
Rusiru Thushara
Mohammad Qazim Bhat
Yongxin Wang
...
Timothy Baldwin
Zhengzhong Liu
Eric P. Xing
Xiaodan Liang
Zhiqiang Shen
54
10
0
28 Jun 2024
SK-VQA: Synthetic Knowledge Generation at Scale for Training
  Context-Augmented Multimodal LLMs
SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
Xin Su
Man Luo
Kris W Pan
Tien Pei Chou
Vasudev Lal
Phillip Howard
53
3
0
28 Jun 2024
Fairness and Bias in Multimodal AI: A Survey
Fairness and Bias in Multimodal AI: A Survey
Tosin P. Adewumi
Lama Alkhaled
Namrata Gurung
G. V. Boven
Irene Pagliai
58
9
0
27 Jun 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of
  Text-to-Time-lapse Video Generation
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan
Jinfa Huang
Yongqi Xu
Yaoyang Liu
Shaofeng Zhang
Yujun Shi
Ruijie Zhu
Xinhua Cheng
Jiebo Luo
Li Yuan
EGVM
VGen
77
34
0
26 Jun 2024
DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis
  through Structure Guidance
DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance
Younghyun Kim
Geunmin Hwang
Junyu Zhang
Eunbyung Park
60
6
0
26 Jun 2024
Diffusion Model-Based Video Editing: A Survey
Diffusion Model-Based Video Editing: A Survey
Wenhao Sun
Rong-Cheng Tu
Jingyi Liao
Dacheng Tao
VGen
66
22
0
26 Jun 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
75
31
0
24 Jun 2024
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
Thomas Stegmüller
Tim Lebailly
Nikola Dukic
Behzad Bozorgtabar
Tinne Tuytelaars
Jean-Philippe Thiran
VLM
39
1
0
23 Jun 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Yuxuan Qiao
Haodong Duan
Xinyu Fang
Junming Yang
Lin Chen
Songyang Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LRM
45
19
0
20 Jun 2024
AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation
AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation
Xinyu Hou
Xiaoming Li
Chen Change Loy
DiffM
46
0
0
18 Jun 2024
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language
  Model
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model
Jiang-Xin Shi
Chi Zhang
Tong Wei
Yu-Feng Li
VLM
43
2
0
18 Jun 2024
Improving Text-To-Audio Models with Synthetic Captions
Improving Text-To-Audio Models with Synthetic Captions
Zhifeng Kong
Sang-gil Lee
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Rafael Valle
Soujanya Poria
Bryan Catanzaro
53
11
0
18 Jun 2024
They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate
  Associative Bias
They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias
Salma Abdel Magid
Jui-Hsien Wang
Kushal Kafle
Hanspeter Pfister
44
1
0
17 Jun 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal
  Dataset with One Trillion Tokens
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
Anas Awadalla
Le Xue
Oscar Lo
Manli Shu
Hannah Lee
...
Silvio Savarese
Caiming Xiong
Ran Xu
Yejin Choi
Ludwig Schmidt
69
25
0
17 Jun 2024
Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning
Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning
Tian Liu
Huixin Zhang
Shubham Parashar
Shu Kong
29
2
0
17 Jun 2024
From Pixels to Prose: A Large Dataset of Dense Image Captions
From Pixels to Prose: A Large Dataset of Dense Image Captions
Vasu Singla
Kaiyu Yue
Sukriti Paul
Reza Shirkavand
Mayuka Jayawardhana
Alireza Ganjdanesh
Heng Huang
A. Bhatele
Gowthami Somepalli
Tom Goldstein
3DV
VLM
36
22
0
14 Jun 2024
Yo'LLaVA: Your Personalized Language and Vision Assistant
Yo'LLaVA: Your Personalized Language and Vision Assistant
Thao Nguyen
Haotian Liu
Yuheng Li
Mu Cai
Utkarsh Ojha
Yong Jae Lee
VLM
MLLM
64
15
0
13 Jun 2024
Real-Time Deepfake Detection in the Real-World
Real-Time Deepfake Detection in the Real-World
Bar Cavia
Eliahu Horwitz
Tal Reiss
Yedid Hoshen
51
6
0
13 Jun 2024
OpenVLA: An Open-Source Vision-Language-Action Model
OpenVLA: An Open-Source Vision-Language-Action Model
Moo Jin Kim
Karl Pertsch
Siddharth Karamcheti
Ted Xiao
Ashwin Balakrishna
...
Russ Tedrake
Dorsa Sadigh
Sergey Levine
Percy Liang
Chelsea Finn
LM&Ro
VLM
51
367
0
13 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
39
3
0
13 Jun 2024
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
Qijun Gan
Song Wang
Shengtao Wu
Jianke Zhu
60
1
0
13 Jun 2024
LLM-assisted Concept Discovery: Automatically Identifying and Explaining
  Neuron Functions
LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions
N. Hoang-Xuan
Minh Nhat Vu
My T. Thai
28
3
0
12 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
44
35
0
12 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images
  Interleaved with Text
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM
OffRL
56
21
0
12 Jun 2024
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities
  in Large Vision-Language Models
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models
Shimin Chen
Yitian Yuan
Shaoxiang Chen
Zequn Jie
Lin Ma
VLM
35
3
0
12 Jun 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent
  Compression Learning
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang
Xizhou Zhu
Jinguo Zhu
Weijie Su
Junjie Wang
...
Lewei Lu
Bin Li
Jie Zhou
Yu Qiao
Jifeng Dai
VLM
CLIP
47
5
0
11 Jun 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal
  Large Language Models
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Yu Qiao
Yingchun Wang
ELM
50
13
0
11 Jun 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
RWKV-CLIP: A Robust Vision-Language Representation Learner
Tiancheng Gu
Kaicheng Yang
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
Jiankang Deng
VLM
CLIP
40
14
0
11 Jun 2024
A Taxonomy of Challenges to Curating Fair Datasets
A Taxonomy of Challenges to Curating Fair Datasets
Dora Zhao
M. Scheuerman
Pooja Chitre
Jerone T. A. Andrews
Georgia Panagiotidou
Shawn Walker
Kathleen H. Pine
Alice Xiang
47
2
0
10 Jun 2024
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
Zijian Chen
Wei Sun
Yuan Tian
Jun Jia
Zicheng Zhang
Jiarui Wang
Ru Huang
Xiongkuo Min
Guangtao Zhai
Wenjun Zhang
EGVM
56
11
0
10 Jun 2024
Vript: A Video Is Worth Thousands of Words
Vript: A Video Is Worth Thousands of Words
Dongjie Yang
Suyuan Huang
Chengqiang Lu
Xiaodong Han
Haoxin Zhang
Yan Gao
Yao Hu
Hai Zhao
VGen
80
22
0
10 Jun 2024
Faster Than Lies: Real-time Deepfake Detection using Binary Neural
  Networks
Faster Than Lies: Real-time Deepfake Detection using Binary Neural Networks
Lanzino Romeo
Fontana Federico
Diko Anxhelo
Marini Marco Raoul
Cinque Luigi
46
17
0
07 Jun 2024
Nomic Embed Vision: Expanding the Latent Space
Nomic Embed Vision: Expanding the Latent Space
Zach Nussbaum
Brandon Duderstadt
Andriy Mulyar
VLM
33
5
0
06 Jun 2024
Learning 1D Causal Visual Representation with De-focus Attention
  Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
Chenxin Tao
Xizhou Zhu
Shiqian Su
Lewei Lu
Changyao Tian
...
Gao Huang
Hongsheng Li
Ping Luo
Jie Zhou
Jifeng Dai
70
1
0
06 Jun 2024
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise
  Optimization
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
L. Eyring
Shyamgopal Karthik
Karsten Roth
Alexey Dosovitskiy
Zeynep Akata
88
17
0
06 Jun 2024
CountCLIP -- [Re] Teaching CLIP to Count to Ten
CountCLIP -- [Re] Teaching CLIP to Count to Ten
Harshvardhan Mestha
Tejas Agrawal
Karan Bania
Shreyas V
Yash Bhisikar
VLM
24
1
0
05 Jun 2024
Inv-Adapter: ID Customization Generation via Image Inversion and
  Lightweight Adapter
Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter
Peng-Fei Xing
Ning Wang
Jianbo Ouyang
Zechao Li
DiffM
44
1
0
05 Jun 2024
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal
  Learning
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Alex Jinpeng Wang
Linjie Li
Yiqi Lin
Min Li
Lijuan Wang
Mike Zheng Shou
VLM
33
3
0
04 Jun 2024
Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion
Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion
Colin Hansen
Simas Glinskis
Ashwin Raju
Micha Kornreich
JinHyeong Park
Jayashri Pawar
Richard Herzog
Li Zhang
Benjamin Odry
MedIm
DiffM
59
3
0
04 Jun 2024
RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
Qi Wang
Ruijie Lu
Xudong Xu
Jingbo Wang
Michael Yu Wang
Bo Dai
Gang Zeng
Dan Xu
DiffM
40
6
0
04 Jun 2024
SLANT: Spurious Logo ANalysis Toolkit
SLANT: Spurious Logo ANalysis Toolkit
Maan Qraitem
Piotr Teterwak
Kate Saenko
Bryan A. Plummer
AAML
43
0
0
03 Jun 2024
ED-SAM: An Efficient Diffusion Sampling Approach to Domain
  Generalization in Vision-Language Foundation Models
ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
Thanh-Dat Truong
Xin Li
Bhiksha Raj
Jackson Cothren
Khoa Luu
DiffM
VLM
54
1
0
03 Jun 2024
fruit-SALAD: A Style Aligned Artwork Dataset to reveal similarity
  perception in image embeddings
fruit-SALAD: A Style Aligned Artwork Dataset to reveal similarity perception in image embeddings
Tillmann Ohm
Andres Karjus
Mikhail Tamm
Maximilian Schich
41
1
0
03 Jun 2024
Dimba: Transformer-Mamba Diffusion Models
Dimba: Transformer-Mamba Diffusion Models
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Youqiang Zhang
Junshi Huang
Mamba
62
17
0
03 Jun 2024
AlignSAM: Aligning Segment Anything Model to Open Context via
  Reinforcement Learning
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Duojun Huang
Xinyu Xiong
Jie Ma
Jichang Li
Zequn Jie
Lin Ma
Guanbin Li
VLM
54
11
0
01 Jun 2024
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for
  Transferable Insights
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
38
3
0
31 May 2024
DeCo: Decoupling Token Compression from Semantic Abstraction in
  Multimodal Large Language Models
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
35
29
0
31 May 2024
Previous
123...567...202122
Next