LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs


3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
    VLM
    MLLM
    CLIP

Papers citing "LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"

50 / 1,097 papers shown
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
Chen Tang
Xinzhu Ma
Encheng Su
Xiufeng Song
Xiaohong Liu
Wei-Hong Li
Lei Bai
Wanli Ouyang
Xiangyu Yue
3DGS
AI4TS
72
0
0
26 Mar 2025
RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models
Mehdi Moshtaghi
Siavash H. Khajavi
Joni Pajarinen
VLM
54
0
0
25 Mar 2025
EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models
Yufei Cai
Hu Han
Yuxiang Wei
Shiguang Shan
Xilin Chen
DiffM
VGen
65
0
0
25 Mar 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
N. Sebe
Luc Van Gool
Xuming Hu
3DV
42
5
0
23 Mar 2025
HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis
Mengtian Li
Jinshu Chen
Wanquan Feng
Bingchuan Li
Fei Dai
Mingcong Liu
Qian He
3DH
52
0
0
21 Mar 2025
DermDiff: Generative Diffusion Model for Mitigating Racial Biases in Dermatology Diagnosis
Nusrat Munia
Abdullah-Al-Zubaer Imran
MedIm
47
1
0
21 Mar 2025
AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process
J. Hu
Shuyong Gao
Qianyu Guo
Yan Wang
Qishan Wang
Yuang Feng
Wenqiang Zhang
DiffM
VGen
47
0
0
21 Mar 2025
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam
Soowon Son
Zhan Xu
Jing Shi
Difan Liu
Feng Liu
Aashish Misraa
Seungryong Kim
Yang Zhou
DiffM
51
0
0
19 Mar 2025
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Fahad Shahbaz Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
59
0
0
18 Mar 2025
AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations
Quang-Trung Truong
Wong Yuk Kwan
Duc Thanh Nguyen
Binh-Son Hua
Sai-Kit Yeung
VGen
53
0
0
17 Mar 2025
TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
Forouzan Fallah
Maitreya Patel
Agneet Chatterjee
Vlad I. Morariu
Chitta Baral
Yezhou Yang
CoGe
63
0
0
17 Mar 2025
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem
Piotr Teterwak
Kate Saenko
Bryan A. Plummer
AAML
80
0
0
17 Mar 2025
PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data
ChangHee Yang
H. Song
Seokhun Choi
Seungwoo Lee
Jaechul Kim
Hoseok Do
48
0
0
17 Mar 2025
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi
Tejaswi Kasarla
Pascal Mettes
Lorenzo Baraldi
Rita Cucchiara
VLM
MU
61
0
0
15 Mar 2025
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren
Wentao Ma
Huan Yang
Cong Wei
Ge Zhang
Wenhu Chen
Mamba
59
3
0
14 Mar 2025
PARIC: Probabilistic Attention Regularization for Language Guided Image Classification from Pre-trained Vison Language Models
Mayank Nautiyal
Stela Arranz Gheorghe
Kristiana Stefa
Li Ju
Ida-Maria Sintorn
Prashant Singh
VLM
56
0
0
14 Mar 2025
DreamInsert: Zero-Shot Image-to-Video Object Insertion from A Single Image
Qi Zhao
Zhan Ma
Pan Zhou
VGen
75
0
0
13 Mar 2025
A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection
Shenghao Fu
Junkai Yan
Q. Yang
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
ObjD
VLM
48
0
0
13 Mar 2025
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Sangwon Jang
June Suk Choi
Jaehyeong Jo
Kimin Lee
Sung Ju Hwang
DiffM
WIGM
84
1
0
12 Mar 2025
Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining
Mikey Shechter
Yair Carmon
CLIP
47
0
0
11 Mar 2025
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang
Hongyuan Zhang
Yuan Yuan
AAML
PICV
80
0
0
11 Mar 2025
Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation
Mingkang Zhu
Xi Chen
Zihan Wang
Bei Yu
Hengshuang Zhao
Jiaya Jia
MoMe
55
0
0
11 Mar 2025
Controlling Latent Diffusion Using Latent CLIP
Jason Becker
Chris Wendler
Peter Baylies
Robert West
Christian Wressnegger
DiffM
VLM
68
0
0
11 Mar 2025
ComicsPAP: understanding comic strips by picking the correct panel
Emanuele Vivoli
Artemis LLabres
Mohamed Ali Soubgui
Marco Bertini
Ernest Valveny Llobet
Dimosthenis Karatzas
65
0
0
11 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
46
0
0
10 Mar 2025
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
Ouxiang Li
Yuan Wang
Xinting Hu
Houcheng Jiang
Tao Liang
Y. Hao
Guojun Ma
Fuli Feng
DiffM
49
1
0
10 Mar 2025
Just Functioning as a Hook for Two-Stage Referring Multi-Object Tracking
Weize Li
Yunhao Du
Qixiang Yin
Zhaohui Hou
Zhicheng Zhao
Daqi Liu
64
0
0
10 Mar 2025
Consistent Image Layout Editing with Diffusion Models
Tao Xia
Yudi Zhang
Ting Liu
Lei Zhang
DiffM
66
1
0
09 Mar 2025
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
Xirui Hu
Jiahao Wang
Hao Chen
Weizhan Zhang
Benqi Wang
Yangfu Li
Haishun Nan
DiffM
67
0
0
09 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Xin Jin
Hui Su
Jinlan Fu
Xiaoyu Shen
68
0
0
08 Mar 2025
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion
Ziyi Yang
Fanqi Wan
Longguang Zhong
Canbin Huang
Guosheng Liang
Xiaojun Quan
MoMe
95
0
0
06 Mar 2025
Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024
Nuria Alina Chandra
Ryan Murtfeldt
Lin Qiu
Arnab Karmakar
Hannah Lee
...
Sejin Paik
Changyeon Lee
Jongwook Choi
Aerin Kim
O. Etzioni
64
5
0
04 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei-Ming Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
45
1
0
04 Mar 2025
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu
Zhikai Li
Qingyi Gu
DiffM
40
0
0
03 Mar 2025
Language-Assisted Feature Transformation for Anomaly Detection
EungGu Yun
Heonjin Ha
Yeongwoo Nam
Bryan Dongik Lee
68
0
0
03 Mar 2025
SolidMark: Evaluating Image Memorization in Generative Models
Nicky Kriplani
Minh Pham
Gowthami Somepalli
Chinmay Hegde
Niv Cohen
VLM
45
1
0
01 Mar 2025
Analyzing CLIP's Performance Limitations in Multi-Object Scenarios: A Controlled High-Resolution Study
Reza Abbasi
Ali Nazari
Aminreza Sefid
Mohammadali Banayeeanzade
M. Rohban
M. Baghshah
VLM
64
1
0
27 Feb 2025
Vision-Encoders (Already) Know What They See: Mitigating Object Hallucination via Simple Fine-Grained CLIPScore
Hongseok Oh
Wonseok Hwang
VLM
41
0
0
27 Feb 2025
Interpreting CLIP with Hierarchical Sparse Autoencoders
Vladimir Zaigrajew
Hubert Baniecki
P. Biecek
56
0
0
27 Feb 2025
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi
Ali Nazari
Aminreza Sefid
Mohammadali Banayeeanzade
M. Rohban
M. Baghshah
VLM
89
1
0
27 Feb 2025
SCA3D: Enhancing Cross-modal 3D Retrieval via 3D Shape and Caption Paired Data Augmentation
Junlong Ren
Hao Wu
Hui Xiong
Haoran Wang
68
0
0
26 Feb 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
126
7
0
25 Feb 2025
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts
Zhenghao Liu
Xingsheng Zhu
Tianshuo Zhou
Xinyi Zhang
Xiaoyuan Yi
Yukun Yan
Yu Gu
Ge Yu
Maosong Sun
RALM
VLM
43
1
0
24 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
122
9
0
18 Feb 2025
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu
Yuvan Sharma
Haoru Xue
Giscard Biamby
Junyi Zhang
Ziteng Ji
Trevor Darrell
Roei Herzig
78
1
0
18 Feb 2025
Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
Daiki Chijiwa
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Susumu Takeuchi
47
0
0
18 Feb 2025
Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
Zikang Liu
K. Zhou
Wayne Xin Zhao
Dawei Gao
Yaliang Li
Zhicheng Dou
MLLM
VLM
LRM
94
0
0
17 Feb 2025
Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
Zahra Bayramli
Ayhan Suleymanzade
Na Min An
Huzama Ahmad
Eunsu Kim
Junyeong Park
James Thorne
Alice H. Oh
91
0
0
13 Feb 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begüm Demir
Ioannis Papoutsis
VLM
86
0
0
13 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
105
4
0
12 Feb 2025