ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.08209
  4. Cited By
Who's in and who's out? A case study of multimodal CLIP-filtering in
  DataComp

Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp

13 May 2024
Rachel Hong
William Agnew
Tadayoshi Kohno
Jamie Morgenstern
ArXivPDFHTML

Papers citing "Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp"

17 / 17 papers shown
Title
Concept-as-Tree: Synthetic Data is All You Need for VLM Personalization
Concept-as-Tree: Synthetic Data is All You Need for VLM Personalization
Ruichuan An
Kai Zeng
Ming Lu
Sihan Yang
Renrui Zhang
Huitong Ji
Qizhe Zhang
Yihao Luo
Hao Liang
Wentao Zhang
80
0
0
17 Mar 2025
Scaling Laws for Data Filtering -- Data Curation cannot be Compute
  Agnostic
Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
Sachin Goyal
Pratyush Maini
Zachary Chase Lipton
Aditi Raghunathan
J. Zico Kolter
78
43
0
10 Apr 2024
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of
  English Pretraining Data Filters
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
L. Lucy
Suchin Gururangan
Luca Soldaini
Emma Strubell
David Bamman
Lauren Klein
Jesse Dodge
69
16
0
12 Jan 2024
Ethical Considerations for Responsible Data Curation
Ethical Considerations for Responsible Data Curation
Jerone T. A. Andrews
Dora Zhao
William Thong
Apostolos Modas
Orestis Papakyriakopoulos
Alice Xiang
62
20
0
07 Feb 2023
Contrastive Language-Vision AI Models Pretrained on Web-Scraped
  Multimodal Data Exhibit Sexual Objectification Bias
Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
Robert Wolfe
Yiwei Yang
Billy Howe
Aylin Caliskan
DiffM
64
53
0
21 Dec 2022
Easily Accessible Text-to-Image Generation Amplifies Demographic
  Stereotypes at Large Scale
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
Federico Bianchi
Pratyusha Kalluri
Esin Durmus
Faisal Ladhak
Myra Cheng
Debora Nozza
Tatsunori Hashimoto
Dan Jurafsky
James Zou
Aylin Caliskan
DiffM
VLM
60
298
0
07 Nov 2022
Quality Not Quantity: On the Interaction between Dataset Design and
  Robustness of CLIP
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
83
100
0
10 Aug 2022
Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency,
  Syntax, and Semantics
Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics
Aylin Caliskan
Pimparkar Parth Ajay
Tessa E. S. Charlesworth
Robert Wolfe
M. Banaji
CVBM
FaML
64
50
0
07 Jun 2022
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural
  Language Guidance
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Katherine Crowson
Stella Biderman
Daniel Kornis
Dashiell Stander
Eric Hallahan
Louis Castricato
Edward Raff
CLIP
96
375
0
18 Apr 2022
Whose Language Counts as High Quality? Measuring Language Ideologies in
  Text Data Selection
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
Suchin Gururangan
Dallas Card
Sarah K. Drier
E. K. Gade
Leroy Z. Wang
Zeyu Wang
Luke Zettlemoyer
Noah A. Smith
198
77
0
25 Jan 2022
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
180
1,398
0
03 Nov 2021
Do Datasets Have Politics? Disciplinary Values in Computer Vision
  Dataset Development
Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development
M. Scheuerman
Emily L. Denton
A. Hanna
52
206
0
09 Aug 2021
The Values Encoded in Machine Learning Research
The Values Encoded in Machine Learning Research
Abeba Birhane
Pratyusha Kalluri
Dallas Card
William Agnew
Ravit Dotan
Michelle Bao
58
277
0
29 Jun 2021
What's in the Box? A Preliminary Analysis of Undesirable Content in the
  Common Crawl Corpus
What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus
A. Luccioni
J. Viviano
49
116
0
06 May 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean
  Crawled Corpus
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
65
437
0
18 Apr 2021
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
Su Lin Blodgett
Solon Barocas
Hal Daumé
Hanna M. Wallach
92
1,211
0
28 May 2020
Contrastive Multiview Coding
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
131
2,385
0
13 Jun 2019
1