ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

RedCaps: web-curated image-text data created by the people, for the people

22 November 2021
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
arXiv: 2111.11431

Papers citing "RedCaps: web-curated image-text data created by the people, for the people"

50 / 130 papers shown
Position: Restructuring of Categories and Implementation of Guidelines Essential for VLM Adoption in Healthcare
Amara Tariq
Rimita Lahiri
Charles Kahn
Imon Banerjee
31
0
0
12 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
47
0
0
10 Apr 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
Feiyu Xiong
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
177
2
0
27 Mar 2025
Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU
Àlex Pujol Vidal
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
MU
64
0
0
19 Mar 2025
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi
Tejaswi Kasarla
Pascal Mettes
Lorenzo Baraldi
Rita Cucchiara
VLM
MU
61
0
0
15 Mar 2025
Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining
Mikey Shechter
Yair Carmon
CLIP
47
0
0
11 Mar 2025
Should VLMs be Pre-trained with Image Data?
Sedrick Scott Keh
Jean-Pierre Mercat
S. Gadre
Kushal Arora
Igor Vasiljevic
...
Shuran Song
Russ Tedrake
Thomas Kollar
Ludwig Schmidt
Achal Dave
VLM
49
0
0
10 Mar 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begüm Demir
Ioannis Papoutsis
VLM
86
0
0
13 Feb 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Alejandro Lozano
Min Woo Sun
James Burgess
Liangyu Chen
Jeffrey Nirschl
...
Xiaohan Wang
Yuhui Zhang
Alfred Seunghoon Song
Robert Tibshirani
Serena Yeung-Levy
LM&MA
VLM
MedIm
70
8
0
13 Jan 2025
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen
Jianwei Yang
Haiping Wu
Dianqi Li
Jianfeng Gao
Tianyi Zhou
Bin Xiao
VLM
62
4
0
05 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
79
2
0
04 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
85
2
0
02 Dec 2024
Probabilistic Language-Image Pre-Training
Sanghyuk Chun
Wonjae Kim
Song Park
Sangdoo Yun
MLLM
VLM
CLIP
152
4
2
24 Oct 2024
When Graph meets Multimodal: Benchmarking and Meditating on Multimodal Attributed Graphs Learning
Hao Yan
C. Li
Zhigang Yu
Jun Yin
Ruochen Liu
Peiyan Zhang
Weihao Han
Mingzheng Li
Zhengxin Zeng
34
0
0
11 Oct 2024
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Avik Pal
Max van Spengler
Guido Maria D'Amely di Melendugno
Alessandro Flaborea
Fabio Galasso
Pascal Mettes
CoGe
48
5
0
09 Oct 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
139
1
0
04 Sep 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
42
61
0
22 Aug 2024
Open Vocabulary Multi-Label Video Classification
Rohit Gupta
Mamshad Nayeem Rizve
Jayakrishnan Unnikrishnan
Ashish Tawari
Son Tran
Mubarak Shah
Benjamin Z. Yao
Trishul Chilimbi
VLM
67
1
0
12 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
66
4
0
09 Jul 2024
MINDECHO: Role-Playing Language Agents for Key Opinion Leaders
Rui Xu
Dakuan Lu
Xiaoyu Tan
Xintao Wang
Siyu Yuan
Jiangjie Chen
Wei Chu
Xu Yinghui
LLMAG
34
3
0
07 Jul 2024
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
Young-Jun Lee
Dokyong Lee
Junyoung Youn
Kyeongjin Oh
ByungSoo Ko
Jonghwan Hyeon
Ho-Jin Choi
36
2
0
04 Jul 2024
Semantic Compositions Enhance Vision-Language Contrastive Learning
Maxwell Mbabilla Aladago
Lorenzo Torresani
Soroush Vosoughi
CoGe
VLM
CLIP
41
0
0
01 Jul 2024
Curriculum Learning with Quality-Driven Data Selection
Biao Wu
Fang Meng
Ling-Hao Chen
34
2
0
27 Jun 2024
From Pixels to Prose: A Large Dataset of Dense Image Captions
Vasu Singla
Kaiyu Yue
Sukriti Paul
Reza Shirkavand
Mayuka Jayawardhana
Alireza Ganjdanesh
Heng Huang
A. Bhatele
Gowthami Somepalli
Tom Goldstein
3DV
VLM
36
22
0
14 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
44
35
0
12 Jun 2024
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
38
3
0
31 May 2024
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
49
0
0
29 May 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
58
36
0
26 May 2024
FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
Adrian Bulat
Yassine Ouali
Georgios Tzimiropoulos
VLM
45
4
0
16 May 2024
Mind the Gap Between Synthetic and Real: Utilizing Transfer Learning to Probe the Boundaries of Stable Diffusion Generated Data
Leonhard Hennicke
C. Adriano
Holger Giese
Jan Mathias Koehler
Lukas Schott
DiffM
55
2
0
06 May 2024
Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval
Jiacheng Cheng
Hijung Valentina Shin
Nuno Vasconcelos
Bryan C. Russell
Fabian Caba Heilbron
VLM
31
1
0
06 May 2024
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
43
157
0
03 May 2024
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe
Sunayana Rane
Zachary Berger
Yonatan Bitton
Jaemin Cho
...
Zarana Parekh
Jordi Pont-Tuset
Garrett Tanzer
Su Wang
Jason Baldridge
41
48
0
30 Apr 2024
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim
Sanghyuk Chun
Taekyung Kim
Dongyoon Han
Sangdoo Yun
47
7
0
26 Apr 2024
Vocabulary-free Image Classification and Semantic Segmentation
Alessandro Conti
Enrico Fini
Massimiliano Mancini
Paolo Rota
Yiming Wang
Elisa Ricci
VLM
43
2
0
16 Apr 2024
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Tianyu Zhu
M. Jung
Jesse Clark
91
1
0
12 Apr 2024
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh
Ishaan Shrivastava
Mayank Vatsa
Richa Singh
Aparna Bharati
VLM
CoGe
34
14
0
29 Mar 2024
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
Donghyun Kim
Byeongho Heo
Dongyoon Han
50
14
0
28 Mar 2024
Ultra Low-Cost Two-Stage Multimodal System for Non-Normative Behavior Detection
Albert Lu
Stephen Cranefield
32
0
0
24 Mar 2024
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM
AI4TS
58
4
0
21 Mar 2024
LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival
Yuanxin Zhao
Mi Zhang
Bingnan Yang
Zhan Zhang
Jiaju Kang
Jianya Gong
35
2
0
16 Mar 2024
A Decade's Battle on Dataset Bias: Are We There Yet?
Zhuang Liu
Kaiming He
42
28
0
13 Mar 2024
Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos
Tarun Kalluri
Bodhisattwa Prasad Majumder
Manmohan Chandraker
VLM
37
4
0
08 Mar 2024
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
Haocheng Han
Qinghua Zheng
Guangwen Dai
Minnan Luo
Jingdong Wang
32
5
0
08 Mar 2024
Controllable Generation with Text-to-Image Diffusion Models: A Survey
Pu Cao
Feng Zhou
Qing-Huang Song
Lu Yang
72
37
0
07 Mar 2024
Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision
Yajie Liu
Pu Ge
Qingjie Liu
Di Huang
75
2
0
06 Mar 2024
Approximate Nearest Neighbor Search with Window Filters
Joshua Engels
Benjamin Landrum
Shangdi Yu
Laxman Dhulipala
Julian Shun
16
7
0
01 Feb 2024
Exploring Simple Open-Vocabulary Semantic Segmentation
Zihang Lai
VLM
21
0
0
22 Jan 2024
CLIP Model for Images to Textual Prompts Based on Top-k Neighbors
Xin Zhang
Xin Zhang
Yeming Cai
Tianzhi Jia
VLM
25
0
0
18 Jan 2024