ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.15378
  4. Cited By
Long-CLIP: Unlocking the Long-Text Capability of CLIP

Long-CLIP: Unlocking the Long-Text Capability of CLIP

22 March 2024
Beichen Zhang
Pan Zhang
Xiao-wen Dong
Yuhang Zang
Jiaqi Wang
    CLIP
    VLM
ArXivPDFHTML

Papers citing "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

50 / 92 papers shown
Title
Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining
Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining
Raghuveer Thirukovalluru
Rui Meng
Yong-Jin Liu
K. Krishnamoorthi
Mingyi Su
Ping Nie
Semih Yavuz
Yingbo Zhou
Wenhu Chen
Bhuwan Dhingra
17
0
0
16 May 2025
Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Yifan Wu
Lutao Yan
Yizhang Zhu
Yinan Mei
Jiannan Wang
Nan Tang
Yuyu Luo
22
0
0
15 May 2025
VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models
VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models
F. Khan
Jun Chen
Youssef Mohamed
Chun-Mei Feng
Mohamed Elhoseiny
VLM
33
0
0
08 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
53
0
0
08 May 2025
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin
Zhu Xu
Yang Liu
24
0
0
07 May 2025
Robust Fairness Vision-Language Learning for Medical Image Analysis
Robust Fairness Vision-Language Learning for Medical Image Analysis
Sparsh Bansal
Mingyang Wu
Xin Wang
S. Hu
VLM
50
0
0
06 May 2025
Improving Editability in Image Generation with Layer-wise Memory
Improving Editability in Image Generation with Layer-wise Memory
Daneul Kim
Jaeah Lee
Jaesik Park
DiffM
KELM
60
0
0
02 May 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Tiancheng Gu
Kaicheng Yang
Ziyong Feng
Xingjun Wang
Yanzhao Zhang
Dingkun Long
Yingda Chen
Weidong Cai
Jiankang Deng
VLM
170
1
0
24 Apr 2025
Improving Multimodal Hateful Meme Detection Exploiting LMM-Generated Knowledge
Improving Multimodal Hateful Meme Detection Exploiting LMM-Generated Knowledge
Maria Tzelepi
Vasileios Mezaris
34
0
0
14 Apr 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
39
0
0
09 Apr 2025
Dynamic Objective MPC for Motion Planning of Seamless Docking Maneuvers
Dynamic Objective MPC for Motion Planning of Seamless Docking Maneuvers
Oliver Schumann
Michael Buchholz
Klaus C. J. Dietmayer
40
0
0
04 Apr 2025
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Binh M. Le
Shaoyuan Xu
Jinmiao Fu
Zhishen Huang
Moyan Li
Yanhui Guo
Hongdong Li
Sameera Ramasinghe
Bryan Wang
35
0
0
03 Apr 2025
Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments
Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments
Yifan Xu
V. Kamat
Carol Menassa
51
0
0
29 Mar 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Chenkai Zhang
Yiming Lei
Zeming Liu
Qingjie Liu
Yixuan Wang
46
0
0
28 Mar 2025
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
Weizhi Chen
Jingbo Chen
Yupeng Deng
Jiansheng Chen
Yuman Feng
Zhihao Xi
Diyou Liu
Kai Li
Yu Meng
VLM
51
0
0
25 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
50
0
0
24 Mar 2025
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Y. Chen
L. Meng
Wujian Peng
Zuxuan Wu
Yu-Gang Jiang
VLM
48
0
0
24 Mar 2025
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
Ketan Suhaas Saichandran
Xavier Thomas
Prakhar Kaushik
Deepti Ghadiyaram
DiffM
78
0
0
22 Mar 2025
GOAL: Global-local Object Alignment Learning
GOAL: Global-local Object Alignment Learning
Hyungyu Choi
Young Kyun Jang
Chanho Eom
VLM
130
0
0
22 Mar 2025
Meme Similarity and Emotion Detection using Multimodal Analysis
Meme Similarity and Emotion Detection using Multimodal Analysis
Aidos Konyspay
Pakizar Shamoi
Malika Ziyada
Zhusup Smambayev
51
0
0
21 Mar 2025
HSM: Hierarchical Scene Motifs for Multi-Scale Indoor Scene Generation
HSM: Hierarchical Scene Motifs for Multi-Scale Indoor Scene Generation
Hou In Derek Pun
Hou In Ivan Tam
Austin T. Wang
Xiaoliang Huo
Angel X. Chang
Manolis Savva
3DV
48
1
0
21 Mar 2025
LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images
LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images
Leyang Wang
Joice Lin
DiffM
63
0
0
20 Mar 2025
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
Hou In Ivan Tam
Hou In Derek Pun
Austin T. Wang
Angel X. Chang
Manolis Savva
62
1
0
18 Mar 2025
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Weixiong Lin
Chen Ju
Haicheng Wang
Shengchao Hu
Shuai Xiao
...
Yuheng Jiao
Mingshuai Yao
Jinsong Lan
Qingwen Liu
Ying Chen
52
0
0
18 Mar 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Wan Ju Kang
Eunki Kim
Na Min An
Sangryul Kim
Haemin Choi
Ki Hoon Kwak
James Thorne
49
0
0
17 Mar 2025
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
Zilun Zhang
Haozhan Shen
Tiancheng Zhao
Bin Chen
Zian Guan
Yuhao Wang
Xu Jia
Yuxiang Cai
Yongheng Shang
Jianwei Yin
54
0
0
16 Mar 2025
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Kanzhi Cheng
Wenpo Song
Jiaxin Fan
Zheng Ma
Qiushi Sun
Fangzhi Xu
Chenyang Yan
Nuo Chen
Jianbing Zhang
Jiajun Chen
MLLM
VLM
55
1
0
16 Mar 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen
Zhengrong Yue
Siran Chen
Zehua Wang
Yang Liu
Peng Li
Yixuan Wang
VLM
160
0
0
13 Mar 2025
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text
Sanghyuk Chun
Sangdoo Yun
VLM
48
1
0
11 Mar 2025
Fine-Tuning Florence2 for Enhanced Object Detection in Un-constructed Environments: Vision-Language Model Approach
Fine-Tuning Florence2 for Enhanced Object Detection in Un-constructed Environments: Vision-Language Model Approach
Soumyadeep Ro
Sanapala Satwika
Pamarthi Yasoda Gayathri
Mohmmad Ghaith Balsha
Aysegul Ucar
VLM
ObjD
56
0
0
06 Mar 2025
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
Peishan Cong
Ziyi Wang
Y. Ma
Xiangyu Yue
51
1
0
03 Mar 2025
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar
Enzhuo Zhang
Zhenshi Li
Feng-Xue Gu
Yanglangxing He
P. Xiao
Xueliang Zhang
36
2
0
02 Mar 2025
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation
Pengzhi Li
Pengfei Yu
Zide Liu
Wei He
Xuhao Pan
Xudong Rao
Tao Wei
Wei Chen
VLM
60
0
0
25 Feb 2025
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Wenzhe Yin
Zehao Xiao
Pan Zhou
Shujian Yu
Jiayi Shen
J. Sonke
E. Gavves
37
0
0
24 Feb 2025
Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity
Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity
Yizhuo Lu
Changde Du
Chong Wang
Xuanliu Zhu
Liuyun Jiang
Xujin Li
Huiguang He
VGen
125
4
0
20 Feb 2025
Can LVLMs and Automatic Metrics Capture Underlying Preferences of Blind and Low-Vision Individuals for Navigational Aid?
Can LVLMs and Automatic Metrics Capture Underlying Preferences of Blind and Low-Vision Individuals for Navigational Aid?
Na Min An
Eunki Kim
Wan Ju Kang
Sangryul Kim
Hyunjung Shim
James Thorne
41
0
0
15 Feb 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
89
19
0
21 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
W. Zhang
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
74
19
0
21 Jan 2025
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
77
25
0
31 Dec 2024
Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation
Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation
Nadav Cohen
O. Nir
Ariel Shamir
DiffM
33
1
0
31 Dec 2024
CharGen: High Accurate Character-Level Visual Text Generation Model with
  MultiModal Encoder
CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
Lichen Ma
Tiezhu Yue
Pei Fu
Yujie Zhong
Kai Zhou
Xiaoming Wei
Jie Hu
DiffM
78
2
0
23 Dec 2024
Where am I? Cross-View Geo-localization with Natural Language Descriptions
Where am I? Cross-View Geo-localization with Natural Language Descriptions
Junyan Ye
Honglin Lin
Leyan Ou
Dairong Chen
Zihao Wang
Conghui He
Weijia Li
Weijia Li
76
0
0
22 Dec 2024
Can video generation replace cinematographers? Research on the cinematic language of generated video
Can video generation replace cinematographers? Research on the cinematic language of generated video
Xiaomeng Li
Kai WU
Siyi Yang
YiZhan Qu
Guohua. Zhang
...
Mingliang Xiong
Hao Deng
Qingwen Liu
Gang Li
Bin He
VGen
DiffM
90
1
0
16 Dec 2024
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for
  Training-Free Zero-Shot Composed Image Retrieval
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang
Xiaoting Qin
Jingyang Zhang
Jing Yu
Gaopeng Gou
Gang Xiong
Qingwei Ling
Saravan Rajmohan
Dongmei Zhang
Qi Wu
LRM
66
1
0
15 Dec 2024
CATALOG: A Camera Trap Language-guided Contrastive Learning Model
CATALOG: A Camera Trap Language-guided Contrastive Learning Model
Julian D. Santamaria
Claudia Isaza
Jhony H. Giraldo
81
0
0
14 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for
  Long-term Streaming Video and Audio Interactions
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xiaotian Zhang
K. Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
84
12
0
12 Dec 2024
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
Andreas Koukounas
Georgios Mastrapas
Bo Wang
Mohammad Kalim Akram
Sedigheh Eslami
Michael Gunther
Isabelle Mohr
Saba Sturua
Scott Martens
Nan Wang
VLM
107
7
0
11 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
79
2
0
04 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
85
2
0
02 Dec 2024
Advancing Myopia To Holism: Fully Contrastive Language-Image
  Pre-training
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang
Chen Ju
Weixiong Lin
Shuai Xiao
Mengting Chen
...
Mingshuai Yao
Jinsong Lan
Ying Chen
Qingwen Liu
Yanfeng Wang
VLM
CLIP
75
4
0
30 Nov 2024
12
Next