Long-CLIP: Unlocking the Long-Text Capability of CLIP

22 March 2024
Beichen Zhang
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Jiaqi Wang
    CLIP
    VLM

Papers citing "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

42 / 92 papers shown
Title
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim
Hyungjin Chung
Byung-Hoon Kim
VLM
34
0
0
11 Nov 2024
Probabilistic Language-Image Pre-Training
Sanghyuk Chun
Wonjae Kim
Song Park
Sangdoo Yun
MLLM
VLM
CLIP
140
4
2
24 Oct 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Zeang Sheng
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
50
28
0
22 Oct 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML
CoGe
VLM
71
21
0
18 Oct 2024
Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data?
Che Liu
Zhongwei Wan
Haozhe Wang
Yinda Chen
T. Qaiser
Chen Jin
Fariba Yousefi
Nikolay Burlutskiy
Rossella Arcucci
VLM
SyDa
LM&MA
MedIm
69
2
0
17 Oct 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
36
0
0
16 Oct 2024
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska
Mohammad Mahdi Derakhshani
Yuki M. Asano
Nanne van Noord
Marcel Worring
Cees G. M. Snoek
VLM
48
3
0
13 Oct 2024
HeGraphAdapter: Tuning Multi-Modal Vision-Language Models with Heterogeneous Graph Adapter
Yumiao Zhao
Bo Jiang
Xiao Wang
Qin Xu
Jin Tang
VLM
33
0
0
10 Oct 2024
Personalized Visual Instruction Tuning
Renjie Pi
Jianshu Zhang
Tianyang Han
Jipeng Zhang
Rui Pan
Tong Zhang
MLLM
34
6
0
09 Oct 2024
Unsupervised Model Diagnosis
Yinong Wang
Eileen Li
Jinqi Luo
Zhaoning Wang
Fernando de la Torre
AAML
32
1
0
08 Oct 2024
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
Jiazi Bu
Pengyang Ling
Pan Zhang
Tong Wu
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Dahua Lin
Jiaqi Wang
DiffM
VGen
33
0
0
08 Oct 2024
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
Wei Wu
Kecheng Zheng
Shuailei Ma
Fan Lu
Yuxin Guo
Yifei Zhang
Wei Chen
Qingpei Guo
Yujun Shen
Zheng-Jun Zha
VLM
32
9
0
07 Oct 2024
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai
Vasileios Saveris
Chia-Ju Chen
Hong-You Chen
Haotian Zhang
...
Wenze Hu
Zhe Gan
Peter Grasch
Meng Cao
Yinfei Yang
VLM
38
3
0
03 Oct 2024
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Jihai Zhang
Xiaoye Qu
Tong Zhu
Yu Cheng
41
7
0
28 Sep 2024
OLiVia-Nav: An Online Lifelong Vision Language Approach for Mobile Robot Social Navigation
Siddarth Narasimhan
Aaron Hao Tan
Daniel Choi
G. Nejat
LM&Ro
38
3
0
20 Sep 2024
BrainDecoder: Style-Based Visual Decoding of EEG Signals
Minsuk Choi
Hiroshi Ishikawa
23
0
0
09 Sep 2024
VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling
Qian Zhang
Xiangzi Dai
Ninghua Yang
Xiang An
Ziyong Feng
Xingyu Ren
VLM
CLIP
43
17
0
02 Aug 2024
Diffusion Feedback Helps CLIP See Better
Wenxuan Wang
Quan Sun
Fan Zhang
Yepeng Tang
Jing Liu
Xinlong Wang
VLM
46
14
0
29 Jul 2024
DiffX: Guide Your Layout to Cross-Modal Generative Modeling
Zeyu Wang
Jingyu Lin
Yifei Qian
Yi Huang
Shicen Tian
...
Qu Yang
Lan Du
Cunjian Chen
Yufei Guo
Kejie Huang
DiffM
VLM
28
2
0
22 Jul 2024
E5-V: Universal Embeddings with Multimodal Large Language Models
Ting Jiang
Minghui Song
Zihan Zhang
Haizhen Huang
Weiwei Deng
Feng Sun
Qi Zhang
Deqing Wang
Fuzhen Zhuang
VLM
33
21
0
17 Jul 2024
HiLight: Technical Report on the Motern AI Video Language Model
Zhiting Wang
Qiangong Zhou
Kangjie Yang
Zongyang Liu
Xin Mao
37
0
0
10 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
66
4
0
09 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
45
100
0
03 Jul 2024
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs
Haodong Chen
Haojian Huang
Junhao Dong
Mingzhe Zheng
Dian Shao
45
15
0
02 Jul 2024
Embodied Instruction Following in Unknown Environments
Zhenyu Wu
Ziwei Wang
Xiuwei Xu
Jiwen Lu
Haibin Yan
LM&Ro
33
4
0
17 Jun 2024
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results
Jiaqi Wang
Yuhang Zang
Pan Zhang
Tao Chu
Yuhang Cao
...
Kehong Yuan
Yanyan Zu
Jiayao Ha
Qiong Gao
Licheng Jiao
ObjD
55
1
0
17 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
44
35
0
12 Jun 2024
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Pengyang Ling
Jiazi Bu
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Tong Wu
H. Chen
Jiaqi Wang
Yi Jin
VGen
DiffM
33
34
0
08 Jun 2024
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina
Adriana Valentina Costache
Adrian-Gabriel Chifu
Josiane Mothe
Radu Tudor Ionescu
VLM
58
1
0
07 Jun 2024
Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
Enhui Ma
Lijun Zhou
Tao Tang
Zhan Zhang
Dong Han
...
Peng Jia
Xianpeng Lang
Haiyang Sun
Di Lin
Kaicheng Yu
VGen
37
20
0
03 Jun 2024
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Andreas Koukounas
Georgios Mastrapas
Michael Gunther
Bo Wang
Scott Martens
...
Saahil Ognawala
Susana Guzman
Maximilian Werk
Nan Wang
Han Xiao
VLM
27
16
0
30 May 2024
PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition
Xi Fang
Weigang Wang
Xiaoxin Lv
Jun Yan
EGVM
42
3
0
20 Apr 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Xingcheng Zhang
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
41
114
0
09 Apr 2024
AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment
Chunyi Li
Tengchuan Kou
Yixuan Gao
Yuhang Cao
Wei Sun
...
Weixia Zhang
Haoning Wu
Xiaohong Liu
Xiongkuo Min
Guangtao Zhai
41
17
0
04 Apr 2024
Faster Diffusion via Temporal Attention Decomposition
Haozhe Liu
Wentian Zhang
Jinheng Xie
Francesco Faccio
Mengmeng Xu
Tao Xiang
Mike Zheng Shou
Juan-Manuel Perez-Rua
Jürgen Schmidhuber
DiffM
75
19
0
03 Apr 2024
Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
Jihai Zhang
Xiang Lan
Xiaoye Qu
Yu Cheng
Mengling Feng
Bryan Hooi
SSL
24
4
0
19 Feb 2024
Vision Language Models in Autonomous Driving: A Survey and Outlook
Xingcheng Zhou
Mingyu Liu
Ekim Yurtsever
B. L. Žagar
Walter Zimmer
Hu Cao
Alois C. Knoll
VLM
34
36
0
22 Oct 2023
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xiaolong Wang
ViT
VLM
192
499
0
22 Feb 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
259
558
0
28 Sep 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
225
899
0
28 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
326
780
0
18 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
296
1,084
0
17 Feb 2021