arXiv:2207.07852
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
16 July 2022
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
Papers citing "TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval"
50 / 79 papers shown
TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval
Xiaolun Jing
Genke Yang
Jian Chu
26
0
0
07 Apr 2025
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Boseung Jeong
Jicheol Park
Sungyeon Kim
Suha Kwak
36
0
0
03 Apr 2025
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
33
0
0
02 Apr 2025
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun V. Reddy
Alexander Martin
Eugene Yang
Andrew Yates
Kate Sanders
Kenton W. Murray
Reno Kriz
Celso M. De Melo
Benjamin Van Durme
Rama Chellappa
50
1
0
24 Mar 2025
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
50
1
0
21 Mar 2025
TruthLens: Explainable DeepFake Detection for Face Manipulated and Fully Synthetic Data
Rohit Kundu
Athula Balachandran
A. Roy-Chowdhury
45
0
0
20 Mar 2025
Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing
Zecheng Zhao
Zhi Chen
Zi-Rui Huang
S. Sadiq
Tong Chen
36
0
0
13 Mar 2025
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
Zengrong Lin
Zheng Wang
Tianwen Qian
Pan Mu
Sixian Chan
Cong Bai
52
0
0
13 Mar 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
H. Li
Li Yuan
Shuicheng Yan
Jie Chen
52
1
0
31 Dec 2024
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
37
1
0
31 Dec 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
J. Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
71
9
0
31 Oct 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
32
0
0
16 Oct 2024
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
Bingqing Zhang
Zhuo Cao
Heming Du
Xin Yu
Xue Li
Jiajun Liu
Sen Wang
VGen
25
0
0
30 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
132
7
0
02 Sep 2024
TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning
Bin Wang
Wenqian Wang
VLM
31
1
0
20 Aug 2024
From Attributes to Natural Language: A Survey and Foresight on Text-based Person Re-identification
Fanzhi Jiang
Su Yang
Mark W. Jones
Liumei Zhang
52
1
0
31 Jul 2024
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
Longtao Jiang
Min Wang
Zecheng Li
Yao Fang
Wen-gang Zhou
Houqiang Li
SLR
31
2
0
23 Jul 2024
Streaming Video Diffusion: Online Video Editing with Diffusion Models
Feng Chen
Zhen Yang
Bohan Zhuang
Qi Wu
DiffM
49
4
0
30 May 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
98
11
0
29 May 2024
Text-Video Retrieval with Global-Local Semantic Consistent Learning
Haonan Zhang
Pengpeng Zeng
Lianli Gao
Jingkuan Song
Yihang Duan
Xinyu Lyu
Hengtao Shen
VLM
CLIP
37
2
0
21 May 2024
Learning text-to-video retrieval from image captioning
Lucas Ventura
Cordelia Schmid
Gül Varol
3DV
36
3
0
26 Apr 2024
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
Xuzheng Yu
Chen Jiang
Xingning Dong
Tian Gan
Ming Yang
Qingpei Guo
45
1
0
22 Apr 2024
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Han Fang
Xianghao Zang
Chao Ban
Zerun Feng
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
32
1
0
18 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang
Guohao Sun
Pichao Wang
Dongfang Liu
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Zhiqiang Tao
VGen
55
20
0
26 Mar 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar
Muzammal Naseer
Rao Muhammad Anwer
Salman Khan
M. Felsberg
Mubarak Shah
Fahad Shahbaz Khan
27
7
0
25 Mar 2024
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul M. Chilimbi
VLM
AI4TS
52
4
0
21 Mar 2024
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Pingping Zhang
Yuhao Wang
Yang Liu
Zhengzheng Tu
Huchuan Lu
23
21
0
15 Mar 2024
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)
Shih-Han Chou
Matthew Kowal
Yasmin Niknam
Diana Moyano
Shayaan Mehdi
...
Cheng Zhang
Ian Knopke
S. Kocak
Leonid Sigal
Yalda Mohsenzadeh
38
1
0
23 Jan 2024
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
28
23
0
19 Jan 2024
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
Kaibin Tian
Yanhua Cheng
Yi Liu
Xinglin Hou
Quan Chen
Han Li
22
3
0
01 Jan 2024
WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge
Huy Le
Tung Kieu
Anh Nguyen
Ngan Le
VGen
26
1
0
15 Dec 2023
ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang
Junbang Liang
Chun-Kai Wang
Kenan Deng
Yu Lou
Ming-Chyuan Lin
Shan Yang
27
13
0
13 Dec 2023
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
28
3
0
11 Dec 2023
DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models
Shaoan Xie
Yang Zhao
Zhisheng Xiao
Kelvin C. K. Chan
Yandong Li
Yanwu Xu
Kun Zhang
Tingbo Hou
DiffM
28
26
0
05 Dec 2023
RTQ: Rethinking Video-language Understanding Based on Image-text Model
Xiao Wang
Yaoyu Li
Tian Gan
Zheng Zhang
Jingjing Lv
Liqiang Nie
11
6
0
01 Dec 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
92
9
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
29
3
0
25 Nov 2023
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval
Junkyu Jang
Eugene Hwang
Sung-Hyuk Park
25
0
0
03 Nov 2023
An Empirical Study of Frame Selection for Text-to-Video Retrieval
Mengxia Wu
Min Cao
Yang Bai
Ziyin Zeng
Chen Chen
Liqiang Nie
Min Zhang
23
3
0
01 Nov 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Yimu Wang
Xiangru Jian
Bo Xue
22
9
0
17 Oct 2023
Latent Wander: an Alternative Interface for Interactive and Serendipitous Discovery of Large AV Archives
Yuchen Yang
Linyida Zhang
19
2
0
09 Oct 2023
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
Chen Jiang
Hong Liu
Xuzheng Yu
Qing Wang
Yuan-Chia Cheng
...
Zhongyi Liu
Qingpei Guo
Wei Chu
Ming Yang
Yuan Qi
23
10
0
20 Sep 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Mohit Bansal
98
44
0
18 Sep 2023
Decompose Semantic Shifts for Composed Image Retrieval
Xingyu Yang
Daqing Liu
Heng Zhang
Yong Luo
Chaoyue Wang
Jing Zhang
29
2
0
18 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
23
3
0
16 Sep 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
22
26
0
28 Aug 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Chaorui Deng
Qi Chen
Pengda Qin
Dave Zhenyu Chen
Qi Wu
VLM
CLIP
43
29
0
15 Aug 2023
Cross-Domain Product Representation Learning for Rich-Content E-Commerce
Xuehan Bai
Yan Li
Yong Cheng
Wenjie Yang
Quanming Chen
Han Li
19
3
0
10 Aug 2023
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval
Kaibin Tian
Rui Zhao
Hu Hu
Runquan Xie
Fengzong Lian
Zhanhui Kang
Xirong Li
CLIP
27
0
0
02 Aug 2023