TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval

16 July 2022
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin

Papers citing "TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval"

50 / 79 papers shown
TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval
Xiaolun Jing
Genke Yang
Jian Chu
26
0
0
07 Apr 2025
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Boseung Jeong
Jicheol Park
Sungyeon Kim
Suha Kwak
36
0
0
03 Apr 2025
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
33
0
0
02 Apr 2025
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun V. Reddy
Alexander Martin
Eugene Yang
Andrew Yates
Kate Sanders
Kenton W. Murray
Reno Kriz
Celso M. De Melo
Benjamin Van Durme
Rama Chellappa
50
1
0
24 Mar 2025
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
50
1
0
21 Mar 2025
TruthLens: Explainable DeepFake Detection for Face Manipulated and Fully Synthetic Data
Rohit Kundu
Athula Balachandran
A. Roy-Chowdhury
45
0
0
20 Mar 2025
Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing
Zecheng Zhao
Zhi Chen
Zi-Rui Huang
S. Sadiq
Tong Chen
36
0
0
13 Mar 2025
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
Zengrong Lin
Zheng Wang
Tianwen Qian
Pan Mu
Sixian Chan
Cong Bai
52
0
0
13 Mar 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
H. Li
Li Yuan
Shuicheng Yan
Jie Chen
52
1
0
31 Dec 2024
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
37
1
0
31 Dec 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
J. Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
71
9
0
31 Oct 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
32
0
0
16 Oct 2024
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
Bingqing Zhang
Zhuo Cao
Heming Du
Xin Yu
Xue Li
Jiajun Liu
Sen Wang
VGen
25
0
0
30 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
132
7
0
02 Sep 2024
TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning
Bin Wang
Wenqian Wang
VLM
31
1
0
20 Aug 2024
From Attributes to Natural Language: A Survey and Foresight on Text-based Person Re-identification
Fanzhi Jiang
Su Yang
Mark W. Jones
Liumei Zhang
52
1
0
31 Jul 2024
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
Longtao Jiang
Min Wang
Zecheng Li
Yao Fang
Wen-gang Zhou
Houqiang Li
SLR
31
2
0
23 Jul 2024
Streaming Video Diffusion: Online Video Editing with Diffusion Models
Feng Chen
Zhen Yang
Bohan Zhuang
Qi Wu
DiffM
49
4
0
30 May 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
98
11
0
29 May 2024
Text-Video Retrieval with Global-Local Semantic Consistent Learning
Haonan Zhang
Pengpeng Zeng
Lianli Gao
Jingkuan Song
Yihang Duan
Xinyu Lyu
Hengtao Shen
VLM
CLIP
37
2
0
21 May 2024
Learning text-to-video retrieval from image captioning
Lucas Ventura
Cordelia Schmid
Gül Varol
3DV
36
3
0
26 Apr 2024
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
Xuzheng Yu
Chen Jiang
Xingning Dong
Tian Gan
Ming Yang
Qingpei Guo
45
1
0
22 Apr 2024
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Han Fang
Xianghao Zang
Chao Ban
Zerun Feng
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
32
1
0
18 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang
Guohao Sun
Pichao Wang
Dongfang Liu
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Zhiqiang Tao
VGen
55
20
0
26 Mar 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar
Muzammal Naseer
Rao Muhammad Anwer
Salman Khan
M. Felsberg
Mubarak Shah
Fahad Shahbaz Khan
27
7
0
25 Mar 2024
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul M. Chilimbi
VLM
AI4TS
52
4
0
21 Mar 2024
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Pingping Zhang
Yuhao Wang
Yang Liu
Zhengzheng Tu
Huchuan Lu
23
21
0
15 Mar 2024
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)
Shih-Han Chou
Matthew Kowal
Yasmin Niknam
Diana Moyano
Shayaan Mehdi
...
Cheng Zhang
Ian Knopke
S. Kocak
Leonid Sigal
Yalda Mohsenzadeh
38
1
0
23 Jan 2024
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
28
23
0
19 Jan 2024
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
Kaibin Tian
Yanhua Cheng
Yi Liu
Xinglin Hou
Quan Chen
Han Li
22
3
0
01 Jan 2024
WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge
Huy Le
Tung Kieu
Anh Nguyen
Ngan Le
VGen
26
1
0
15 Dec 2023
ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang
Junbang Liang
Chun-Kai Wang
Kenan Deng
Yu Lou
Ming-Chyuan Lin
Shan Yang
27
13
0
13 Dec 2023
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
28
3
0
11 Dec 2023
DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models
Shaoan Xie
Yang Zhao
Zhisheng Xiao
Kelvin C. K. Chan
Yandong Li
Yanwu Xu
Kun Zhang
Tingbo Hou
DiffM
28
26
0
05 Dec 2023
RTQ: Rethinking Video-language Understanding Based on Image-text Model
Xiao Wang
Yaoyu Li
Tian Gan
Zheng Zhang
Jingjing Lv
Liqiang Nie
11
6
0
01 Dec 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
92
9
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
29
3
0
25 Nov 2023
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval
Junkyu Jang
Eugene Hwang
Sung-Hyuk Park
25
0
0
03 Nov 2023
An Empirical Study of Frame Selection for Text-to-Video Retrieval
Mengxia Wu
Min Cao
Yang Bai
Ziyin Zeng
Chen Chen
Liqiang Nie
Min Zhang
23
3
0
01 Nov 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Yimu Wang
Xiangru Jian
Bo Xue
22
9
0
17 Oct 2023
Latent Wander: an Alternative Interface for Interactive and Serendipitous Discovery of Large AV Archives
Yuchen Yang
Linyida Zhang
19
2
0
09 Oct 2023
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
Chen Jiang
Hong Liu
Xuzheng Yu
Qing Wang
Yuan-Chia Cheng
...
Zhongyi Liu
Qingpei Guo
Wei Chu
Ming Yang
Yuan Qi
23
10
0
20 Sep 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Mohit Bansal
98
44
0
18 Sep 2023
Decompose Semantic Shifts for Composed Image Retrieval
Xingyu Yang
Daqing Liu
Heng Zhang
Yong Luo
Chaoyue Wang
Jing Zhang
29
2
0
18 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
23
3
0
16 Sep 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
22
26
0
28 Aug 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Chaorui Deng
Qi Chen
Pengda Qin
Dave Zhenyu Chen
Qi Wu
VLM
CLIP
43
29
0
15 Aug 2023
Cross-Domain Product Representation Learning for Rich-Content E-Commerce
Xuehan Bai
Yan Li
Yong Cheng
Wenjie Yang
Quanming Chen
Han Li
19
3
0
10 Aug 2023
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval
Kaibin Tian
Rui Zhao
Hu Hu
Runquan Xie
Fengzong Lian
Zhanhui Kang
Xirong Li
CLIP
27
0
0
02 Aug 2023