Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2207.07852
Cited By
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
16 July 2022
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval"
29 / 79 papers shown
Title
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
46
14
0
24 Jul 2023
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Qi Zhang
S. Zheng
Qin Jin
17
1
0
20 Jul 2023
Fine-grained Text-Video Retrieval with Frozen Image Encoders
Zuozhuo Dai
Fang Shao
Qingkun Su
Zilong Dong
Siyu Zhu
167
1
0
14 Jul 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
42
3
0
28 Jun 2023
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending
Xingjian He
Sihan Chen
Fan Ma
Zhicheng Huang
Xiaojie Jin
Zikang Liu
Dongmei Fu
Yi Yang
Jiaheng Liu
Jiashi Feng
VLM
CLIP
23
17
0
22 May 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Peng Jin
Hao Li
Ze-Long Cheng
Jinfa Huang
Zhennan Wang
Li-ming Yuan
Chang-rui Liu
Jie Chen
33
31
0
20 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
34
2
0
13 May 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Joey Tianyi Zhou
45
129
0
11 May 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jinhui Tang
VLM
31
102
0
17 Apr 2023
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
Yuanhao Xiong
Long Zhao
Boqing Gong
Ming-Hsuan Yang
Florian Schroff
Ting Liu
Cho-Jui Hsieh
Liangzhe Yuan
VLM
32
0
0
28 Mar 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
42
49
0
25 Mar 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Xiang Ji
Chang-rui Liu
Li-ming Yuan
Jie Chen
DiffM
VGen
28
54
0
17 Mar 2023
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling
Jiaqi Xu
Bo Liu
Yunkuo Chen
Mengli Cheng
Xing Shi
43
1
0
10 Mar 2023
CLIP-guided Prototype Modulating for Few-shot Action Recognition
Xiang Wang
Shiwei Zhang
Jun Cen
Changxin Gao
Yingya Zhang
Deli Zhao
Nong Sang
VLM
21
53
0
06 Mar 2023
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning
Yimu Wang
Peng Shi
22
5
0
19 Feb 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
26
45
0
16 Jan 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
Xu Yang
Zhang Li
Haiyang Xu
Hanwang Zhang
Qinghao Ye
Chenliang Li
Ming Yan
Yu Zhang
Fei Huang
Songfang Huang
30
7
0
05 Jan 2023
SimVTP: Simple Video Text Pre-training with Masked Autoencoders
Yue Ma
Tianyu Yang
Yin Shan
Xiu Li
41
27
0
07 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
55
309
0
06 Dec 2022
VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval
Siteng Huang
Biao Gong
Yulin Pan
Jianwen Jiang
Yiliang Lv
Yuyuan Li
Donglin Wang
VLM
VPVLM
22
41
0
23 Nov 2022
Cross-Modal Adapter for Text-Video Retrieval
Haojun Jiang
Jianke Zhang
Rui Huang
Chunjiang Ge
Zanlin Ni
Jiwen Lu
Jie Zhou
S. Song
Gao Huang
48
36
0
17 Nov 2022
CLIP-Driven Fine-grained Text-Image Person Re-identification
Shuanglin Yan
Neng Dong
Liyan Zhang
Jinhui Tang
39
87
0
19 Oct 2022
Efficient Cross-Modal Video Retrieval with Meta-Optimized Frames
Ning Han
Xun Yang
Ee-Peng Lim
Hao Chen
Qianru Sun
48
3
0
16 Oct 2022
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
44
3
0
24 Aug 2022
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
167
170
0
20 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
317
780
0
18 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
280
1,982
0
09 Feb 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
424
596
0
21 Jul 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Previous
1
2