ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.16111
  4. Cited By
Multi-Scale Temporal Difference Transformer for Video-Text Retrieval

Multi-Scale Temporal Difference Transformer for Video-Text Retrieval

23 June 2024
Ni Wang
Dongliang Liao
Xing Xu
ArXivPDFHTML

Papers citing "Multi-Scale Temporal Difference Transformer for Video-Text Retrieval"

11 / 11 papers shown
Title
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
Han Fang
Pengfei Xiong
Luhui Xu
Yu Chen
CLIP
VLM
75
294
0
21 Jun 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
172
172
0
20 Apr 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse
  Sampling
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
99
651
0
11 Feb 2021
Multi-modal Transformer for Video Retrieval
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
504
602
0
21 Jul 2020
Visual Transformers: Token-based Image Representation and Processing for
  Computer Vision
Visual Transformers: Token-based Image Representation and Processing for Computer Vision
Bichen Wu
Chenfeng Xu
Xiaoliang Dai
Alvin Wan
Peizhao Zhang
Zhicheng Yan
Masayoshi Tomizuka
Joseph E. Gonzalez
Kurt Keutzer
Peter Vajda
ViT
87
556
0
05 Jun 2020
Searching Central Difference Convolutional Networks for Face
  Anti-Spoofing
Searching Central Difference Convolutional Networks for Face Anti-Spoofing
Zitong Yu
Chenxu Zhao
Zezheng Wang
Yunxiao Qin
Z. Su
Xiaobai Li
Feng Zhou
Guoying Zhao
CVBM
127
414
0
09 Mar 2020
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Shizhe Chen
Yida Zhao
Qin Jin
Qi Wu
74
311
0
01 Mar 2020
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
201
3,659
0
06 Aug 2019
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Yale Song
M. Soleymani
47
242
0
11 Jun 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million
  Narrated Video Clips
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
91
1,192
0
07 Jun 2019
Adam: A Method for Stochastic Optimization
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.0K
149,474
0
22 Dec 2014
1