BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval

29 October 2021

Papers citing "BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval"

Title | Authors | Tags | Counts | Date
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval | A. Fragomeni, Dima Damen, Michael Wray | | 68 0 0 | 02 Apr 2025
Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding | Xuejing Liu, Liang Li, Shuhui Wang, Zhengjun Zha, Dechao Meng, Qi Tian, Qingming Huang | | 55 61 0 | 18 Jul 2022
Token Shift Transformer for Video Classification | Hao Zhang, Y. Hao, Chong-Wah Ngo | ViT | 58 117 0 | 05 Aug 2021
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Han Fang, Pengfei Xiong, Luhui Xu, Yu Chen | CLIP, VLM | 70 294 0 | 21 Jun 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval | Xiaohan Wang, Linchao Zhu, Yi Yang | | 170 170 0 | 20 Apr 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling | Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Joey Tianyi Zhou, Jingjing Liu | CLIP | 89 651 0 | 11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding? | Gedas Bertasius, Heng Wang, Lorenzo Torresani | ViT | 303 2,016 0 | 09 Feb 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos | Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, D. Joshi, ..., Rogerio Feris, Brian Kingsbury, M. Picheny, Antonio Torralba, James R. Glass | SSL | 37 142 0 | 16 Jun 2020
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu | | 63 311 0 | 01 Mar 2020
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic | VGen | 83 1,186 0 | 07 Jun 2019
Learning Actor Relation Graphs for Group Activity Recognition | Jianchao Wu, Limin Wang, Li Wang, Jie Guo, Gangshan Wu | | 51 241 0 | 23 Apr 2019
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition | Sijie Yan, Yuanjun Xiong, Dahua Lin | GNN | 188 4,124 0 | 23 Jan 2018
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang | AIMat | 95 4,201 0 | 25 Jul 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | João Carreira, Andrew Zisserman | | 182 7,961 0 | 22 May 2017
Enhancing Person Re-identification in a Self-trained Subspace | Xun Yang, Meng Wang, Richang Hong, Q. Tian, Yong Rui | | 69 85 0 | 20 Apr 2017
Layer Normalization | Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton | | 199 10,412 0 | 21 Jul 2016
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alexander A. Alemi | | 256 14,196 0 | 23 Feb 2016