ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.06173
  4. Cited By
Improving Interpretable Embeddings for Ad-hoc Video Search with
  Generative Captions and Multi-word Concept Bank

Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank

9 April 2024
Jiaxin Wu
Chong-Wah Ngo
W. Chan
    VGen
ArXiv (abs)PDFHTML

Papers citing "Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank"

11 / 11 papers shown
Title
Interpretable Embedding for Ad-hoc Video Search
Interpretable Embedding for Ad-hoc Video Search
Jiaxin Wu
Chong-Wah Ngo
118
30
0
19 Feb 2024
(Un)likelihood Training for Interpretable Embedding
(Un)likelihood Training for Interpretable Embedding
Jiaxin Wu
Chong-Wah Ngo
W. Chan
Zhijian Hou
56
2
0
01 Jul 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLMBDLVLMCLIP
555
4,409
0
28 Jan 2022
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video
  Retrieval
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval
Fan Hu
Aozhu Chen
Ziyu Wang
Fangming Zhou
Jianfeng Dong
Xirong Li
69
31
0
03 Dec 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
163
1,186
0
01 Apr 2021
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Shizhe Chen
Yida Zhao
Qin Jin
Qi Wu
96
314
0
01 Mar 2020
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million
  Narrated Video Clips
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
118
1,207
0
07 Jun 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for
  Video-and-Language Research
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
101
555
0
06 Apr 2019
Predicting Visual Features from Text for Image and Video Caption
  Retrieval
Predicting Visual Features from Text for Image and Video Caption Retrieval
Jianfeng Dong
Xirong Li
Cees G. M. Snoek
59
224
0
05 Sep 2017
Microsoft COCO Captions: Data Collection and Evaluation Server
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
221
2,493
0
01 Apr 2015
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIPVGen
160
6,164
0
03 Dec 2012
1