ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2011.09046
  4. Cited By
A Hierarchical Multi-Modal Encoder for Moment Localization in Video
  Corpus

A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus

18 November 2020
Bowen Zhang
Hexiang Hu
Joonseok Lee
Mingde Zhao
Sheide Chammas
Vihan Jain
Eugene Ie
Fei Sha
ArXivPDFHTML

Papers citing "A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus"

27 / 27 papers shown
Title
Selective Query-guided Debiasing for Video Corpus Moment Retrieval
Selective Query-guided Debiasing for Video Corpus Moment Retrieval
Sunjae Yoon
Jiajing Hong
Eunseop Yoon
Dahyun Kim
Junyeong Kim
Hee Suk Yoon
Changdong Yoo
82
21
0
17 Oct 2022
HERO: Hierarchical Encoder for Video+Language Omni-representation
  Pre-training
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
85
496
0
01 May 2020
Entities as Experts: Sparse Memory Access with Entity Supervision
Entities as Experts: Sparse Memory Access with Entity Supervision
Thibault Févry
Livio Baldini Soares
Nicholas FitzGerald
Eunsol Choi
Tom Kwiatkowski
RALM
58
153
0
15 Apr 2020
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
172
279
0
24 Jan 2020
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding
  in Videos
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
Yitian Yuan
Lin Ma
Jingwen Wang
Wei Liu
Wenwu Zhu
59
243
0
31 Oct 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
188
3,659
0
06 Aug 2019
SpanBERT: Improving Pre-training by Representing and Predicting Spans
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Mandar Joshi
Danqi Chen
Yinhan Liu
Daniel S. Weld
Luke Zettlemoyer
Omer Levy
104
1,953
0
24 Jul 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
35
1,238
0
03 Apr 2019
Cross-Modal and Hierarchical Modeling of Video and Text
Cross-Modal and Hierarchical Modeling of Video and Text
Bowen Zhang
Hexiang Hu
Fei Sha
BDL
AI4TS
32
189
0
16 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
862
93,936
0
11 Oct 2018
Localizing Moments in Video with Temporal Language
Localizing Moments in Video with Temporal Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
47
158
0
05 Sep 2018
To Find Where You Talk: Temporal Sentence Localization in Video with
  Attention Based Location Regression
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression
Yitian Yuan
Tao Mei
Wenwu Zhu
59
332
0
19 Apr 2018
Multilevel Language and Vision Integration for Text-to-Clip Retrieval
Multilevel Language and Vision Integration for Text-to-Clip Retrieval
Huijuan Xu
Kun He
Bryan A. Plummer
Leonid Sigal
Stan Sclaroff
Kate Saenko
CLIP
43
322
0
13 Apr 2018
Localizing Moments in Video with Natural Language
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
87
933
0
04 Aug 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
360
129,831
0
12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
192
7,961
0
22 May 2017
The Kinetics Human Action Video Dataset
The Kinetics Human Action Video Dataset
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
...
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
193
3,771
0
19 May 2017
TALL: Temporal Activity Localization via Language Query
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
96
804
0
05 May 2017
Dense-Captioning Events in Videos
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
118
1,225
0
02 May 2017
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhiwen Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
796
6,768
0
26 Sep 2016
Word2VisualVec: Image and Video to Sentence Matching by Visual Feature
  Prediction
Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction
Jianfeng Dong
Xirong Li
Cees G. M. Snoek
3DV
26
35
0
23 Apr 2016
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
1.2K
192,638
0
10 Dec 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language
Jointly Modeling Embedding and Translation to Bridge Video and Language
Yingwei Pan
Tao Mei
Ting Yao
Houqiang Li
Y. Rui
60
534
0
07 May 2015
Sequence to Sequence -- Video to Text
Sequence to Sequence -- Video to Text
Subhashini Venugopalan
Marcus Rohrbach
Jeff Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko
91
1,417
0
03 May 2015
Adam: A Method for Stochastic Optimization
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
673
149,474
0
22 Dec 2014
Translating Videos to Natural Language Using Deep Recurrent Neural
  Networks
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
83
951
0
15 Dec 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language
  Models
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Ryan Kiros
Ruslan Salakhutdinov
R. Zemel
VLM
75
1,395
0
10 Nov 2014
1