ResearchTrend.AI

arXiv: 2003.00392
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

1 March 2020
Shizhe Chen
Yida Zhao
Qin Jin
Qi Wu

Papers citing "Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning"

50 / 163 papers shown
Partially Relevant Video Retrieval
Jianfeng Dong
Xianke Chen
Minsong Zhang
Xun Yang
Shujie Chen
Xirong Li
Xun Wang
24
40
0
26 Aug 2022
Exploring Anchor-based Detection for Ego4D Natural Language Query
S. Zheng
Qi Zhang
Bei Liu
Qingyu Jin
Jianlong Fu
EgoV
16
4
0
10 Aug 2022
Boosting Video-Text Retrieval with Explicit High-Level Semantics
Haoran Wang
Di Xu
Dongliang He
Fu Li
Zhong Ji
Jungong Han
Errui Ding
32
11
0
08 Aug 2022
A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval
Alex Falcon
G. Serra
Oswald Lanz
VGen
44
25
0
03 Aug 2022
A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues
Jason Armitage
L. Impett
Rico Sennrich
36
5
0
24 Jul 2022
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
39
114
0
16 Jul 2022
Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval
Keyu Wen
Zhenshan Tan
Qingrong Cheng
Cheng Chen
X. Gu
VLM
32
0
0
02 Jul 2022
(Un)likelihood Training for Interpretable Embedding
Jiaxin Wu
Chong-Wah Ngo
W. Chan
Zhijian Hou
17
2
0
01 Jul 2022
Exploiting Semantic Role Contextualized Video Features for Multi-Instance Text-Video Retrieval EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022
Burak Satar
Erik Cambria
Hanwang Zhang
J. Lim
32
3
0
29 Jun 2022
Semantic Role Aware Correlation Transformer for Text to Video Retrieval
Burak Satar
Erik Cambria
Xavier Bresson
J. Lim
ViT
14
10
0
26 Jun 2022
RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval
Burak Satar
Erik Cambria
Hanwang Zhang
J. Lim
35
11
0
26 Jun 2022
UniUD-FBK-UB-UniBZ Submission to the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge 2022
Alex Falcon
G. Serra
Sergio Escalera
Oswald Lanz
27
1
0
22 Jun 2022
Learn to Understand Negation in Video Retrieval
Ziyue Wang
Aozhu Chen
Fan Hu
Xirong Li
SSL
13
12
0
30 Apr 2022
Relevance-based Margin for Contrastively-trained Video Retrieval Models
Alex Falcon
Swathikiran Sudhakaran
G. Serra
Sergio Escalera
Oswald Lanz
40
7
0
27 Apr 2022
A Multi-level Alignment Training Scheme for Video-and-Language Grounding
Yubo Zhang
Feiyang Niu
Q. Ping
Govind Thattai
CVBM
59
2
0
22 Apr 2022
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Haoyu Lu
Nanyi Fei
Yuqi Huo
Yizhao Gao
Zhiwu Lu
Jiaxin Wen
CLIP
VLM
27
55
0
15 Apr 2022
MHMS: Multimodal Hierarchical Multimedia Summarization
Jielin Qiu
Jiacheng Zhu
Mengdi Xu
Franck Dernoncourt
Trung Bui
Zhaowen Wang
Bo-wen Li
Ding Zhao
Hailin Jin
22
12
0
07 Apr 2022
Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding
Ziyue Wu
Junyu Gao
Shucheng Huang
Changsheng Xu
36
4
0
04 Apr 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
35
19
0
23 Mar 2022
Learning video retrieval models with relevance-aware online mining
Alex Falcon
G. Serra
Oswald Lanz
AI4TS
27
7
0
16 Mar 2022
Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval
Guanyu Cai
Yixiao Ge
Binjie Zhang
Alex Jinpeng Wang
Rui Yan
...
Ying Shan
Lianghua He
Xiaohu Qie
Jianping Wu
Mike Zheng Shou
VLM
13
6
0
15 Mar 2022
Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval
Jinpeng Wang
Bin Chen
Dongliang Liao
Ziyun Zeng
Gongfu Li
Shutao Xia
Jin Xu
30
7
0
07 Feb 2022
Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos
Sangmin Woo
Jinyoung Park
Inyong Koo
Sumin Lee
Minki Jeong
Changick Kim
46
3
0
25 Jan 2022
Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval
Jianfeng Dong
Yabing Wang
Xianke Chen
Xiaoye Qu
Xirong Li
Y. He
Xun Wang
14
58
0
23 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
29
108
0
13 Jan 2022
Cross Modal Retrieval with Querybank Normalisation
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
27
84
0
23 Dec 2021
Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos
Pengfei Pei
Xianfeng Zhao
Yun Cao
Jinchuan Li
Xiaowei Yi
ViT
24
8
0
15 Dec 2021
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
30
111
0
12 Dec 2021
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval
Fan Hu
Aozhu Chen
Ziyu Wang
Fangming Zhou
Jianfeng Dong
Xirong Li
22
30
0
03 Dec 2021
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang
Zi-yi Liu
N. Zheng
30
13
0
29 Nov 2021
CLIP2TV: Align, Match and Distill for Video-Text Retrieval
Zijian Gao
Jiaheng Liu
Weiqi Sun
S. Chen
Dedan Chang
Lili Zhao
VLM
CLIP
31
17
0
10 Nov 2021
Hierarchical Deep Residual Reasoning for Temporal Moment Localization
Ziyang Ma
Xianjing Han
Xuemeng Song
Yiran Cui
Liqiang Nie
18
9
0
31 Oct 2021
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval
Ning Han
Jingjing Chen
Chuhao Shi
Yawen Zeng
Guangyi Xiao
Hao Chen
22
10
0
29 Oct 2021
Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval
Jonathan Munro
Michael Wray
Diane Larlus
G. Csurka
Dima Damen
31
6
0
25 Oct 2021
Video and Text Matching with Conditioned Embeddings
Ameen Ali
Idan Schwartz
Tamir Hazan
Lior Wolf
94
13
0
21 Oct 2021
A Feature Consistency Driven Attention Erasing Network for Fine-Grained Image Retrieval
Qi Zhao
Xu Wang
Shuchang Lyu
Binghao Liu
Yifan Yang
42
18
0
09 Oct 2021
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval
Zhijian Hou
Chong-Wah Ngo
W. Chan
19
38
0
21 Sep 2021
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss
Xingyi Cheng
Hezheng Lin
Xiangyu Wu
Fan Yang
Dong Shen
14
149
0
09 Sep 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
27
137
0
23 Aug 2021
HANet: Hierarchical Alignment Networks for Video-Text Retrieval
Peng Wu
Xiangteng He
Mingqian Tang
Yiliang Lv
Jing Liu
42
52
0
26 Jul 2021
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
Han Fang
Pengfei Xiong
Luhui Xu
Yu Chen
CLIP
VLM
35
292
0
21 Jun 2021
GraphTMT: Unsupervised Graph-based Topic Modeling from Video Transcripts
Lukas Stappen
Jason Thies
Gerhard Johann Hagerer
Björn W. Schuller
Georg Groh
31
3
0
04 May 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
167
100
0
29 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
170
170
0
20 Apr 2021
Temporal Query Networks for Fine-grained Video Understanding
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
24
83
0
19 Apr 2021
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
21
124
0
16 Apr 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen
Jiayuan Mao
Jiajun Wu
Kwan-Yee K. Wong
J. Tenenbaum
Chuang Gan
VGen
36
92
0
30 Mar 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
ViT
25
136
0
30 Mar 2021
Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval
Rui Zhao
Kecheng Zheng
Zhengjun Zha
Hongtao Xie
Jiebo Luo
35
3
0
29 Mar 2021
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
Song Liu
Haoqi Fan
Shengsheng Qian
Yiru Chen
Wenkui Ding
Zhongyuan Wang
30
145
0
28 Mar 2021