ResearchTrend.AI
MDMMT: Multidomain Multimodal Transformer for Video Retrieval

19 March 2021
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
arXiv: 2103.10699

Papers citing "MDMMT: Multidomain Multimodal Transformer for Video Retrieval"

28 / 78 papers shown
Title
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
M. Volkovs
Animesh Garg
Guangwei Yu
31
149
0
28 Mar 2022
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
Muheng Li
Lei Chen
Yueqi Duan
Zhilan Hu
Jianjiang Feng
Jie Zhou
Jiwen Lu
19
75
0
26 Mar 2022
Learning video retrieval models with relevance-aware online mining
Alex Falcon
G. Serra
O. Lanz
AI4TS
19
7
0
16 Mar 2022
Integrating Language Guidance into Vision-based Deep Metric Learning
Karsten Roth
Oriol Vinyals
Zeynep Akata
VLM
14
29
0
16 Mar 2022
Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval
Guanyu Cai
Yixiao Ge
Binjie Zhang
Alex Jinpeng Wang
Rui Yan
...
Ying Shan
Lianghua He
Xiaohu Qie
Jianping Wu
Mike Zheng Shou
VLM
13
6
0
15 Mar 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
28
16
0
14 Mar 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
150
361
0
24 Jan 2022
ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues
Hengcan Shi
Munawar Hayat
Yicheng Wu
Jianfei Cai
VLM
28
60
0
18 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
Multi-Query Video Retrieval
Zeyu Wang
Yu Wu
Karthik Narasimhan
Olga Russakovsky
36
17
0
10 Jan 2022
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya-Qin Zhang
Weidi Xie
VPVLM
VLM
22
363
0
08 Dec 2021
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture
Daria Bakshandaeva
Denis Dimitrov
V.Ya. Arkhipkin
Alex Shonenkov
M. Potanin
...
Mikhail Martynov
Anton Voronov
Vera Davydova
E. Tutubalina
Aleksandr Petiushko
33
0
0
22 Nov 2021
Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval
Yue Yang
Joongwon Kim
Artemis Panagopoulou
Mark Yatskar
Chris Callison-Burch
LM&Ro
21
14
0
17 Nov 2021
CLIP2TV: Align, Match and Distill for Video-Text Retrieval
Zijian Gao
J. Liu
Weiqi Sun
S. Chen
Dedan Chang
Lili Zhao
VLM
CLIP
23
17
0
10 Nov 2021
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
11
29
0
01 Nov 2021
Cross-Modality Fusion Transformer for Multispectral Object Detection
Q. Fang
D. Han
Zhaokui Wang
ViT
22
140
0
30 Oct 2021
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation
Aiden Seung Joon Lee
Hanseok Oh
Minjoon Seo
11
1
0
11 Oct 2021
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss
Xingyi Cheng
Hezheng Lin
Xiangyu Wu
Fan Yang
Dong Shen
6
148
0
09 Sep 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
19
231
0
21 Jul 2021
AudioCLIP: Extending CLIP to Image, Text and Audio
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
CLIP
VLM
25
359
0
24 Jun 2021
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
Han Fang
Pengfei Xiong
Luhui Xu
Yu Chen
CLIP
VLM
13
291
0
21 Jun 2021
Cross-Modal Discrete Representation Learning
Alexander H. Liu
SouYoung Jin
Cheng-I Jeff Lai
Andrew Rouditchenko
A. Oliva
James R. Glass
SSL
22
40
0
10 Jun 2021
Connecting Language and Vision for Natural Language-Based Vehicle Retrieval
Shuai Bai
Zhedong Zheng
Xiaohan Wang
Junyang Lin
Zhu Zhang
Chang Zhou
Yi Yang
Hongxia Yang
21
27
0
31 May 2021
AudioVisual Video Summarization
Bin Zhao
Maoguo Gong
Xuelong Li
11
24
0
17 May 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
317
780
0
18 Apr 2021
A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
318
117
0
24 Feb 2021
Graph Based Temporal Aggregation for Video Retrieval
Arvind Srinivasan
Aprameya Bharadwaj
Aveek Saha
Natarajan Subramanyam
19
0
0
04 Nov 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
418
596
0
21 Jul 2020