2109.06085
Cited By
On Pursuit of Designing Multi-modal Transformer for Video Grounding
13 September 2021
Meng Cao
Long Chen
Mike Zheng Shou
Can Zhang
Yuexian Zou
Papers citing
"On Pursuit of Designing Multi-modal Transformer for Video Grounding"
37 / 37 papers shown
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang
Meng Cao
Jinfa Huang
Ruyang Liu
Peng Jin
Ge Li
Xiaodan Liang
Mamba
153
4
0
24 Feb 2025
Natural Language Video Localization with Learnable Moment Proposals
Shaoning Xiao
Long Chen
Jian Shao
Yueting Zhuang
Jun Xiao
58
43
0
22 Sep 2021
Video Relation Detection via Tracklet based Visual Transformer
Kaifeng Gao
Long Chen
Yifeng Huang
Jun Xiao
ViT
61
29
0
19 Aug 2021
Transformer Tracking
Xin Chen
Bin Yan
Jiawen Zhu
Dong Wang
Xiaoyun Yang
Huchuan Lu
ViT
64
957
0
29 Mar 2021
Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Jianfeng Dong
Pan Zhou
Yu Cheng
Wei Wei
Zichuan Xu
Yulai Xie
55
145
0
22 Mar 2021
Boundary Proposal Network for Two-Stage Natural Language Video Localization
Shaoning Xiao
Long Chen
Songyang Zhang
Wei Ji
Jian Shao
Lu Ye
Jun Xiao
47
160
0
15 Mar 2021
TransReID: Transformer-based Object Re-Identification
Shuting He
Haowen Luo
Pichao Wang
F. Wang
Hao Li
Wei Jiang
ViT
263
816
0
08 Feb 2021
TrackFormer: Multi-Object Tracking with Transformers
Tim Meinhardt
A. Kirillov
Laura Leal-Taixe
Christoph Feichtenhofer
VOT
266
766
0
07 Jan 2021
TransTrack: Multiple Object Tracking with Transformer
Pei Sun
Jinkun Cao
Yi Jiang
Rufeng Zhang
Enze Xie
Zehuan Yuan
Changhu Wang
Ping Luo
ViT
VOT
306
580
0
31 Dec 2020
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
377
6,762
0
23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
632
41,003
0
22 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
216
5,073
0
08 Oct 2020
Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization
Daizong Liu
Xiaoye Qu
Xiao-Yang Liu
Jianfeng Dong
Pan Zhou
Zichuan Xu
58
129
0
04 Aug 2020
Learning Texture Transformer Network for Image Super-Resolution
Fuzhi Yang
Huan Yang
Jianlong Fu
Hongtao Lu
B. Guo
SupR
ViT
74
722
0
07 Jun 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
382
13,025
0
26 May 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
81
315
0
29 Apr 2020
Local-Global Video-Text Interactions for Temporal Grounding
Jonghwan Mun
Minsu Cho
Bohyung Han
64
269
0
16 Apr 2020
Dense Regression Network for Video Grounding
Runhao Zeng
Haoming Xu
Wenbing Huang
Peihao Chen
Mingkui Tan
Chuang Gan
68
283
0
07 Apr 2020
Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
Songyang Zhang
Houwen Peng
Jianlong Fu
Jiebo Luo
69
470
0
08 Dec 2019
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
Yitian Yuan
Lin Ma
Jingwen Wang
Wei Liu
Wenwu Zhu
74
244
0
31 Oct 2019
Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction
Jingwen Wang
Lin Ma
Wenhao Jiang
70
182
0
11 Sep 2019
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
F. Saleh
Hongdong Li
Stephen Gould
59
147
0
20 Aug 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Zhenfang Chen
Lin Ma
Wenhan Luo
Kwan-Yee K. Wong
81
103
0
06 Jun 2019
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos
Zhu Zhang
Zhijie Lin
Zhou Zhao
Zhenxin Xiao
45
213
0
06 Jun 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Deyuan Li
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
147
4,163
0
25 Feb 2019
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
Dongliang He
Xiang Zhao
Jizhou Huang
Fu Li
Xiao-Chang Liu
Shilei Wen
63
153
0
21 Jan 2019
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment
Da Zhang
Xiyang Dai
Xin Eric Wang
Yuan-fang Wang
L. Davis
61
305
0
30 Nov 2018
MAC: Mining Activity Concepts for Language-based Temporal Localization
Runzhou Ge
J. Gao
Kan Chen
Ram Nevatia
67
179
0
21 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.7K
94,770
0
11 Oct 2018
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression
Yitian Yuan
Tao Mei
Wenwu Zhu
73
333
0
19 Apr 2018
Multilevel Language and Vision Integration for Text-to-Clip Retrieval
Huijuan Xu
Kun He
Bryan A. Plummer
Leonid Sigal
Stan Sclaroff
Kate Saenko
CLIP
61
323
0
13 Apr 2018
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
110
946
0
04 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
687
131,526
0
12 Jun 2017
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
123
819
0
05 May 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
134
1,242
0
02 May 2017
Temporal Action Detection with Structured Segment Networks
Yue Zhao
Yuanjun Xiong
Limin Wang
Zhirong Wu
Xiaoou Tang
Dahua Lin
67
915
0
20 Apr 2017
End-to-end people detection in crowded scenes
Russell Stewart
Mykhaylo Andriluka
73
544
0
16 Jun 2015