2005.05402
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
11 May 2020
Jie Lei
Liwei Wang
Yelong Shen
Dong Yu
Tamara L. Berg
Mohit Bansal
Papers citing
"MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning"
36 / 86 papers shown
Recurrent Memory Transformer
Aydar Bulatov
Yuri Kuratov
Mikhail Burtsev
CLL
13
102
0
14 Jul 2022
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
Kashu Yamazaki
Sang Truong
Khoa T. Vo
Michael Kidd
Chase Rainwater
Khoa Luu
Ngan Le
VLM
CoGe
11
25
0
26 Jun 2022
Recurrent Video Restoration Transformer with Guided Deformable Attention
Christos Sakaridis
Yuchen Fan
Xiaoyu Xiang
Rakesh Ranjan
Eddy Ilg
Simon Green
Jingyun Liang
Kaicheng Zhang
Radu Timofte
Luc Van Gool
42
152
0
05 Jun 2022
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information
Zhipeng Zhang
Xinglin Hou
K. Niu
Zhongzhen Huang
T. Ge
Yuning Jiang
Qi Wu
Peifeng Wang
31
4
0
07 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
31
45
0
04 May 2022
Relevance-based Margin for Contrastively-trained Video Retrieval Models
Alex Falcon
Swathikiran Sudhakaran
G. Serra
Sergio Escalera
Oswald Lanz
40
7
0
27 Apr 2022
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation
Ziqi Zhang
Yuxin Chen
Zongyang Ma
Zhongang Qi
Chunfen Yuan
Bing Li
Ying Shan
Weiming Hu
VGen
29
8
0
31 Mar 2022
Visual Abductive Reasoning
Chen Liang
Wenguan Wang
Tianfei Zhou
Yi Yang
LRM
26
38
0
26 Mar 2022
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
29
74
0
18 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
24
4
0
12 Mar 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
27
164
0
20 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
40
4
0
19 Nov 2021
Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization
A. Maharana
Mohit Bansal
27
57
0
21 Oct 2021
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Mohammadreza Zolfaghari
Yi Zhu
Peter V. Gehler
Thomas Brox
135
127
0
30 Sep 2021
Space Time Recurrent Memory Network
Hung-Cuong Nguyen
Chanho Kim
Fuxin Li
28
3
0
14 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
31
12
0
07 Sep 2021
A Neural Conversation Generation Model via Equivalent Shared Memory Investigation
Changzhen Ji
Yating Zhang
Xiaozhong Liu
Adam Jatowt
Changlong Sun
Conghui Zhu
T. Zhao
26
1
0
20 Aug 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
47
179
0
17 Aug 2021
Neural Variational Learning for Grounded Language Acquisition
Nisha Pillai
Cynthia Matuszek
Francis Ferraro
VLM
SSL
GAN
DRL
21
2
0
20 Jul 2021
CLIP-It! Language-Guided Video Summarization
Medhini Narasimhan
Anna Rohrbach
Trevor Darrell
CLIP
17
113
0
01 Jul 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song
Shizhe Chen
Qin Jin
21
37
0
30 May 2021
Improving Generation and Evaluation of Visual Stories via Semantic Consistency
A. Maharana
Darryl Hannan
Mohit Bansal
EGVM
24
61
0
20 May 2021
Video-aided Unsupervised Grammar Induction
Songyang Zhang
Linfeng Song
Lifeng Jin
Kun Xu
Dong Yu
Jiebo Luo
22
26
0
09 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
17
7
0
01 Apr 2021
Learning Domain Adaptation with Model Calibration for Surgical Report Generation in Robotic Surgery
Mengya Xu
Mobarakol Islam
C. Lim
Hongliang Ren
OOD
MedIm
37
29
0
31 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
13
17
0
27 Mar 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Mohit Bansal
Jingjing Liu
CLIP
43
647
0
11 Feb 2021
Understanding Action Sequences based on Video Captioning for Learning-from-Observation
Iori Yanokura
Naoki Wake
Kazuhiro Sasabuchi
Katsushi Ikeuchi
Masayuki Inaba
24
4
0
09 Dec 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
32
3
0
18 Nov 2020
Multimodal Pretraining for Dense Video Captioning
Gabriel Huang
Bo Pang
Zhenhai Zhu
Clara E. Rivera
Radu Soricut
21
81
0
10 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
20
168
0
01 Nov 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
23
6
0
19 Oct 2020
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
30
72
0
15 Oct 2020
Consistent Multiple Sequence Decoding
Bicheng Xu
Leonid Sigal
31
0
0
02 Apr 2020
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
119
275
0
24 Jan 2020