ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.05402
  4. Cited By
MART: Memory-Augmented Recurrent Transformer for Coherent Video
  Paragraph Captioning

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

11 May 2020
Jie Lei
Liwei Wang
Yelong Shen
Dong Yu
Tamara L. Berg
Joey Tianyi Zhou
ArXivPDFHTML

Papers citing "MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning"

36 / 86 papers shown
Title
Recurrent Memory Transformer
Recurrent Memory Transformer
Aydar Bulatov
Yuri Kuratov
Andrey Kravchenko
CLL
13
102
0
14 Jul 2022
VLCap: Vision-Language with Contrastive Learning for Coherent Video
  Paragraph Captioning
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
Kashu Yamazaki
Sang Truong
Khoa T. Vo
Michael Kidd
Chase Rainwater
Khoa Luu
Ngan Le
VLM
CoGe
11
25
0
26 Jun 2022
Recurrent Video Restoration Transformer with Guided Deformable Attention
Recurrent Video Restoration Transformer with Guided Deformable Attention
Christos Sakaridis
Yuchen Fan
Xiaoyu Xiang
Rakesh Ranjan
Eddy Ilg
Simon Green
Jingyun Liang
Kaicheng Zhang
Radu Timofte
Luc Van Gool
42
152
0
05 Jun 2022
Attract me to Buy: Advertisement Copywriting Generation with Multimodal
  Multi-structured Information
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information
Zhipeng Zhang
Xinglin Hou
K. Niu
Zhongzhen Huang
T. Ge
Yuning Jiang
Qi Wu
Peifeng Wang
31
4
0
07 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with
  Weak Supervision
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
31
45
0
04 May 2022
Relevance-based Margin for Contrastively-trained Video Retrieval Models
Relevance-based Margin for Contrastively-trained Video Retrieval Models
Alex Falcon
Swathikiran Sudhakaran
G. Serra
Sergio Escalera
Oswald Lanz
40
7
0
27 Apr 2022
CREATE: A Benchmark for Chinese Short Video Retrieval and Title
  Generation
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation
Ziqi Zhang
Yuxin Chen
Zongyang Ma
Zhongang Qi
Chunfen Yuan
Bing Li
Ying Shan
Weiming Hu
VGen
29
8
0
31 Mar 2022
Visual Abductive Reasoning
Visual Abductive Reasoning
Chen Liang
Wenguan Wang
Tianfei Zhou
Yi Yang
LRM
26
38
0
26 Mar 2022
Local-Global Context Aware Transformer for Language-Guided Video
  Segmentation
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
29
74
0
18 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
24
4
0
12 Mar 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
27
164
0
20 Jan 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
40
4
0
19 Nov 2021
Integrating Visuospatial, Linguistic and Commonsense Structure into
  Story Visualization
Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization
A. Maharana
Joey Tianyi Zhou
27
57
0
21 Oct 2021
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video
  Representations
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Mohammadreza Zolfaghari
Yi Zhu
Peter V. Gehler
Thomas Brox
135
127
0
30 Sep 2021
Space Time Recurrent Memory Network
Space Time Recurrent Memory Network
Hung-Cuong Nguyen
Chanho Kim
Fuxin Li
28
3
0
14 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal
  Attention
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
31
12
0
07 Sep 2021
A Neural Conversation Generation Model via Equivalent Shared Memory
  Investigation
A Neural Conversation Generation Model via Equivalent Shared Memory Investigation
Changzhen Ji
Yating Zhang
Xiaozhong Liu
Adam Jatowt
Changlong Sun
Conghui Zhu
T. Zhao
26
1
0
20 Aug 2021
End-to-End Dense Video Captioning with Parallel Decoding
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
47
179
0
17 Aug 2021
Neural Variational Learning for Grounded Language Acquisition
Neural Variational Learning for Grounded Language Acquisition
Nisha Pillai
Cynthia Matuszek
Francis Ferraro
VLM
SSL
GAN
DRL
21
2
0
20 Jul 2021
CLIP-It! Language-Guided Video Summarization
CLIP-It! Language-Guided Video Summarization
Medhini Narasimhan
Anna Rohrbach
Trevor Darrell
CLIP
17
113
0
01 Jul 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song
Shizhe Chen
Qin Jin
21
37
0
30 May 2021
Improving Generation and Evaluation of Visual Stories via Semantic
  Consistency
Improving Generation and Evaluation of Visual Stories via Semantic Consistency
A. Maharana
Darryl Hannan
Joey Tianyi Zhou
EGVM
24
61
0
20 May 2021
Video-aided Unsupervised Grammar Induction
Video-aided Unsupervised Grammar Induction
Songyang Zhang
Linfeng Song
Lifeng Jin
Kun Xu
Dong Yu
Jiebo Luo
22
26
0
09 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language
  Representation Learning
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
17
7
0
01 Apr 2021
Learning Domain Adaptation with Model Calibration for Surgical Report
  Generation in Robotic Surgery
Learning Domain Adaptation with Model Calibration for Surgical Report Generation in Robotic Surgery
Mengya Xu
Mobarakol Islam
C. Lim
Hongliang Ren
OOD
MedIm
37
29
0
31 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
13
17
0
27 Mar 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse
  Sampling
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
43
647
0
11 Feb 2021
Understanding Action Sequences based on Video Captioning for
  Learning-from-Observation
Understanding Action Sequences based on Video Captioning for Learning-from-Observation
Iori Yanokura
Naoki Wake
Kazuhiro Sasabuchi
Katsushi Ikeuchi
Masayuki Inaba
24
4
0
09 Dec 2020
Neuro-Symbolic Representations for Video Captioning: A Case for
  Leveraging Inductive Biases for Vision and Language
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
32
3
0
18 Nov 2020
Multimodal Pretraining for Dense Video Captioning
Multimodal Pretraining for Dense Video Captioning
Gabriel Huang
Bo Pang
Zhenhai Zhu
Clara E. Rivera
Radu Soricut
21
81
0
10 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation
  Learning
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
20
168
0
01 Nov 2020
Multimodal Research in Vision and Language: A Review of Current and
  Emerging Trends
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
23
6
0
19 Oct 2020
What is More Likely to Happen Next? Video-and-Language Future Event
  Prediction
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
30
72
0
15 Oct 2020
Consistent Multiple Sequence Decoding
Consistent Multiple Sequence Decoding
Bicheng Xu
Leonid Sigal
31
0
0
02 Apr 2020
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
119
275
0
24 Jan 2020
Previous
12