2303.07284
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
13 March 2023
Bo He
Jun Wang
Jielin Qiu
Trung Bui
Abhinav Shrivastava
Zhaowen Wang
Papers citing "Align and Attend: Multimodal Summarization with Dual Contrastive Losses"
44 / 44 papers shown
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing
Weihan Xu
Yimeng Ma
Jingyue Huang
Yang Li
Wenye Ma
Taylor Berg-Kirkpatrick
Julian McAuley
Paul Pu Liang
Hao-Wen Dong
DiffM
VGen
148
0
0
24 May 2025
SD-VSum: A Method and Dataset for Script-Driven Video Summarization
Manolis Mylonas
Evlampios Apostolidis
Vasileios Mezaris
58
0
0
06 May 2025
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
108
7
0
29 Dec 2024
Efficient Fourier Filtering Network with Contrastive Learning for UAV-based Unaligned Bi-modal Salient Object Detection
Pengfei Lyu
Pak-Hei Yeung
Xiufei Cheng
Xiaosheng Yu
Chengdong Wu
Jagath C. Rajapakse
64
0
0
06 Nov 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
80
2
0
12 Sep 2024
Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment
Jielin Qiu
Jiacheng Zhu
Mengdi Xu
Franck Dernoncourt
Trung Bui
Zhaowen Wang
Yue Liu
Ding Zhao
Hailin Jin
56
11
0
10 Oct 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
133
640
0
22 Aug 2022
MHMS: Multimodal Hierarchical Multimedia Summarization
Jielin Qiu
Jiacheng Zhu
Mengdi Xu
Franck Dernoncourt
Trung Bui
Zhaowen Wang
Yue Liu
Ding Zhao
Hailin Jin
66
12
0
07 Apr 2022
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLM
MLLM
MoE
65
552
0
03 Nov 2021
StreamHover: Livestream Transcript Summarization and Annotation
Sangwoo Cho
Franck Dernoncourt
Timothy Jeewun Ganter
Trung Bui
Nedim Lipka
Walter Chang
Hailin Jin
Jonathan Brandt
H. Foroosh
Fei Liu
3DGS
AI4TS
53
29
0
11 Sep 2021
CLIP-It! Language-Guided Video Summarization
Medhini Narasimhan
Anna Rohrbach
Trevor Darrell
CLIP
61
117
0
01 Jul 2021
Reconstructive Sequence-Graph Network for Video Summarization
Bin Zhao
Haopeng Li
Xiaoqiang Lu
Xuelong Li
48
102
0
10 May 2021
An Empirical Study of Training Self-Supervised Vision Transformers
Xinlei Chen
Saining Xie
Kaiming He
ViT
146
1,857
0
05 Apr 2021
MAST: Multimodal Abstractive Summarization with Trimodal Hierarchical Attention
Aman Khullar
Udit Arora
40
43
0
15 Oct 2020
VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles
Mingzhe Li
Xiuying Chen
Shen Gao
Zhangming Chan
Dongyan Zhao
Rui Yan
83
83
0
12 Oct 2020
Hard Negative Mixing for Contrastive Learning
Yannis Kalantidis
Mert Bulent Sariyildiz
Noé Pion
Philippe Weinzaepfel
Diane Larlus
SSL
120
643
0
02 Oct 2020
Multi-modal Summarization for Video-containing Documents
Xiyan Fu
Jun Wang
Zhenglu Yang
45
23
0
17 Sep 2020
SumGraph: Video Summarization via Recursive Graph Modeling
Jungin Park
Jiyoung Lee
Ig-Jae Kim
Kwanghoon Sohn
54
54
0
17 Jul 2020
Extractive Summarization as Text Matching
Ming Zhong
Pengfei Liu
Yiran Chen
Danqing Wang
Xipeng Qiu
Xuanjing Huang
127
462
0
19 Apr 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
131
4,048
0
10 Apr 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick
Yuki M. Asano
Polina Kuznetsova
Ruth C. Fong
João F. Henriques
Geoffrey Zweig
Andrea Vedaldi
46
49
0
09 Mar 2020
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
Jingqing Zhang
Yao-Min Zhao
Mohammad Saleh
Peter J. Liu
RALM
3DGS
247
2,044
0
18 Dec 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
67
197
0
14 Nov 2019
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
169
12,065
0
13 Nov 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
337
937
0
24 Sep 2019
Adaptively Sparse Transformers
Gonçalo M. Correia
Vlad Niculae
André F. T. Martins
80
255
0
30 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
227
2,474
0
20 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
217
3,667
0
06 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
522
24,351
0
26 Jul 2019
Rethinking the Evaluation of Video Summaries
Mayu Otani
Yuta Nakashima
Esa Rahtu
J. Heikkilä
70
130
0
27 Mar 2019
Summarizing Videos with Attention
Jiri Fajtl
Hajar Sadeghi Sokeh
Vasileios Argyriou
D. Monekosso
Paolo Remagnino
43
188
0
05 Dec 2018
Discriminative Feature Learning for Unsupervised Video Summarization
Yunjae Jung
Donghyeon Cho
Dahun Kim
Sanghyun Woo
In So Kweon
36
131
0
24 Nov 2018
How2: A Large-scale Dataset for Multimodal Language Understanding
Ramon Sanabria
Ozan Caglayan
Shruti Palaskar
Desmond Elliott
Loïc Barrault
Lucia Specia
Florian Metze
VGen
MLLM
81
288
0
01 Nov 2018
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
267
8,888
0
21 Nov 2017
Video Summarization with Attention-Based Encoder-Decoder Networks
Zhong Ji
Kailin Xiong
Yanwei Pang
Xuelong Li
36
306
0
31 Aug 2017
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Min Zhang
Louis-Philippe Morency
65
1,231
0
23 Jul 2017
A Deep Reinforced Model for Abstractive Summarization
Romain Paulus
Caiming Xiong
R. Socher
AI4TS
183
1,556
0
11 May 2017
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
120
819
0
05 May 2017
Get To The Point: Summarization with Pointer-Generator Networks
A. See
Peter J. Liu
Christopher D. Manning
3DPC
257
4,014
0
14 Apr 2017
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents
Ramesh Nallapati
Feifei Zhai
Bowen Zhou
315
1,258
0
14 Nov 2016
Adversarial Feature Learning
Jeff Donahue
Philipp Krähenbühl
Trevor Darrell
GAN
107
3
0
31 May 2016
Video Summarization with Long Short-term Memory
Ke Zhang
Wei-Lun Chao
Fei Sha
Kristen Grauman
68
685
0
26 May 2016
Neural Summarization by Extracting Sentences and Words
Jianpeng Cheng
Mirella Lapata
67
806
0
23 Mar 2016
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
407
43,589
0
17 Sep 2014