Align and Attend: Multimodal Summarization with Dual Contrastive Losses

13 March 2023
Bo He
Jun Wang
Jielin Qiu
Trung Bui
Abhinav Shrivastava
Zhaowen Wang
arXiv: 2303.07284

Papers citing "Align and Attend: Multimodal Summarization with Dual Contrastive Losses"

44 / 44 papers shown
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing
Weihan Xu
Yimeng Ma
Jingyue Huang
Yang Li
Wenye Ma
Taylor Berg-Kirkpatrick
Julian McAuley
Paul Pu Liang
Hao-Wen Dong
DiffM
VGen
148
0
0
24 May 2025
SD-VSum: A Method and Dataset for Script-Driven Video Summarization
Manolis Mylonas
Evlampios Apostolidis
Vasileios Mezaris
58
0
0
06 May 2025
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
108
7
0
29 Dec 2024
Efficient Fourier Filtering Network with Contrastive Learning for UAV-based Unaligned Bi-modal Salient Object Detection
Pengfei Lyu
Pak-Hei Yeung
Xiufei Cheng
Xiaosheng Yu
Chengdong Wu
Jagath C. Rajapakse
64
0
0
06 Nov 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
80
2
0
12 Sep 2024
Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment
Jielin Qiu
Jiacheng Zhu
Mengdi Xu
Franck Dernoncourt
Trung Bui
Zhaowen Wang
Yue Liu
Ding Zhao
Hailin Jin
56
11
0
10 Oct 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
133
640
0
22 Aug 2022
MHMS: Multimodal Hierarchical Multimedia Summarization
Jielin Qiu
Jiacheng Zhu
Mengdi Xu
Franck Dernoncourt
Trung Bui
Zhaowen Wang
Yue Liu
Ding Zhao
Hailin Jin
66
12
0
07 Apr 2022
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLM
MLLM
MoE
65
552
0
03 Nov 2021
StreamHover: Livestream Transcript Summarization and Annotation
Sangwoo Cho
Franck Dernoncourt
Timothy Jeewun Ganter
Trung Bui
Nedim Lipka
Walter Chang
Hailin Jin
Jonathan Brandt
H. Foroosh
Fei Liu
3DGS
AI4TS
53
29
0
11 Sep 2021
CLIP-It! Language-Guided Video Summarization
Medhini Narasimhan
Anna Rohrbach
Trevor Darrell
CLIP
61
117
0
01 Jul 2021
Reconstructive Sequence-Graph Network for Video Summarization
Bin Zhao
Haopeng Li
Xiaoqiang Lu
Xuelong Li
48
102
0
10 May 2021
An Empirical Study of Training Self-Supervised Vision Transformers
Xinlei Chen
Saining Xie
Kaiming He
ViT
146
1,857
0
05 Apr 2021
MAST: Multimodal Abstractive Summarization with Trimodal Hierarchical Attention
Aman Khullar
Udit Arora
40
43
0
15 Oct 2020
VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles
Mingzhe Li
Xiuying Chen
Shen Gao
Zhangming Chan
Dongyan Zhao
Rui Yan
83
83
0
12 Oct 2020
Hard Negative Mixing for Contrastive Learning
Yannis Kalantidis
Mert Bulent Sariyildiz
Noé Pion
Philippe Weinzaepfel
Diane Larlus
SSL
120
643
0
02 Oct 2020
Multi-modal Summarization for Video-containing Documents
Xiyan Fu
Jun Wang
Zhenglu Yang
45
23
0
17 Sep 2020
SumGraph: Video Summarization via Recursive Graph Modeling
Jungin Park
Jiyoung Lee
Ig-Jae Kim
Kwanghoon Sohn
54
54
0
17 Jul 2020
Extractive Summarization as Text Matching
Ming Zhong
Pengfei Liu
Yiran Chen
Danqing Wang
Xipeng Qiu
Xuanjing Huang
127
462
0
19 Apr 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
131
4,048
0
10 Apr 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning
Mandela Patrick
Yuki M. Asano
Polina Kuznetsova
Ruth C. Fong
João F. Henriques
Geoffrey Zweig
Andrea Vedaldi
46
49
0
09 Mar 2020
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
Jingqing Zhang
Yao-Min Zhao
Mohammad Saleh
Peter J. Liu
RALM
3DGS
247
2,044
0
18 Dec 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
67
197
0
14 Nov 2019
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
169
12,065
0
13 Nov 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
337
937
0
24 Sep 2019
Adaptively Sparse Transformers
Gonçalo M. Correia
Vlad Niculae
André F. T. Martins
80
255
0
30 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
227
2,474
0
20 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
217
3,667
0
06 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
522
24,351
0
26 Jul 2019
Rethinking the Evaluation of Video Summaries
Mayu Otani
Yuta Nakashima
Esa Rahtu
J. Heikkilä
70
130
0
27 Mar 2019
Summarizing Videos with Attention
Jiri Fajtl
Hajar Sadeghi Sokeh
Vasileios Argyriou
D. Monekosso
Paolo Remagnino
43
188
0
05 Dec 2018
Discriminative Feature Learning for Unsupervised Video Summarization
Yunjae Jung
Donghyeon Cho
Dahun Kim
Sanghyun Woo
In So Kweon
36
131
0
24 Nov 2018
How2: A Large-scale Dataset for Multimodal Language Understanding
Ramon Sanabria
Ozan Caglayan
Shruti Palaskar
Desmond Elliott
Loïc Barrault
Lucia Specia
Florian Metze
VGen
MLLM
81
288
0
01 Nov 2018
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
267
8,888
0
21 Nov 2017
Video Summarization with Attention-Based Encoder-Decoder Networks
Zhong Ji
Kailin Xiong
Yanwei Pang
Xuelong Li
36
306
0
31 Aug 2017
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Min Zhang
Louis-Philippe Morency
65
1,231
0
23 Jul 2017
A Deep Reinforced Model for Abstractive Summarization
Romain Paulus
Caiming Xiong
R. Socher
AI4TS
183
1,556
0
11 May 2017
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
120
819
0
05 May 2017
Get To The Point: Summarization with Pointer-Generator Networks
A. See
Peter J. Liu
Christopher D. Manning
3DPC
257
4,014
0
14 Apr 2017
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents
Ramesh Nallapati
Feifei Zhai
Bowen Zhou
315
1,258
0
14 Nov 2016
Adversarial Feature Learning
Jeff Donahue
Philipp Krähenbühl
Trevor Darrell
GAN
107
3
0
31 May 2016
Video Summarization with Long Short-term Memory
Ke Zhang
Wei-Lun Chao
Fei Sha
Kristen Grauman
68
685
0
26 May 2016
Neural Summarization by Extracting Sentences and Words
Jianpeng Cheng
Mirella Lapata
67
806
0
23 Mar 2016
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
407
43,589
0
17 Sep 2014