arXiv:2107.00650
CLIP-It! Language-Guided Video Summarization
1 July 2021
Medhini Narasimhan
Anna Rohrbach
Trevor Darrell
CLIP
Papers citing "CLIP-It! Language-Guided Video Summarization"
50 / 68 papers shown
SD-VSum: A Method and Dataset for Script-Driven Video Summarization
Manolis Mylonas
Evlampios Apostolidis
Vasileios Mezaris
35
0
0
06 May 2025
Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization
Anas Anwarul Haq Khan
Utkarsh Verma
Prateek Chanda
Ganesh Ramakrishnan
VLM
53
0
0
30 Apr 2025
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
92
0
0
25 Apr 2025
Video Summarization with Large Language Models
Min Jung Lee
Dayoung Gong
Minsu Cho
31
0
0
15 Apr 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad
Vibhav Vineet
Yogesh S Rawat
VLM
160
1
0
11 Mar 2025
CFSum: A Transformer-Based Multi-Modal Video Summarization Framework With Coarse-Fine Fusion
Yaowei Guo
Jiazheng Xing
Xiaojun Hou
Shuo Xin
Juntao Jiang
Demetri Terzopoulos
Chenfanfu Jiang
Yong Liu
ViT
38
0
0
01 Mar 2025
V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation
P. Guhan
Tsung-Wei Huang
Guan-Ming Su
Subhadra Gopalakrishnan
Dinesh Manocha
VGen
63
0
0
14 Jan 2025
Personalized Video Summarization by Multimodal Video Understanding
Brian Chen
Xiangyuan Zhao
Yingnan Zhu
41
1
0
05 Nov 2024
TeaserGen: Generating Teasers for Long Documentaries
Weihan Xu
Paul Pu Liang
Haven Kim
Julian McAuley
Taylor Berg-Kirkpatrick
Hao-Wen Dong
VGen
VLM
DiffM
31
0
0
08 Oct 2024
Optimising for the Unknown: Domain Alignment for Cephalometric Landmark Detection
Julian Wyatt
Irina Voiculescu
23
1
0
06 Oct 2024
Does SpatioTemporal information benefit Two video summarization benchmarks?
Aashutosh Ganesh
Mirela Popa
Daan Odijk
Nava Tintarev
AI4TS
27
0
0
04 Oct 2024
Contrastive Abstraction for Reinforcement Learning
Vihang Patil
M. Hofmarcher
Elisabeth Rumetshofer
Sepp Hochreiter
OffRL
SSL
24
2
0
01 Oct 2024
AyE-Edge: Automated Deployment Space Search Empowering Accuracy yet Efficient Real-Time Object Detection on the Edge
Chao Wu
Yifan Gong
Liangkai Liu
Mengquan Li
Yushu Wu
Xuan Shen
Zhimin Li
Geng Yuan
Weisong Shi
Yanzhi Wang
23
1
0
25 Jul 2024
CSTA: CNN-based Spatiotemporal Attention for Video Summarization
Jaewon Son
Jaehun Park
Kwangsu Kim
AI4TS
ViT
39
8
0
20 May 2024
"Previously on ..." From Recaps to Story Summarization
Aditya Kumar Singh
Dhruv Srivastava
Makarand Tapaswi
50
0
0
19 May 2024
Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video
Tomoya Sugihara
Shuntaro Masuda
Ling Xiao
Toshihiko Yamasaki
46
3
0
14 May 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua
Yunlong Tang
Chenliang Xu
Jiebo Luo
VGen
68
25
0
18 Apr 2024
VideoSAGE: Video Summarization with Graph Representation Learning
Jose M. Rojas Chaves
Subarna Tripathi
36
3
0
14 Apr 2024
Towards Automated Movie Trailer Generation
Dawit Mureja Argaw
Mattia Soldan
Alejandro Pardo
Chen Zhao
Fabian Caba Heilbron
Joon Son Chung
Guohao Li
ViT
40
4
0
04 Apr 2024
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Mureja Argaw
Seunghyun Yoon
Fabian Caba Heilbron
Hanieh Deilamsalehy
Trung Bui
Zhaowen Wang
Franck Dernoncourt
Joon Son Chung
43
9
0
04 Apr 2024
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh
Ishaan Shrivastava
Mayank Vatsa
Richa Singh
Aparna Bharati
VLM
CoGe
34
14
0
29 Mar 2024
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan
Jian-Huang Lai
Wei-Shi Zheng
Jianfang Hu
AI4TS
44
5
0
18 Mar 2024
TutoAI: A Cross-domain Framework for AI-assisted Mixed-media Tutorial Creation on Physical Tasks
Yuexi Chen
Vlad I. Morariu
Anh Truong
Zhicheng Liu
DiffM
VGen
45
4
0
12 Mar 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
42
6
0
08 Jan 2024
Generating Illustrated Instructions
Sachit Menon
Ishan Misra
Rohit Girdhar
DiffM
34
4
0
07 Dec 2023
A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
Keito Kudo
Haruki Nagasawa
Jun Suzuki
Nobuyuki Shimizu
42
2
0
04 Dec 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
33
6
0
27 Nov 2023
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
Jaeyong Kang
Soujanya Poria
Dorien Herremans
MGen
VGen
17
32
0
02 Nov 2023
Tell Me What Is Good About This Property: Leveraging Reviews For Segment-Personalized Image Collection Summarization
Monika Wysoczanska
Moran Beladev
Karen Lastmann Assaraf
Fengjun Wang
Ofri Kleinfeld
Gil Amsalem
Hadas Harush Boker
30
2
0
30 Oct 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP
VLM
VGen
54
2
0
30 Oct 2023
Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization
Yoonsoo Nam
Adam Lehavi
Daniel Yang
Digbalay Bose
Swabha Swayamdipta
Shrikanth Narayanan
17
1
0
18 Sep 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM
CLIP
21
13
0
23 Aug 2023
Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations
Seogkyu Jeon
Bei Liu
Pilhyeon Lee
Kibeom Hong
Jianlong Fu
H. Byun
48
1
0
21 Aug 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Chaorui Deng
Qi Chen
Pengda Qin
Dave Zhenyu Chen
Qi Wu
VLM
CLIP
46
29
0
15 Aug 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
39
89
0
11 Jul 2023
Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training
Alyssa Huang
Peihan Liu
Ryumei Nakada
Linjun Zhang
Wanrong Zhang
VLM
71
5
0
13 Jun 2023
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu
Jiacheng Zhu
William Jongwon Han
Aditesh Kumar
Karthik Mittal
...
Linjie Li
Jianfeng Wang
Ding Zhao
Bo Li
Lijuan Wang
VGen
18
5
0
07 Jun 2023
Connecting Multi-modal Contrastive Representations
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
30
22
0
22 May 2023
SELF-VS: Self-supervised Encoding Learning For Video Summarization
Hojjat Mokhtarabadi
Kaveh Bahraman
M. Hosseinzadeh
M. Eftekhari
AI4TS
SSL
ViT
25
0
0
28 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
32
1
0
21 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
28
30
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
Video Action Recognition with Attentive Semantic Units
Yifei Chen
Dapeng Chen
Ruijin Liu
Hao Li
Wei Peng
21
11
0
17 Mar 2023
Architext: Language-Driven Generative Architecture Design
Theodoros Galanos
Antonios Liapis
Georgios N. Yannakakis
VLM
AI4CE
26
6
0
13 Mar 2023
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Bo He
Jun Wang
Jielin Qiu
Trung Bui
Abhinav Shrivastava
Zhaowen Wang
22
65
0
13 Mar 2023
MetaAID 2.0: An Extensible Framework for Developing Metaverse Applications via Human-controllable Pre-trained Models
Hongyin Zhu
25
6
0
25 Feb 2023
ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence
D. Rivkin
Gregory Dudek
Nikhil Kakodkar
D. Meger
Oliver Limoyo
Xue Liu
F. Hogan
LM&Ro
8
6
0
15 Feb 2023
Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data
Ryumei Nakada
Halil Ibrahim Gulluk
Zhun Deng
Wenlong Ji
James Zou
Linjun Zhang
SSL
VLM
42
36
0
13 Feb 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
24
4
0
05 Jan 2023
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
29
125
0
13 Dec 2022