2407.03788
Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
4 July 2024
Thong Nguyen
Yi Bin
Xiaobao Wu
Xinshuai Dong
Zhiyuan Hu
Khoi M. Le
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
Papers citing
"Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning"
21 / 21 papers shown
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
Thong Nguyen
Zhiyuan Hu
Xu Lin
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
51
0
0
19 May 2025
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
120
13
1
09 Jun 2024
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Mohit Bansal
Gedas Bertasius
VLM
63
79
0
09 Dec 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
71
150
0
15 Sep 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Mohit Bansal
51
112
0
07 Jun 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
160
62
0
17 May 2022
Temporal Alignment Networks for Long-term Video
Tengda Han
Weidi Xie
Andrew Zisserman
AI4TS
58
85
0
06 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Mohit Bansal
Gedas Bertasius
78
41
0
06 Apr 2022
Disentangled Representation Learning for Text-Video Retrieval
Qiang Wang
Yanhao Zhang
Yun Zheng
Pan Pan
Xiansheng Hua
58
77
0
14 Mar 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
59
167
0
20 Jan 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
303
567
0
28 Sep 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
84
1,458
0
24 Jun 2021
Contrastive Fine-tuning Improves Robustness for Neural Rankers
Xiaofei Ma
Cicero Nogueira dos Santos
Andrew O. Arnold
59
20
0
27 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
43
132
0
20 May 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
289
581
0
22 Apr 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Mohit Bansal
Jingjing Liu
CLIP
112
651
0
11 Feb 2021
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
111
419
0
14 Nov 2020
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
103
1,192
0
07 Jun 2019
Learning to Reweight Examples for Robust Deep Learning
Mengye Ren
Wenyuan Zeng
Bin Yang
R. Urtasun
OOD
NoLa
132
1,419
0
24 Mar 2018
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
95
940
0
04 Aug 2017
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
432
61,900
0
04 Jun 2015