Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.11427
Cited By
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
21 November 2022
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David A. Clifton
Jing Chen
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations"
49 / 49 papers shown
Title
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
33
0
0
02 Apr 2025
MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion
Saron Samuel
Dan DeGenaro
Jimena Guallar-Blasco
Kate Sanders
Oluwaseun Eisape
...
David Etter
Efsun Kayi
Matthew Wiesner
Kenton W. Murray
Reno Kriz
83
0
0
26 Mar 2025
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun V. Reddy
Alexander Martin
Eugene Yang
Andrew Yates
Kate Sanders
Kenton W. Murray
Reno Kriz
Celso M. De Melo
Benjamin Van Durme
Rama Chellappa
48
1
0
24 Mar 2025
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
50
1
0
21 Mar 2025
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
Zengrong Lin
Zheng Wang
Tianwen Qian
Pan Mu
Sixian Chan
Cong Bai
44
0
0
13 Mar 2025
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang
Meng Cao
Jinfa Huang
Ruyang Liu
Peng Jin
Ge Li
Xiaodan Liang
Mamba
96
4
0
24 Feb 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
H. Li
Li Yuan
Shuicheng Yan
Jie Chen
52
1
0
31 Dec 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
31
13
0
15 Oct 2024
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz
Kate Sanders
David Etter
Kenton W. Murray
Cameron Carpenter
...
Alexander Martin
Ronald Colaianni
Nolan King
Eugene Yang
Benjamin Van Durme
VGen
33
2
0
15 Oct 2024
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
32
4
0
09 Oct 2024
OCC-MLLM-Alpha:Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning
Shuxin Yang
Xinhan Di
26
1
0
02 Oct 2024
OCC-MLLM:Empowering Multimodal Large Language Model For the Understanding of Occluded Objects
Wenmo Qiu
Xinhan Di
VLM
33
2
0
02 Oct 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
121
7
0
02 Sep 2024
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
Longtao Jiang
Min Wang
Zecheng Li
Yao Fang
Wen-gang Zhou
Houqiang Li
SLR
29
2
0
23 Jul 2024
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Runyi Yu
Chang-Shu Liu
Xiangyang Ji
Li-ming Yuan
Jie Chen
DiffM
23
3
0
15 Jul 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
93
11
0
29 May 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang
Guohao Sun
Pichao Wang
Dongfang Liu
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Zhiqiang Tao
VGen
55
20
0
26 Mar 2024
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Kangning Yin
Shihao Zou
Yuxuan Ge
Zheng Tian
40
5
0
01 Mar 2024
Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach
Shaofeng Zhang
Jinfa Huang
Qiang-feng Zhou
Zhibin Wang
Fan Wang
Jiebo Luo
Junchi Yan
DiffM
26
11
0
28 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
50
82
0
29 Dec 2023
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
28
3
0
11 Dec 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
34
222
0
14 Nov 2023
GPT-4V(ision) as A Social Media Analysis Engine
Hanjia Lyu
Jinfa Huang
Daoan Zhang
Yongsheng Yu
Xinyi Mou
Jinsheng Pan
Zhengyuan Yang
Zhongyu Wei
Jiebo Luo
VLM
MLLM
21
33
0
13 Nov 2023
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval
Junkyu Jang
Eugene Hwang
Sung-Hyuk Park
20
0
0
03 Nov 2023
Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs
Peng Jin
Yang Wu
Yanbo Fan
Zhongqian Sun
Yang Wei
Li-ming Yuan
DiffM
28
28
0
02 Nov 2023
Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation
Fei Zhang
Tianfei Zhou
Boyang Li
Hao He
Chaofan Ma
Tianjiao Zhang
Jiangchao Yao
Ya-Qin Zhang
Yanfeng Wang
VLM
37
17
0
29 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
21
9
0
25 Oct 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
29
13
0
25 Aug 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Chaorui Deng
Qi Chen
Pengda Qin
Dave Zhenyu Chen
Qi Wu
VLM
CLIP
41
29
0
15 Aug 2023
Improving Scene Graph Generation with Superpixel-Based Interaction Learning
Jingyi Wang
Can Zhang
Jinfa Huang
Bo Ren
Zhidong Deng
23
7
0
04 Aug 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real
P. Li
Chen-Wei Xie
Hongtao Xie
Liming Zhao
Lei Zhang
Yun Zheng
Deli Zhao
Yongdong Zhang
DiffM
VGen
34
56
0
06 Jul 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
42
3
0
28 Jun 2023
Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning
Shaofeng Zhang
Feng Zhu
Rui Zhao
Junchi Yan
21
16
0
23 Jun 2023
Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning
Yu Wang
Pengchong Qiao
Chang-rui Liu
Guoli Song
Xiawu Zheng
Jie Chen
OODD
26
10
0
29 May 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Peng Jin
Hao Li
Ze-Long Cheng
Jinfa Huang
Zhennan Wang
Li-ming Yuan
Chang-rui Liu
Jie Chen
26
31
0
20 May 2023
TG-VQA: Ternary Game of Video Question Answering
Hao Li
Peng Jin
Ze-Long Cheng
Songyang Zhang
Kai-xiang Chen
Zhennan Wang
Chang-rui Liu
Jie Chen
26
10
0
17 May 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
30
49
0
25 Mar 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Xiang Ji
Chang-rui Liu
Li-ming Yuan
Jie Chen
DiffM
VGen
24
53
0
17 Mar 2023
Parallel Vertex Diffusion for Unified Visual Grounding
Ze-Long Cheng
Kehan Li
Peng Jin
Xiang Ji
Li-ming Yuan
Chang-rui Liu
Jie Chen
DiffM
34
25
0
13 Mar 2023
FiTs: Fine-grained Two-stage Training for Knowledge-aware Question Answering
Qichen Ye
Bowen Cao
Nuo Chen
Weiyuan Xu
Yuexian Zou
18
16
0
23 Feb 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
24
45
0
16 Jan 2023
Position Embedding Needs an Independent Layer Normalization
Runyi Yu
Zhennan Wang
Yinhuai Wang
Kehan Li
Yian Zhao
Jian Andrew Zhang
Guoli Song
Jie Chen
26
1
0
10 Dec 2022
ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
Kehan Li
Zhennan Wang
Ze-Long Cheng
Runyi Yu
Yian Zhao
Guoli Song
Chang Liu
Li-ming Yuan
Jie Chen
21
38
0
12 Oct 2022
Video Graph Transformer for Video Question Answering
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
142
75
0
12 Jul 2022
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
156
169
0
20 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
314
780
0
18 Apr 2021
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun
Seong Joon Oh
Rafael Sampaio de Rezende
Yannis Kalantidis
Diane Larlus
UQCV
401
200
0
13 Jan 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
412
595
0
21 Jul 2020
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
267
3,369
0
09 Mar 2020
1