ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.11427
  4. Cited By
Expectation-Maximization Contrastive Learning for Compact
  Video-and-Language Representations

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

21 November 2022
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David A. Clifton
Jing Chen
    VLM
ArXivPDFHTML

Papers citing "Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations"

49 / 49 papers shown
Title
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
33
0
0
02 Apr 2025
MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion
MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion
Saron Samuel
Dan DeGenaro
Jimena Guallar-Blasco
Kate Sanders
Oluwaseun Eisape
...
David Etter
Efsun Kayi
Matthew Wiesner
Kenton W. Murray
Reno Kriz
83
0
0
26 Mar 2025
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun V. Reddy
Alexander Martin
Eugene Yang
Andrew Yates
Kate Sanders
Kenton W. Murray
Reno Kriz
Celso M. De Melo
Benjamin Van Durme
Rama Chellappa
48
1
0
24 Mar 2025
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
50
1
0
21 Mar 2025
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
Zengrong Lin
Zheng Wang
Tianwen Qian
Pan Mu
Sixian Chan
Cong Bai
44
0
0
13 Mar 2025
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang
Meng Cao
Jinfa Huang
Ruyang Liu
Peng Jin
Ge Li
Xiaodan Liang
Mamba
96
4
0
24 Feb 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
H. Li
Li Yuan
Shuicheng Yan
Jie Chen
52
1
0
31 Dec 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
31
13
0
15 Oct 2024
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz
Kate Sanders
David Etter
Kenton W. Murray
Cameron Carpenter
...
Alexander Martin
Ronald Colaianni
Nolan King
Eugene Yang
Benjamin Van Durme
VGen
33
2
0
15 Oct 2024
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation
  Experts
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
32
4
0
09 Oct 2024
OCC-MLLM-Alpha:Empowering Multi-modal Large Language Model for the
  Understanding of Occluded Objects with Self-Supervised Test-Time Learning
OCC-MLLM-Alpha:Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning
Shuxin Yang
Xinhan Di
26
1
0
02 Oct 2024
OCC-MLLM:Empowering Multimodal Large Language Model For the
  Understanding of Occluded Objects
OCC-MLLM:Empowering Multimodal Large Language Model For the Understanding of Occluded Objects
Wenmo Qiu
Xinhan Di
VLM
33
2
0
02 Oct 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
121
7
0
02 Sep 2024
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language
  Retrieval
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
Longtao Jiang
Min Wang
Zecheng Li
Yao Fang
Wen-gang Zhou
Houqiang Li
SLR
29
2
0
23 Jul 2024
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Runyi Yu
Chang-Shu Liu
Xiangyang Ji
Li-ming Yuan
Jie Chen
DiffM
23
3
0
15 Jul 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
93
11
0
29 May 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang
Guohao Sun
Pichao Wang
Dongfang Liu
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Zhiqiang Tao
VGen
55
20
0
26 Mar 2024
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Kangning Yin
Shihao Zou
Yuxuan Ge
Zheng Tian
40
5
0
01 Mar 2024
Continuous-Multiple Image Outpainting in One-Step via Positional Query
  and A Diffusion-based Approach
Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach
Shaofeng Zhang
Jinfa Huang
Qiang-feng Zhou
Zhibin Wang
Fan Wang
Jiebo Luo
Junchi Yan
DiffM
26
11
0
28 Jan 2024
Video Understanding with Large Language Models: A Survey
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
50
82
0
29 Dec 2023
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
28
3
0
11 Dec 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models
  with Image and Video Understanding
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
34
222
0
14 Nov 2023
GPT-4V(ision) as A Social Media Analysis Engine
GPT-4V(ision) as A Social Media Analysis Engine
Hanjia Lyu
Jinfa Huang
Daoan Zhang
Yongsheng Yu
Xinyi Mou
Jinsheng Pan
Zhengyuan Yang
Zhongyu Wei
Jiebo Luo
VLM
MLLM
21
33
0
13 Nov 2023
Lost Your Style? Navigating with Semantic-Level Approach for
  Text-to-Outfit Retrieval
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval
Junkyu Jang
Eugene Hwang
Sung-Hyuk Park
20
0
0
03 Nov 2023
Act As You Wish: Fine-Grained Control of Motion Diffusion Model with
  Hierarchical Semantic Graphs
Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs
Peng Jin
Yang Wu
Yanbo Fan
Zhongqian Sun
Yang Wei
Li-ming Yuan
DiffM
28
28
0
02 Nov 2023
Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic
  Segmentation
Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation
Fei Zhang
Tianfei Zhou
Boyang Li
Hao He
Chaofan Ma
Tianjiao Zhang
Jiangchao Yao
Ya-Qin Zhang
Yanfeng Wang
VLM
37
17
0
29 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
21
9
0
25 Oct 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual
  Captioning
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
29
13
0
25 Aug 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Chaorui Deng
Qi Chen
Pengda Qin
Dave Zhenyu Chen
Qi Wu
VLM
CLIP
41
29
0
15 Aug 2023
Improving Scene Graph Generation with Superpixel-Based Interaction
  Learning
Improving Scene Graph Generation with Superpixel-Based Interaction Learning
Jingyi Wang
Can Zhang
Jinfa Huang
Bo Ren
Zhidong Deng
23
7
0
04 Aug 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real
MomentDiff: Generative Video Moment Retrieval from Random to Real
P. Li
Chen-Wei Xie
Hongtao Xie
Liming Zhao
Lei Zhang
Yun Zheng
Deli Zhao
Yongdong Zhang
DiffM
VGen
34
56
0
06 Jul 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video
  Retrieval Models
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
42
3
0
28 Jun 2023
Patch-Level Contrasting without Patch Correspondence for Accurate and
  Dense Contrastive Representation Learning
Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning
Shaofeng Zhang
Feng Zhu
Rui Zhao
Junchi Yan
21
16
0
23 Jun 2023
Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning
Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning
Yu Wang
Pengchong Qiao
Chang-rui Liu
Guoli Song
Xiawu Zheng
Jie Chen
OODD
26
10
0
29 May 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set
  Alignment
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Peng Jin
Hao Li
Ze-Long Cheng
Jinfa Huang
Zhennan Wang
Li-ming Yuan
Chang-rui Liu
Jie Chen
26
31
0
20 May 2023
TG-VQA: Ternary Game of Video Question Answering
TG-VQA: Ternary Game of Video Question Answering
Hao Li
Peng Jin
Ze-Long Cheng
Songyang Zhang
Kai-xiang Chen
Zhennan Wang
Chang-rui Liu
Jie Chen
26
10
0
17 May 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for
  Cross-Modal Representation Learning
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
30
49
0
25 Mar 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Xiang Ji
Chang-rui Liu
Li-ming Yuan
Jie Chen
DiffM
VGen
24
53
0
17 Mar 2023
Parallel Vertex Diffusion for Unified Visual Grounding
Parallel Vertex Diffusion for Unified Visual Grounding
Ze-Long Cheng
Kehan Li
Peng Jin
Xiang Ji
Li-ming Yuan
Chang-rui Liu
Jie Chen
DiffM
34
25
0
13 Mar 2023
FiTs: Fine-grained Two-stage Training for Knowledge-aware Question
  Answering
FiTs: Fine-grained Two-stage Training for Knowledge-aware Question Answering
Qichen Ye
Bowen Cao
Nuo Chen
Weiyuan Xu
Yuexian Zou
18
16
0
23 Feb 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
24
45
0
16 Jan 2023
Position Embedding Needs an Independent Layer Normalization
Position Embedding Needs an Independent Layer Normalization
Runyi Yu
Zhennan Wang
Yinhuai Wang
Kehan Li
Yian Zhao
Jian Andrew Zhang
Guoli Song
Jie Chen
26
1
0
10 Dec 2022
ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
Kehan Li
Zhennan Wang
Ze-Long Cheng
Runyi Yu
Yian Zhao
Guoli Song
Chang Liu
Li-ming Yuan
Jie Chen
21
38
0
12 Oct 2022
Video Graph Transformer for Video Question Answering
Video Graph Transformer for Video Question Answering
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
142
75
0
12 Jul 2022
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
156
169
0
20 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip
  Retrieval
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
314
780
0
18 Apr 2021
Probabilistic Embeddings for Cross-Modal Retrieval
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun
Seong Joon Oh
Rafael Sampaio de Rezende
Yannis Kalantidis
Diane Larlus
UQCV
401
200
0
13 Jan 2021
Multi-modal Transformer for Video Retrieval
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
412
595
0
21 Jul 2020
Improved Baselines with Momentum Contrastive Learning
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
267
3,369
0
09 Mar 2020
1