Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

8 December 2021
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David F. Harwath
James R. Glass
Hilde Kuehne
    ViT

Papers citing "Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval"

50 / 71 papers shown
Title
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
67
0
0
25 Apr 2025
Unifying Light Field Perception with Field of Parallax
Fei Teng
Buyin Deng
Boyuan Zheng
Kai Luo
Kunyu Peng
Jiaming Zhang
Kailun Yang
34
0
0
02 Mar 2025
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
A. Saporta
A. Puli
Mark Goldstein
Rajesh Ranganath
SSL
36
0
0
01 Nov 2024
Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples
Qi Fan
Yutong Li
Yi Xin
Xinyu Cheng
Guanglai Gao
Miao Ma
36
4
0
23 Aug 2024
ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval
Ruixiang Zhao
Jian Jia
Yan Li
Xuehan Bai
Quan Chen
Han Li
Peng Jiang
Xirong Li
28
0
0
06 Aug 2024
Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval
Nicola Messina
J. Sedmidubský
Fabrizio Falchi
Tomáš Rebok
38
0
0
02 Jul 2024
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia
Jianyuan Guo
Kai Han
Han Wu
Chao Zhang
Chang Xu
Xinghao Chen
ViT
42
15
0
03 Jun 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
30
2
0
12 May 2024
Multi-scale Bottleneck Transformer for Weakly Supervised Multimodal Violence Detection
Shengyang Sun
Xiaojin Gong
18
4
0
08 May 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen
Kumar Ashutosh
Rohit Girdhar
David F. Harwath
Kristen Grauman
EgoV
SSL
28
6
0
08 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration
Shu Zhao
Xiaohan Zou
Tan Yu
Huijuan Xu
31
1
0
17 Mar 2024
Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment
Hai Huang
Yan Xia
Shengpeng Ji
Shulei Wang
Hanting Wang
Jieming Zhu
Zhenhua Dong
Zhou Zhao
27
6
0
08 Mar 2024
Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
Jihai Zhang
Xiang Lan
Xiaoye Qu
Yu Cheng
Mengling Feng
Bryan Hooi
SSL
24
4
0
19 Feb 2024
Video Editing for Video Retrieval
Bin Zhu
Kevin Flanagan
A. Fragomeni
Michael Wray
Dima Damen
CLIP
31
0
0
04 Feb 2024
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Yijie Lin
Jie Zhang
Zhenyu Huang
Jia-Wei Liu
Zujie Wen
Xi Peng
34
18
0
30 Jan 2024
Cascaded Cross-Modal Transformer for Audio-Textual Classification
Nicolae-Cătălin Ristea
Andrei Anghel
Radu Tudor Ionescu
28
2
0
15 Jan 2024
LeanVec: Searching vectors faster by making them fit
Mariano Tepper
Ishwar Bhati
Cecilia Aguerrebere
Mark Hildebrand
Ted Willke
VLM
OODD
21
1
0
26 Dec 2023
CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer
Yabing Wang
Fan Wang
Jianfeng Dong
Hao Luo
VLM
24
8
0
14 Dec 2023
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
Renjie Wu
Hu Wang
Feras Dayoub
Hsiang-Ting Chen
17
5
0
14 Dec 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
31
8
0
24 Oct 2023
Encoding and Decoding Narratives: Datafication and Alternative Access Models for Audiovisual Archives
Yuchen Yang
33
1
0
10 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives
Yuchen Yang
VGen
19
7
0
09 Oct 2023
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova
Anna Kukleva
Xudong Hong
Christian Rupprecht
Bernt Schiele
Hilde Kuehne
35
25
0
07 Oct 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
23
3
0
16 Sep 2023
Transparent Object Tracking with Enhanced Fusion Module
Kalyan Garigapati
Erik P. Blasch
Jie Wei
Haibin Ling
13
2
0
13 Sep 2023
Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval
Yabing Wang
Shuhui Wang
Hao Luo
Jianfeng Dong
F. Wang
Meng Han
Xun Wang
Meng Wang
17
8
0
11 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
26
5
0
10 Sep 2023
Have We Ever Encountered This Before? Retrieving Out-of-Distribution Road Obstacles from Driving Scenes
Youssef Shoeb
Robin Shing Moon Chan
Gesina Schwalbe
Azarm Nowzard
Fatma Guney
Hanno Gottschalk
11
6
0
08 Sep 2023
FArMARe: a Furniture-Aware Multi-task methodology for Recommending Apartments based on the user interests
Ali Abdari
Alex Falcon
Giuseppe Serra
30
2
0
06 Sep 2023
Preserving Modality Structure Improves Multi-Modal Learning
Swetha Sirnam
Mamshad Nayeem Rizve
Nina Shvetsova
Hilde Kuehne
M. Shah
25
4
0
24 Aug 2023
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
36
16
0
22 Aug 2023
Interpretation on Multi-modal Visual Fusion
Hao Chen
Hao Zhou
Yongjian Deng
28
0
0
19 Aug 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
44
14
0
24 Jul 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
23
23
0
17 Jul 2023
Learning Unseen Modality Interaction
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
22
3
0
22 Jun 2023
Iterated Piecewise Affine (IPA) Approximation for Language Modeling
Davood Shamsi
Wenhui Hua
Brian Williams
13
0
0
21 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
33
21
0
06 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
23
0
0
02 Jun 2023
Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language
Nicola Messina
J. Sedmidubský
Fabrizio Falchi
Tomáš Rebok
EGVM
26
10
0
25 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
27
46
0
10 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
21
8
0
06 May 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jing Liu
VLM
31
102
0
17 Apr 2023
Similarity search in the blink of an eye with compressed indices
Cecilia Aguerrebere
Ishwar Bhati
Mark Hildebrand
Mariano Tepper
Ted Willke
11
27
0
07 Apr 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
19
43
0
31 Mar 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
32
7
0
29 Mar 2023
Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data
Anna Kukleva
Moritz Böhle
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
29
39
0
23 Mar 2023
Deep Learning for Video-Text Retrieval: a Review
Cunjuan Zhu
Qi Jia
Wei-Neng Chen
Yanming Guo
Yu Liu
24
14
0
24 Feb 2023
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning
Yimu Wang
Peng Shi
8
5
0
19 Feb 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
37
20
0
23 Jan 2023