ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2302.08063
  4. Cited By
MINOTAUR: Multi-task Video Grounding From Multimodal Queries

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

16 February 2023
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
ArXivPDFHTML

Papers citing "MINOTAUR: Multi-task Video Grounding From Multimodal Queries"

11 / 11 papers shown
Title
MCAT: Visual Query-Based Localization of Standard Anatomical Clips in Fetal Ultrasound Videos Using Multi-Tier Class-Aware Token Transformer
MCAT: Visual Query-Based Localization of Standard Anatomical Clips in Fetal Ultrasound Videos Using Multi-Tier Class-Aware Token Transformer
Divyanshu Mishra
Pramit Saha
He Zhao
Netzahualcoyotl Hernandez-Cruz
Olga Patey
A. Papageorghiou
J. A. Noble
26
0
0
08 Apr 2025
Survey of User Interface Design and Interaction Techniques in Generative
  AI Applications
Survey of User Interface Design and Interaction Techniques in Generative AI Applications
Reuben Luera
Ryan Rossi
Alexa F. Siu
Franck Dernoncourt
Tong Yu
...
Hanieh Salehy
Jian Zhao
Samyadeep Basu
Puneet Mathur
Nedim Lipka
AI4TS
63
1
0
28 Oct 2024
Localizing Events in Videos with Multimodal Queries
Localizing Events in Videos with Multimodal Queries
Gengyuan Zhang
Mang Ling Ada Fok
Yan Xia
Yansong Tang
Daniel Cremers
Philip H. S. Torr
Volker Tresp
Jindong Gu
31
1
0
14 Jun 2024
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Tiantian Geng
Teng Wang
Yanfu Zhang
Jinming Duan
Weili Guan
Feng Zheng
26
0
0
04 Apr 2024
A survey of Generative AI Applications
A survey of Generative AI Applications
Roberto Gozalo-Brizuela
Eduardo C. Garrido-Merchán
3DV
MedIm
21
80
0
05 Jun 2023
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
Ahmad Darkhalil
Dandan Shan
Bin Zhu
Jian Ma
Amlan Kar
Richard E. L. Higgins
Sanja Fidler
David Fouhey
Dima Damen
VOS
47
98
0
26 Sep 2022
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video
  Temporal Grounding
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding
Zhijian Hou
Wanjun Zhong
Lei Ji
Difei Gao
Kun Yan
W. Chan
Chong-Wah Ngo
Zheng Shou
Nan Duan
AI4TS
34
24
0
22 Sep 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,137
0
28 Jan 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
232
1,024
0
13 Oct 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation
Relaxed Transformer Decoders for Direct Action Proposal Generation
Jing Tan
Jiaqi Tang
Limin Wang
Gangshan Wu
ViT
78
178
0
03 Feb 2021
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
Tianwei Lin
Xu Zhao
Haisheng Su
Chongjing Wang
Ming Yang
139
700
0
08 Jun 2018
1