ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1811.11683
  4. Cited By
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

28 November 2018
Hassan Akbari
Svebor Karaman
Surabhi Bhargava
Brian Chen
Carl Vondrick
Shih-Fu Chang
ArXivPDFHTML

Papers citing "Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding"

18 / 18 papers shown
Title
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in
  Untrimmed Multi-Action Videos from Narrated Instructions
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
40
7
0
29 Mar 2023
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and
  Grounding
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
50
25
0
28 Nov 2022
PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative
  Grounding
PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding
Zihan Ding
Zixiang Ding
Tianrui Hui
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
Si Liu
17
12
0
11 Aug 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
33
154
0
11 Feb 2022
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media
  Knowledge Extraction and Grounding
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
Revanth Reddy Gangi Reddy
Xilin Rui
Manling Li
Xudong Lin
Haoyang Wen
...
Joey Tianyi Zhou
Avirup Sil
Shih-Fu Chang
A. Schwing
Heng Ji
25
31
0
20 Dec 2021
Predicting Physical World Destinations for Commands Given to
  Self-Driving Cars
Predicting Physical World Destinations for Commands Given to Self-Driving Cars
Dusan Grujicic
Thierry Deruyttere
Marie-Francine Moens
Matthew Blaschko
OOD
27
6
0
10 Dec 2021
Relation-aware Instance Refinement for Weakly Supervised Visual
  Grounding
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Yongfei Liu
Bo Wan
Lin Ma
Xuming He
ObjD
24
55
0
24 Mar 2021
Co-Grounding Networks with Semantic Attention for Referring Expression
  Comprehension in Videos
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
21
16
0
23 Mar 2021
Open-Vocabulary Object Detection Using Captions
Open-Vocabulary Object Detection Using Captions
Alireza Zareian
Kevin Dela Rosa
Derek Hao Hu
Shih-Fu Chang
VLM
ObjD
44
417
0
20 Nov 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase
  Grounding
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
47
11
0
12 Oct 2020
Learning Object Detection from Captions via Textual Scene Attributes
Learning Object Detection from Captions via Textual Scene Attributes
Achiya Jerbi
Roei Herzig
Jonathan Berant
Gal Chechik
Amir Globerson
27
21
0
30 Sep 2020
Cosine meets Softmax: A tough-to-beat baseline for visual grounding
Cosine meets Softmax: A tough-to-beat baseline for visual grounding
N. Rufus
U. R. Nair
K. M. Krishna
Vineet Gandhi
27
13
0
13 Sep 2020
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression
  Grounding
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen
Wenbo Ma
Jun Xiao
Hanwang Zhang
Shih-Fu Chang
ObjD
17
89
0
03 Sep 2020
PhraseCut: Language-based Image Segmentation in the Wild
PhraseCut: Language-based Image Segmentation in the Wild
Chenyun Wu
Zhe-nan Lin
Scott D. Cohen
Trung Bui
Subhransu Maji
VLM
13
111
0
03 Aug 2020
Contrastive Learning for Weakly Supervised Phrase Grounding
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta
Arash Vahdat
Gal Chechik
Xiaodong Yang
Jan Kautz
Derek Hoiem
ObjD
SSL
42
140
0
17 Jun 2020
A Multimodal Target-Source Classifier with Attention Branches to
  Understand Ambiguous Instructions for Fetching Daily Objects
A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects
A. Magassouba
K. Sugiura
Hisashi Kawai
38
9
0
23 Dec 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption
  Alignment
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
14
102
0
27 Mar 2019
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,464
0
06 Jun 2016
1