ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.07912
  4. Cited By
YORO -- Lightweight End to End Visual Grounding

YORO -- Lightweight End to End Visual Grounding

15 November 2022
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
    ObjD
ArXivPDFHTML

Papers citing "YORO -- Lightweight End to End Visual Grounding"

27 / 77 papers shown
Title
A Corpus for Reasoning About Natural Language Grounded in Photographs
A Corpus for Reasoning About Natural Language Grounded in Photographs
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
74
596
0
01 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
882
93,936
0
11 Oct 2018
Deep Learning for Generic Object Detection: A Survey
Deep Learning for Generic Object Detection: A Survey
Li Liu
Wanli Ouyang
Xiaogang Wang
Paul Fieguth
Jie Chen
Xinwang Liu
M. Pietikäinen
ObjD
VLM
OOD
143
2,438
0
06 Sep 2018
Rethinking Diversified and Discriminative Proposal Generation for Visual
  Grounding
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
Zhou Yu
Jun-chen Yu
Chenchao Xiang
Zhou Zhao
Q. Tian
Dacheng Tao
ObjD
37
139
0
09 May 2018
YOLOv3: An Incremental Improvement
YOLOv3: An Incremental Improvement
Joseph Redmon
Ali Farhadi
ObjD
83
21,306
0
08 Apr 2018
Stacked Cross Attention for Image-Text Matching
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
59
1,148
0
21 Mar 2018
MAttNet: Modular Attention Network for Referring Expression
  Comprehension
MAttNet: Modular Attention Network for Referring Expression Comprehension
Licheng Yu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Joey Tianyi Zhou
Tamara L. Berg
ObjD
85
822
0
24 Jan 2018
MobileNetV2: Inverted Residuals and Linear Bottlenecks
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Mark Sandler
Andrew G. Howard
Menglong Zhu
A. Zhmoginov
Liang-Chieh Chen
129
19,124
0
13 Jan 2018
Grounding Referring Expressions in Images by Variational Context
Grounding Referring Expressions in Images by Variational Context
Hanwang Zhang
Yulei Niu
Shih-Fu Chang
BDL
ObjD
42
220
0
05 Dec 2017
Conditional Image-Text Embedding Networks
Conditional Image-Text Embedding Networks
Bryan A. Plummer
Paige Kordas
M. Kiapour
Shuai Zheng
Robinson Piramuthu
Svetlana Lazebnik
37
118
0
22 Nov 2017
Parallel Attention: A Unified Framework for Visual Object Discovery
  through Dialogs and Queries
Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries
Bohan Zhuang
Qi Wu
Chunhua Shen
Ian Reid
Anton Van Den Hengel
ObjD
43
134
0
17 Nov 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
427
129,831
0
12 Jun 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
  Applications
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
1.0K
20,692
0
17 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
51
496
0
11 Apr 2017
Comprehension-guided referring expressions
Comprehension-guided referring expressions
Ruotian Luo
Gregory Shakhnarovich
ObjD
70
171
0
12 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
73
275
0
30 Dec 2016
YOLO9000: Better, Faster, Stronger
YOLO9000: Better, Faster, Stronger
Joseph Redmon
Ali Farhadi
VLM
ObjD
147
15,535
0
25 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
285
3,187
0
02 Dec 2016
Speed/accuracy trade-offs for modern convolutional object detectors
Speed/accuracy trade-offs for modern convolutional object detectors
Jonathan Huang
V. Rathod
Chen Sun
Menglong Zhu
Anoop Korattikara Balan
...
Ian S. Fischer
Z. Wojna
Yang Song
S. Guadarrama
Kevin Patrick Murphy
3DH
3DV
80
2,567
0
30 Nov 2016
Modeling Relationships in Referential Expressions with Compositional
  Modular Networks
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
55
402
0
30 Nov 2016
Modeling Context in Referring Expressions
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
95
1,250
0
31 Jul 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
256
1,466
0
06 Jun 2016
Natural Language Object Retrieval
Natural Language Object Retrieval
Ronghang Hu
Huazhe Xu
Marcus Rohrbach
Jiashi Feng
Kate Saenko
Trevor Darrell
ObjD
65
552
0
13 Nov 2015
Grounding of Textual Phrases in Images by Reconstruction
Grounding of Textual Phrases in Images by Reconstruction
Anna Rohrbach
Marcus Rohrbach
Ronghang Hu
Trevor Darrell
Bernt Schiele
53
497
0
12 Nov 2015
You Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon
S. Divvala
Ross B. Girshick
Ali Farhadi
ObjD
543
36,643
0
08 Jun 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal
  Networks
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
396
61,900
0
04 Jun 2015
Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
237
43,290
0
01 May 2014
Previous
12