ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.03089
  4. Cited By
Referring Transformer: A One-step Approach to Multi-task Visual
  Grounding

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

6 June 2021
Muchen Li
Leonid Sigal
    ObjD
ArXivPDFHTML

Papers citing "Referring Transformer: A One-step Approach to Multi-task Visual Grounding"

50 / 54 papers shown
Title
RemoteSAM: Towards Segment Anything for Earth Observation
RemoteSAM: Towards Segment Anything for Earth Observation
Liang Yao
Fan Liu
Delong Chen
Chuanyi Zhang
Yijun Wang
Ziyun Chen
Wei Xu
Shimin Di
Yuhui Zheng
140
0
0
23 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
99
3
0
17 May 2025
Progressive Language-guided Visual Learning for Multi-Task Visual Grounding
Progressive Language-guided Visual Learning for Multi-Task Visual Grounding
Jingchao Wang
Hong Wang
Wenlong Zhang
Kunhua Ji
Dingjiang Huang
Yefeng Zheng
ObjD
80
0
0
22 Apr 2025
Towards Visual Grounding: A Survey
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
175
4
0
31 Dec 2024
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu
Hanqing Wang
Ye Shi
Haoyang Luo
Sibei Yang
Jingyi Yu
Jingya Wang
LRM
LM&Ro
146
2
0
02 Dec 2024
ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model
Kunyang Han
Yibo Hu
Mengxue Qu
Hailin Shi
Yao Zhao
Y. X. Wei
MLLM
VLM
3DV
157
1
0
29 Nov 2024
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
Runwei Guan
Jianan Liu
Liye Jia
Haocheng Zhao
Shanliang Yao
Xiaohui Zhu
Ka Lok Man
Eng Gee Lim
Jeremy S. Smith
Yutao Yue
105
5
0
30 Aug 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
108
30
0
28 Jun 2024
Fast Convergence of DETR with Spatially Modulated Co-Attention
Fast Convergence of DETR with Spatially Modulated Co-Attention
Peng Gao
Minghang Zheng
Xiaogang Wang
Jifeng Dai
Hongsheng Li
ViT
70
307
0
05 Aug 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
163
879
0
26 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
60
339
0
17 Apr 2021
Locate then Segment: A Strong Pipeline for Referring Image Segmentation
Locate then Segment: A Strong Pipeline for Referring Image Segmentation
Ya Jing
Tao Kong
Wei Wang
Liang Wang
Lei Li
Tieniu Tan
63
136
0
30 Mar 2021
Rethinking Transformer-based Set Prediction for Object Detection
Rethinking Transformer-based Set Prediction for Object Detection
Zhiqing Sun
Shengcao Cao
Yiming Yang
Kris Kitani
ViT
117
321
0
21 Nov 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
191
5,046
0
08 Oct 2020
Referring Image Segmentation via Cross-Modal Progressive Comprehension
Referring Image Segmentation via Cross-Modal Progressive Comprehension
Shaofei Huang
Tianrui Hui
Si Liu
Guanbin Li
Yunchao Wei
Jizhong Han
Luoqi Liu
Yue Liu
EgoV
61
181
0
01 Oct 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through
  Scene Graph
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
61
377
0
30 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation
  Learning
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
56
494
0
11 Jun 2020
End-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
345
12,966
0
26 May 2020
Multi-task Collaborative Network for Joint Referring Expression
  Comprehension and Segmentation
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
236
291
0
19 Mar 2020
12-in-1: Multi-Task Vision and Language Representation Learning
12-in-1: Multi-Task Vision and Language Representation Learning
Jiasen Lu
Vedanuj Goswami
Marcus Rohrbach
Devi Parikh
Stefan Lee
VLM
ObjD
73
478
0
05 Dec 2019
Referring Expression Object Segmentation with Caption-Aware Consistency
Referring Expression Object Segmentation with Caption-Aware Consistency
Yi-Wen Chen
Yi-Hsuan Tsai
Tiantian Wang
Yen-Yu Lin
Ming-Hsuan Yang
EgoV
42
87
0
10 Oct 2019
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
227
2,474
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
59
159
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
46
362
0
18 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
215
3,667
0
06 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
512
24,351
0
26 Jul 2019
Learning to Compose and Reason with Language Tree Structures for Visual
  Grounding
Learning to Compose and Reason with Language Tree Structures for Visual Grounding
Richang Hong
Daqing Liu
Xiaoyu Mo
Xiangnan He
Hanwang Zhang
ReLM
LRM
77
159
0
05 Jun 2019
CenterNet: Keypoint Triplets for Object Detection
CenterNet: Keypoint Triplets for Object Detection
Kaiwen Duan
S. Bai
Lingxi Xie
H. Qi
Qingming Huang
Q. Tian
NoLa
109
2,684
0
17 Apr 2019
Cross-Modal Self-Attention Network for Referring Image Segmentation
Cross-Modal Self-Attention Network for Referring Image Segmentation
Linwei Ye
Mrigank Rochan
Zhi Liu
Yang Wang
EgoV
42
475
0
09 Apr 2019
Neural Sequential Phrase Grounding (SeqGROUND)
Neural Sequential Phrase Grounding (SeqGROUND)
Pelin Dogan
Leonid Sigal
Markus Gross
ObjD
55
52
0
18 Mar 2019
Improving Referring Expression Grounding with Cross-modal
  Attention-guided Erasing
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
70
181
0
03 Mar 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding
  Box Regression
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Deyuan Li
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
145
4,143
0
25 Feb 2019
Real-Time Referring Expression Comprehension by Single-Stage Grounding
  Network
Real-Time Referring Expression Comprehension by Single-Stage Grounding Network
Xinpeng Chen
Lin Ma
Jingyuan Chen
Zequn Jie
Wen Liu
Jiebo Luo
ObjD
53
112
0
09 Dec 2018
Learning to Assemble Neural Module Tree Networks for Visual Grounding
Learning to Assemble Neural Module Tree Networks for Visual Grounding
Daqing Liu
Hanwang Zhang
Feng Wu
Zhengjun Zha
47
269
0
08 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.4K
94,511
0
11 Oct 2018
Dynamic Multimodal Instance Segmentation guided by natural language
  queries
Dynamic Multimodal Instance Segmentation guided by natural language queries
Edgar Margffoy-Tuay
Juan C. Pérez
Emilio Botero
Pablo Arbelaez
47
173
0
06 Jul 2018
Rethinking Diversified and Discriminative Proposal Generation for Visual
  Grounding
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
Zhou Yu
Jun-chen Yu
Chenchao Xiang
Zhou Zhao
Q. Tian
Dacheng Tao
ObjD
49
139
0
09 May 2018
YOLOv3: An Incremental Improvement
YOLOv3: An Incremental Improvement
Joseph Redmon
Ali Farhadi
ObjD
105
21,386
0
08 Apr 2018
MAttNet: Modular Attention Network for Referring Expression
  Comprehension
MAttNet: Modular Attention Network for Referring Expression Comprehension
Licheng Yu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Joey Tianyi Zhou
Tamara L. Berg
ObjD
97
823
0
24 Jan 2018
Conditional Image-Text Embedding Networks
Conditional Image-Text Embedding Networks
Bryan A. Plummer
Paige Kordas
M. Kiapour
Shuai Zheng
Robinson Piramuthu
Svetlana Lazebnik
45
118
0
22 Nov 2017
Query-guided Regression Network with Context Policy for Phrase Grounding
Query-guided Regression Network with Context Policy for Phrase Grounding
Kan Chen
Rama Kovvuri
Ram Nevatia
58
142
0
04 Aug 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
60
498
0
11 Apr 2017
Mask R-CNN
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
344
27,129
0
20 Mar 2017
Modeling Relationships in Referential Expressions with Compositional
  Modular Networks
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
73
405
0
30 Nov 2016
Modeling Context in Referring Expressions
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
125
1,261
0
31 Jul 2016
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image
  Segmentation
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
Fausto Milletari
Nassir Navab
Seyed-Ahmad Ahmadi
204
8,660
0
15 Jun 2016
Fully Convolutional Networks for Semantic Segmentation
Fully Convolutional Networks for Semantic Segmentation
Evan Shelhamer
Jonathan Long
Trevor Darrell
VOS
SSeg
508
37,806
0
20 May 2016
Segmentation from Natural Language Expressions
Segmentation from Natural Language Expressions
Ronghang Hu
Marcus Rohrbach
Trevor Darrell
VLM
EgoV
69
434
0
20 Mar 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
  Image Annotations
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
194
5,726
0
23 Feb 2016
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
1.8K
193,426
0
10 Dec 2015
12
Next