Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2104.12763
Cited By
v1
v2 (latest)
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Github (1008★)
Papers citing
"MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"
16 / 616 papers shown
Title
CLIPort: What and Where Pathways for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
163
661
0
24 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
117
62
0
13 Sep 2021
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
Zhenzhi Wang
Limin Wang
Tao Wu
Tianhao Li
Gangshan Wu
AI4TS
112
122
0
10 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank
Emanuele Bugliarello
Desmond Elliott
86
82
0
09 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
31
2
0
09 Sep 2021
SORNet: Spatial Object-Centric Representations for Sequential Manipulation
Wentao Yuan
Chris Paxton
Karthik Desingh
Dieter Fox
3DPC
233
72
0
08 Sep 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
108
66
0
25 Aug 2021
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
ViT
75
67
0
20 Jul 2021
LanguageRefer: Spatial-Language Model for 3D Visual Grounding
Junha Roh
Karthik Desingh
Ali Farhadi
Dieter Fox
98
95
0
07 Jul 2021
Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Video Grounding
Chaolei Tan
Zihang Lin
Jianfang Hu
Xiang Li
Weishi Zheng
141
9
0
20 Jun 2021
How Modular Should Neural Module Networks Be for Systematic Generalization?
Vanessa D’Amario
Tomotake Sasaki
Xavier Boix
69
17
0
15 Jun 2021
Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization
Ludan Ruan
Jieting Chen
Yuqing Song
Shizhe Chen
Qin Jin
34
0
0
11 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
119
197
0
06 Jun 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
86
4
0
12 May 2021
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
100
53
0
01 Apr 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
406
2,569
0
04 Jan 2021
Previous
1
2
3
...
11
12
13