ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.12763
  4. Cited By
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
v1v2 (latest)

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
    ObjDVLM
ArXiv (abs)PDFHTMLGithub (1008★)

Papers citing "MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"

50 / 616 papers shown
Title
JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human
  Mesh Recovery
JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery
Jiahao Li
Zongxin Yang
Xiaohan Wang
Jianxin Ma
Chang Zhou
Yi Yang
100
13
0
31 Jul 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMeMLLM
126
46
0
30 Jul 2023
Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition
Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition
Huy Ha
Peter R. Florence
Shuran Song
LM&Ro
101
158
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
148
128
0
25 Jul 2023
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
3DPC
80
22
0
25 Jul 2023
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Jinxian Liu
Chen Ju
Chaofan Ma
Yanfeng Wang
Yu Wang
Ya Zhang
VOS
127
24
0
25 Jul 2023
Described Object Detection: Liberating Object Detection with Flexible
  Expressions
Described Object Detection: Liberating Object Detection with Flexible Expressions
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
89
35
0
24 Jul 2023
Iterative Robust Visual Grounding with Masked Reference based
  Centerpoint Supervision
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision
Menghao Li
Chunlei Wang
W. Feng
Shuchang Lyu
Guangliang Cheng
Xiangtai Li
Binghao Liu
Qi Zhao
117
5
0
23 Jul 2023
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Zhihong Chen
Ruifei Zhang
Yibing Song
Xiang Wan
Guanbin Li
48
20
0
21 Jul 2023
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for
  Referring Image Segmentation
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
Zunnan Xu
Zhihong Chen
Yong Zhang
Yibing Song
Xiang Wan
Guanbin Li
VLM
82
50
0
21 Jul 2023
Divert More Attention to Vision-Language Object Tracking
Divert More Attention to Vision-Language Object Tracking
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
97
6
0
19 Jul 2023
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly
  Supervised 3D Visual Grounding
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
92
20
0
18 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present,
  and Future
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Chaoyang Zhu
Long Chen
ObjDVLM
148
40
0
18 Jul 2023
Multimodal Diffusion Segmentation Model for Object Segmentation from
  Manipulation Instructions
Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
Yui Iioka
Y. Yoshida
Yuiga Wada
Shumpei Hatanaka
K. Sugiura
DiffM
118
6
0
17 Jul 2023
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up
  Patch Summarization
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization
Chaoya Jiang
Haiyang Xu
Wei Ye
Qinghao Ye
Chenliang Li
Mingshi Yan
Bin Bi
Shikun Zhang
Fei Huang
Songfang Huang
VLM
66
9
0
17 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language
  Pre-training
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLMMLLM
106
31
0
13 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with
  Language Models
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
134
520
0
12 Jul 2023
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic
  Manipulation
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
Junghyun Kim
Gi-Cheon Kang
Jaein Kim
Suyeon Shin
Byoung-Tak Zhang
LM&Ro
82
7
0
12 Jul 2023
Prototypical Contrastive Transfer Learning for Multimodal Language
  Understanding
Prototypical Contrastive Transfer Learning for Multimodal Language Understanding
Seitaro Otsuki
Shintaro Ishikawa
K. Sugiura
81
1
0
12 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLMVLM
168
238
0
07 Jul 2023
Vision Language Transformers: A Survey
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
60
5
0
06 Jul 2023
Distilling Large Vision-Language Model with Out-of-Distribution
  Generalizability
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Xuanlin Li
Yunhao Fang
Minghua Liu
Z. Ling
Zhuowen Tu
Haoran Su
VLM
97
25
0
06 Jul 2023
Human Inspired Progressive Alignment and Comparative Learning for
  Grounded Word Acquisition
Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition
Yuwei Bao
B. Lattimer
J. Chai
CLL
84
1
0
05 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model
  Planners
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
294
239
0
04 Jul 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
AVSegFormer: Audio-Visual Segmentation with Transformer
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
115
52
0
03 Jul 2023
CoPL: Contextual Prompt Learning for Vision-Language Understanding
CoPL: Contextual Prompt Learning for Vision-Language Understanding
Koustava Goswami
Srikrishna Karanam
Prateksha Udhayanan
J. JosephK.
Balaji Vasan Srinivasan
VLM
80
11
0
03 Jul 2023
Statler: State-Maintaining Language Models for Embodied Reasoning
Statler: State-Maintaining Language Models for Embodied Reasoning
Takuma Yoneda
Jiading Fang
Peng Li
Huanyu Zhang
Tianchong Jiang
Shengjie Lin
Ben Picker
David Yunis
Hongyuan Mei
Matthew R. Walter
LM&Ro
85
34
0
30 Jun 2023
Look, Remember and Reason: Grounded reasoning in videos with language
  models
Look, Remember and Reason: Grounded reasoning in videos with language models
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Reza Pourreza
Pulkit Madan
Roland Memisevic
LRM
112
7
0
30 Jun 2023
Towards Open Vocabulary Learning: A Survey
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjDVLM
156
151
0
28 Jun 2023
REFLECT: Summarizing Robot Experiences for Failure Explanation and
  Correction
REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction
Zeyi Liu
Arpit Bahety
Shuran Song
LRM
114
127
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLMObjDVLM
130
765
0
26 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching
  Attention and Input
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
45
5
0
25 Jun 2023
DesCo: Learning Object Recognition with Rich Language Descriptions
DesCo: Learning Object Recognition with Rich Language Descriptions
Liunian Harold Li
Zi-Yi Dou
Nanyun Peng
Kai-Wei Chang
ObjDVLM
89
22
0
24 Jun 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large
  Vision-Language Model for Remote Sensing
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Yuxiang Cai
DiffMVLM
157
66
0
20 Jun 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive
  Coding
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
Zengjie Song
Zhaoxiang Zhang
55
1
0
19 Jun 2023
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot
  Vision-and-Language Navigation
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation
Xiwen Liang
Liang Ma
Shanshan Guo
Jianhua Han
Hang Xu
Shikui Ma
Xiaodan Liang
LM&RoLLMAG
161
4
0
17 Jun 2023
Scaling Open-Vocabulary Object Detection
Scaling Open-Vocabulary Object Detection
Matthias Minderer
A. Gritsenko
N. Houlsby
VLMObjD
131
203
0
16 Jun 2023
Recurrent Action Transformer with Memory
Recurrent Action Transformer with Memory
A. Staroverov
A. Bessonov
Dmitry A. Yudin
A. Kovalev
Aleksandr I. Panov
OffRL
106
7
0
15 Jun 2023
Exploring the Application of Large-scale Pre-trained Models on Adverse
  Weather Removal
Exploring the Application of Large-scale Pre-trained Models on Adverse Weather Removal
Zhentao Tan
Yue-bo Wu
Qiankun Liu
Qi Chu
Le Lu
Jieping Ye
Nenghai Yu
95
13
0
15 Jun 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast
  Mapping in Vision-Language Models
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Ziqiao Ma
Jiayi Pan
J. Chai
ObjDVLM
74
9
0
14 Jun 2023
detrex: Benchmarking Detection Transformers
detrex: Benchmarking Detection Transformers
Tianhe Ren
Siyi Liu
Feng Li
Hao Zhang
Ailing Zeng
...
Zhaoyang Zeng
Xianbiao Qi
Yuhui Yuan
Jianwei Yang
Lei Zhang
83
14
0
12 Jun 2023
EventCLIP: Adapting CLIP for Event-based Object Recognition
EventCLIP: Adapting CLIP for Event-based Object Recognition
Ziyi Wu
Xudong Liu
Igor Gilitschenski
VLM
96
17
0
10 Jun 2023
Multi-Modal Classifiers for Open-Vocabulary Object Detection
Multi-Modal Classifiers for Open-Vocabulary Object Detection
Prannay Kaul
Weidi Xie
Andrew Zisserman
ObjDVLMMLLM
73
47
0
08 Jun 2023
Matting Anything
Matting Anything
Jiacheng Li
Jitesh Jain
Humphrey Shi
VLM
97
18
0
08 Jun 2023
ScaleDet: A Scalable Multi-Dataset Object Detector
ScaleDet: A Scalable Multi-Dataset Object Detector
Yanbei Chen
Manchen Wang
Abhay Mittal
Zhenlin Xu
Paolo Favaro
Joseph Tighe
Davide Modolo
ObjD
57
22
0
08 Jun 2023
Fine-Grained Visual Prompting
Fine-Grained Visual Prompting
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjDVLM
117
68
0
07 Jun 2023
Language Adaptive Weight Generation for Multi-task Visual Grounding
Language Adaptive Weight Generation for Multi-task Visual Grounding
Wei Su
Peihan Miao
Huanzhang Dou
Gaoang Wang
Liang Qiao
Zheyang Li
Xi Li
ObjD
84
36
0
06 Jun 2023
Referring Expression Comprehension Using Language Adaptive Inference
Referring Expression Comprehension Using Language Adaptive Inference
Wei Su
Peihan Miao
Huanzhang Dou
Yongjian Fu
Xi Li
ObjD
65
20
0
06 Jun 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
DisCLIP: Open-Vocabulary Referring Expression Generation
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
128
7
0
30 May 2023
Multi-modal Queried Object Detection in the Wild
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjDVLM
133
32
0
30 May 2023
Previous
123...678...111213
Next