2410.07494
Cited By
G$^{2}$TR: Generalized Grounded Temporal Reasoning for Robot Instruction Following by Combining Large Pre-trained Models
10 October 2024
Riya Arora
N. N.
Aman Tambi
Sandeep S. Zachariah
Souvik Chakraborty
Rohan Paul
LM&Ro
Papers citing
"G$^{2}$TR: Generalized Grounded Temporal Reasoning for Robot Instruction Following by Combining Large Pre-trained Models"
15 / 15 papers shown
Title
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip Torr
Lu Yuan
LRM
VLM
65
8
0
05 Jul 2024
Enabling robots to follow abstract instructions and complete complex dynamic tasks
Ruaridh Mon-Williams
Gen Li
Ran Long
Wenqian Du
Chris Lucas
LM&Ro
71
2
0
17 Jun 2024
TempCompass: Do Video LLMs Really Understand Videos?
Yuanxin Liu
Shicheng Li
Yi Liu
Yuxiang Wang
Shuhuai Ren
Lei Li
Sishuo Chen
Xu Sun
Lu Hou
VLM
123
140
0
01 Mar 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Jiebo Luo
Chenliang Xu
VLM
189
99
0
29 Dec 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
128
515
0
06 Nov 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,761
0
15 Mar 2023
AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains
Haoshu Fang
Chenxi Wang
Hongjie Fang
Minghao Gou
Jirong Liu
Hengxu Yan
Wenhai Liu
Yichen Xie
Cewu Lu
133
210
0
16 Dec 2022
Reasoning with Scene Graphs for Robot Planning under Partial Observability
S. Amiri
Kishan Chandan
Shiqi Zhang
87
46
0
21 Feb 2022
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
136
1,067
0
07 Dec 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
187
890
0
26 Apr 2021
Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following
Valts Blukis
Ross A. Knepper
Yoav Artzi
LM&Ro
50
33
0
14 Nov 2020
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM
OT
121
448
0
25 Sep 2019
Temporal Relational Reasoning in Videos
Bolei Zhou
A. Andonian
Aude Oliva
Antonio Torralba
NAI
105
1,041
0
22 Nov 2017
Video In Sentences Out
Andrei Barbu
Alexander Bridge
Zachary Burchill
D. Coroian
Sven J. Dickinson
...
Jarrell W. Waggoner
Song Wang
Jinlian Wei
Yifan Yin
Zhiqi Zhang
69
156
0
09 Aug 2014
Saying What You're Looking For: Linguistics Meets Video Search
Andrei Barbu
Siddharth Narayanaswamy
J. Siskind
63
22
0
20 Sep 2013