ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.00899
  4. Cited By
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics

RoboVQA: Multimodal Long-Horizon Reasoning for Robotics

1 November 2023
P. Sermanet
Tianli Ding
Jeffrey Zhao
Fei Xia
Debidatta Dwibedi
K. Gopalakrishnan
Christine Chan
Gabriel Dulac-Arnold
Sharath Maddineni
Nikhil J. Joshi
Pete Florence
Wei Han
Robert Baruch
Yao Lu
Suvir Mirchandani
Peng Xu
Pannag R Sanketi
Karol Hausman
Izhak Shafran
Brian Ichter
Yuan Cao
    LM&Ro
ArXiv (abs)PDFHTML

Papers citing "RoboVQA: Multimodal Long-Horizon Reasoning for Robotics"

28 / 28 papers shown
Title
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
Quang P.M. Pham
Khoi T.N. Nguyen
Nhi H. Doan
Cuong Pham
Kentaro Inui
Dezhen Song
258
0
0
01 May 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jing Zhang
Xiaohui Zeng
Zhe Zhang
AI4CELM&RoLRM
175
12
0
18 Mar 2025
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
Kun Wu
Chengkai Hou
Jiaming Liu
Zhengping Che
Xiaozhu Ju
...
Zhenyu Wang
Pengju An
Siyuan Qian
Shanghang Zhang
Jian Tang
LM&Ro
210
23
0
18 Dec 2024
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song
Weixing Chen
Yang Liu
Weikai Chen
Guanbin Li
Liang Lin
179
7
0
12 Dec 2024
Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments
Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments
Sangmim Song
S. Kodagoda
A. Gunatilake
Marc G. Carmichael
Karthick Thiyagarajan
Jodi Martin
LM&Ro
139
1
0
28 Oct 2024
ViLLa: Video Reasoning Segmentation with Large Language Model
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
VOSLRM
124
5
0
18 Jul 2024
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive
  Captioners
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLMVGen
57
51
0
09 Dec 2022
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large
  Language Models
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Chan Hee Song
Jiaman Wu
Clay Washington
Brian M Sadler
Wei-Lun Chao
Yu-Chuan Su
LLMAGLM&Ro
135
418
0
08 Dec 2022
Interactive Language: Talking to Robots in Real Time
Interactive Language: Talking to Robots in Real Time
Corey Lynch
Ayzaan Wahid
Jonathan Tompson
Tianli Ding
James Betker
Robert Baruch
Travis Armstrong
Peter R. Florence
LM&Ro
91
229
0
12 Oct 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLMVLM
116
732
0
14 Sep 2022
PaLM: Scaling Language Modeling with Pathways
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILMLRM
524
6,293
0
05 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
192
1,984
0
04 Apr 2022
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
Eric Jang
A. Irpan
Mohi Khansari
Daniel Kappler
F. Ebert
Corey Lynch
Sergey Levine
Chelsea Finn
LM&Ro
263
549
0
04 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLMBDLVLMCLIP
555
4,409
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&RoLRMAI4CEReLM
843
9,644
0
28 Jan 2022
SwinBERT: End-to-End Transformers with Sparse Attention for Video
  Captioning
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
83
244
0
25 Nov 2021
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
342
4,569
0
27 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
401
1,109
0
13 Oct 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLMMLLM
136
799
0
24 Aug 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
97
506
0
18 May 2021
Towards General Purpose Vision Systems
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
40
53
0
01 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLMCLIP
459
3,893
0
11 Feb 2021
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million
  Narrated Video Clips
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
118
1,207
0
07 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via
  Question Answering
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
115
474
0
06 Jun 2019
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person
  Videos
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
Gunnar Sigurdsson
Abhinav Gupta
Cordelia Schmid
Ali Farhadi
Alahari Karteek
SLREgoV
66
171
0
25 Apr 2018
Video Captioning via Hierarchical Reinforcement Learning
Video Captioning via Hierarchical Reinforcement Learning
Xin Eric Wang
Wenhu Chen
Jiawei Wu
Yuan-fang Wang
William Yang Wang
88
229
0
29 Nov 2017
Program Induction by Rationale Generation : Learning to Solve and
  Explain Algebraic Word Problems
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Wang Ling
Dani Yogatama
Chris Dyer
Phil Blunsom
AIMat
106
735
0
11 May 2017
Video Captioning with Transferred Semantic Attributes
Video Captioning with Transferred Semantic Attributes
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
77
329
0
23 Nov 2016
1