Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

arXiv:2311.00738 · 1 November 2023
Yuwei Bao, Keunwoo Peter Yu, Yichi Zhang, Shane Storks, Itamar Bar-Yossef, Alexander De La Iglesia, Megan Su, Xiao Lin Zheng, Joyce Chai

Papers citing "Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?"

10 / 10 papers shown

Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models
Keunwoo Peter Yu, Joyce Chai · MLLM, VLM · 12 / 0 / 0 · 16 May 2025
"Is This It?": Towards Ecologically Valid Benchmarks for Situated
  Collaboration
"Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration
D. Bohus
Sean Andrist
Yuwei Bao
Eric Horvitz
Ann Paradiso
35
0
0
30 Aug 2024
AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments
Tomislav Duricic, Peter Müllner, Nicole Weidinger, Neven Elsayed, Dominik Kowald, Eduardo E. Veas · 33 / 1 / 0 · 12 Jul 2024
POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models
Jianben He, Xingbo Wang, Shiyi Liu, Guande Wu, Claudio Silva, Huamin Qu · LRM · 37 / 2 / 0 · 06 Jun 2024
SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
D. Bohus, Sean Andrist, Nick Saw, Ann Paradiso, Ishani Chakraborty, Mahdi Rad · 38 / 9 / 0 · 16 May 2024
Vision-Language Models as Success Detectors
Yuqing Du, Ksenia Konyushkova, Misha Denil, A. Raju, Jessica Landon, Felix Hill, Nando de Freitas, Serkan Cabi · MLLM, LRM · 91 / 77 / 0 · 13 Mar 2023
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
Ivan Kapelyukh, Vitalis Vosylius, Edward Johns · LM&Ro, DiffM · 113 / 146 / 0 · 05 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi · MLLM, BDL, VLM, CLIP · 392 / 4,154 / 0 · 28 Jan 2022
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, P. Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Z. Hakkani-Tür · LM&Ro · 169 / 180 / 0 · 01 Oct 2021
MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks
Cristian-Paul Bara, Sky CH-Wang, J. Chai · 67 / 61 / 0 · 13 Sep 2021