arXiv: 2204.00598
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM, LRM

Papers citing "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"

38 / 438 papers shown
A Case for Business Process-Specific Foundation Models
Sadhana Kumaravel
Praveen Venkateswaran
Vatche Isahagian
Vinod Muthusamy
AI4CE
75
9
0
26 Oct 2022
IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text
Seungwhan Moon
Andrea Madotto
Zhaojiang Lin
Alireza Dirafzoon
Aparajita Saraf
Amy Bearman
Babak Damavandi
VLM
76
37
0
26 Oct 2022
Instruction-Following Agents with Multimodal Transformer
Hao Liu
Lisa Lee
Kimin Lee
Pieter Abbeel
LM&Ro
141
11
0
24 Oct 2022
Composing Ensembles of Pre-trained Models via Iterative Consensus
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
Igor Mordatch
MoMe
78
25
0
20 Oct 2022
Communication breakdown: On the low mutual intelligibility between human and neural captioning
Roberto Dessì
Eleonora Gualdoni
Francesca Franzon
Gemma Boleda
Marco Baroni
VLM
120
6
0
20 Oct 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
A. M. H. Tiong
Junnan Li
Boyang Albert Li
Silvio Savarese
Guosheng Lin
MLLM
133
110
0
17 Oct 2022
Visual Classification via Description from Large Language Models
Sachit Menon
Carl Vondrick
VLM
139
303
0
13 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
137
51
0
13 Oct 2022
Visual Language Maps for Robot Navigation
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
LM&Ro
345
372
0
11 Oct 2022
Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks
Albert Yu
Raymond J. Mooney
LM&Ro
87
20
0
10 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
179
356
0
06 Oct 2022
Grounding Language with Visual Affordances over Unstructured Data
Oier Mees
Jessica Borja-Diaz
Wolfram Burgard
LM&Ro
200
114
0
04 Oct 2022
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach
Georgios Tziafas
Hamidreza Kasaei
LM&Ro
106
3
0
03 Oct 2022
Linearly Mapping from Image to Text Space
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
253
119
0
30 Sep 2022
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh
Valts Blukis
Arsalan Mousavian
Ankit Goyal
Danfei Xu
Jonathan Tremblay
Dieter Fox
Jesse Thomason
Animesh Garg
LM&Ro, LLMAG
263
663
0
22 Sep 2022
Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
Xuesu Xiao
Tingnan Zhang
K. Choromanski
Edward J. Lee
Anthony G. Francis
...
Leila Takayama
Roy Frostig
Jie Tan
Carolina Parada
Vikas Sindhwani
170
55
0
22 Sep 2022
Open-vocabulary Queryable Scene Representations for Real World Planning
Boyuan Chen
F. Xia
Brian Ichter
Kanishka Rao
K. Gopalakrishnan
Michael S. Ryoo
Austin Stone
Daniel Kappler
LM&Ro
228
187
0
20 Sep 2022
Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest
Jack Hessel
Ana Marasović
Jena D. Hwang
Lillian Lee
Jeff Da
Rowan Zellers
Robert Mankoff
Yejin Choi
VLM
132
93
0
13 Sep 2022
Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding
William Chen
Siyi Hu
Rajat Talak
Luca Carlone
LM&Ro
52
0
0
12 Sep 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
313
502
0
12 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
116
90
0
07 Sep 2022
Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors
Xi Wang
Gengyan Li
Yen-Ling Kuo
Muhammed Kocabas
Emre Aksan
Otmar Hilliges
133
30
0
06 Sep 2022
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models
Huy Ha
Shuran Song
LM&Ro, VLM
116
106
0
23 Jul 2022
Language Model Cascades
David Dohan
Winnie Xu
Aitor Lewkowycz
Jacob Austin
David Bieber
...
Henryk Michalewski
Rif A. Saurous
Jascha Narain Sohl-Dickstein
Kevin Patrick Murphy
Charles Sutton
ReLM, LRM
132
102
0
21 Jul 2022
Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang
F. Xia
Ted Xiao
Harris Chan
Jacky Liang
...
Tomas Jackson
Linda Luu
Sergey Levine
Karol Hausman
Brian Ichter
LLMAG, LM&Ro, LRM
214
929
0
12 Jul 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Dhruv Shah
B. Osinski
Brian Ichter
Sergey Levine
LM&Ro
290
473
0
10 Jul 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
173
389
0
17 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
169
240
0
16 Jun 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM, ReLM, LRM
326
2,531
0
15 Jun 2022
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Yujia Xie
Luowei Zhou
Xiyang Dai
Lu Yuan
Nguyen Bach
Ce Liu
Michael Zeng
VLM, MLLM
84
28
0
03 Jun 2022
Language and Culture Internalisation for Human-Like Autotelic AI
Cédric Colas
Tristan Karch
Clément Moulin-Frier
Pierre-Yves Oudeyer
LM&Ro
102
28
0
02 Jun 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
180
565
0
27 May 2022
Can Foundation Models Help Us Achieve Perfect Secrecy?
Simran Arora
Christopher Ré
FedML
92
9
0
27 May 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Pengfei Yu
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
MLLM, VLM
263
142
0
22 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM, CLIP, OffRL
368
1,315
0
04 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM, VLM
461
3,631
0
29 Apr 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP, VLM
127
17
0
27 Mar 2022
SemSup: Semantic Supervision for Simple and Scalable Zero-shot Generalization
Austin W. Hanjie
Ameet Deshpande
Karthik Narasimhan
VLM
87
2
0
26 Feb 2022