arXiv:2405.16919
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
27 May 2024
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
LRM
MLLM
Papers citing
"VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models"
21 / 71 papers shown
Title
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu
Shengming Yin
Weizhen Qi
Xiaodong Wang
Zecheng Tang
Nan Duan
MLLM
LRM
66
632
0
08 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
577
12,840
0
27 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
365
4,406
0
30 Jan 2023
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Wenhu Chen
Xueguang Ma
Xinyi Wang
William W. Cohen
ReLM
ReCod
LRM
127
774
0
22 Nov 2022
Automatic Chain of Thought Prompting in Large Language Models
Zhuosheng Zhang
Aston Zhang
Mu Li
Alexander J. Smola
ReLM
LRM
98
603
0
07 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
401
4,077
0
24 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
77
169
0
30 Apr 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
232
3,458
0
29 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
68
413
0
07 Apr 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
442
3,486
0
21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
533
9,009
0
28 Jan 2022
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
548
28,659
0
26 Feb 2021
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Agrim Gupta
Piotr Dollár
Ross B. Girshick
ISeg
VLM
74
1,352
0
08 Aug 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
35
1,174
0
18 Apr 2019
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
Runtao Liu
Chenxi Liu
Yutong Bai
Alan Yuille
NAI
ObjD
44
123
0
03 Jan 2019
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
257
2,346
0
20 Dec 2016
A Diagram Is Worth A Dozen Images
Aniruddha Kembhavi
M. Salvato
Eric Kolve
Minjoon Seo
Hannaneh Hajishirzi
Ali Farhadi
3DV
19
456
0
24 Mar 2016
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukáš Neumann
Jiří Matas
Serge J. Belongie
207
517
0
26 Jan 2016
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
372
61,900
0
04 Jun 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
149
2,033
0
19 May 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
197
4,451
0
20 Nov 2014