ResearchTrend.AI
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
    ALM
    ELM
    LRM
    ReLM

Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"

50 / 797 papers shown
Large Language Models as Optimizers
Chengrun Yang
Xuezhi Wang
Yifeng Lu
Hanxiao Liu
Quoc V. Le
Denny Zhou
Xinyun Chen
ODL
43
376
0
07 Sep 2023
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models
Guijin Son
Hanwool Albert Lee
Suwan Kim
Huiseo Kim
Jaecheol Lee
Je Won Yeom
Jihyu Jung
Jung Woo Kim
Songseong Kim
RALM
ELM
26
20
0
06 Sep 2023
When Do Program-of-Thoughts Work for Reasoning?
Zhen Bi
Ningyu Zhang
Yinuo Jiang
Shumin Deng
Guozhou Zheng
Huajun Chen
LRM
32
20
0
29 Aug 2023
Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations
Leonardo Ranaldi
Giulia Pucci
André Freitas
30
33
0
27 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
54
14
0
23 Aug 2023
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Fei Wu
Guoyin Wang
LM&MA
24
538
0
21 Aug 2023
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
Bilgehan Sel
Ahmad S. Al-Tawaha
Vanshaj Khattar
R. Jia
Ming Jin
LM&Ro
LRM
29
62
0
20 Aug 2023
PACE: Improving Prompt with Actor-Critic Editing for Large Language Model
Yihong Dong
Kangcheng Luo
Xue Jiang
Zhi Jin
Ge Li
LRM
KELM
36
9
0
19 Aug 2023
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Rishabh Bhardwaj
Soujanya Poria
ELM
19
127
0
18 Aug 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu
Dongfang Li
Baotian Hu
Zihao Zheng
Zhenyu Liu
M. Zhang
KELM
MU
33
26
0
16 Aug 2023
CausalLM is not optimal for in-context learning
Nan Ding
Tomer Levinboim
Jialin Wu
Sebastian Goodman
Radu Soricut
24
23
0
14 Aug 2023
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
Xiaochuang Han
22
19
0
08 Aug 2023
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu
Xukun Liu
Hua Shen
Zeyu Han
Yuhan Li
Murong Yue
Zhi-Ping Peng
Yuchen Liu
Ziyu Yao
Dongkuan Xu
LLMAG
30
19
0
08 Aug 2023
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
22
69
0
07 Aug 2023
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Liangming Pan
Michael Stephen Saxon
Wenda Xu
Deepak Nathani
Xinyi Wang
William Yang Wang
KELM
LRM
47
201
0
06 Aug 2023
Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models
Keyu Pan
Yawen Zeng
LLMAG
23
41
0
30 Jul 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
39
198
0
24 Jul 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM
ALM
40
132
0
20 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
40
98
0
20 Jul 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELM
LRM
30
86
0
20 Jul 2023
Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa
Shriyash Upadhyay
Etan Ginsberg
SyDa
LRM
19
0
0
20 Jul 2023
Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
Rylan Schaeffer
Kateryna Pistunova
Samarth Khanna
Sarthak Consul
Oluwasanmi Koyejo
ReLM
LRM
39
10
0
20 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
108
11,007
0
18 Jul 2023
Large Language Models Perform Diagnostic Reasoning
Cheng-Kuang Wu
Wei-Lin Chen
Hsin-Hsi Chen
ReLM
ELM
LM&MA
LRM
18
17
0
18 Jul 2023
AlpaGasus: Training A Better Alpaca with Fewer Data
Lichang Chen
Shiyang Li
Jun Yan
Hai Wang
Kalpa Gunaratna
...
Zheng Tang
Vijay Srinivasan
Dinesh Manocha
Heng-Chiao Huang
Hongxia Jin
ALM
46
0
0
17 Jul 2023
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study
Peiyu Liu
Zikang Liu
Ze-Feng Gao
Dawei Gao
Wayne Xin Zhao
Yaliang Li
Bolin Ding
Ji-Rong Wen
MQ
LRM
30
31
0
16 Jul 2023
Large Language Models Understand and Can be Enhanced by Emotional Stimuli
Cheng-rong Li
Jindong Wang
Yixuan Zhang
Kaijie Zhu
Wenxin Hou
Jianxun Lian
Fang Luo
Qiang Yang
Xingxu Xie
LRM
80
120
0
14 Jul 2023
Large Language Models as General Pattern Machines
Suvir Mirchandani
F. Xia
Peter R. Florence
Brian Ichter
Danny Driess
Montse Gonzalez Arenas
Kanishka Rao
Dorsa Sadigh
Andy Zeng
LLMAG
57
184
0
10 Jul 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Deepanway Ghosal
Yew Ken Chia
Navonil Majumder
Soujanya Poria
ALM
LRM
30
17
0
05 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
64
219
0
04 Jul 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step
Liunian Harold Li
Jack Hessel
Youngjae Yu
Xiang Ren
Kai-Wei Chang
Yejin Choi
LRM
AI4CE
ReLM
22
129
0
24 Jun 2023
Joint Prompt Optimization of Stacked LLMs using Variational Inference
Alessandro Sordoni
Xingdi Yuan
Marc-Alexandre Côté
Matheus Pereira
Adam Trischler
Ziang Xiao
Arian Hosseini
Friederike Niedtner
Nicolas Le Roux
33
27
0
21 Jun 2023
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
...
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
ELM
ALM
38
66
0
15 Jun 2023
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Jianchen Wang
...
Zili Wang
Shusen Wang
Weiguo Zheng
Hongwei Feng
Yanghua Xiao
ALM
ELM
30
58
0
09 Jun 2023
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
Yew Ken Chia
Pengfei Hong
Lidong Bing
Soujanya Poria
ELM
25
63
0
07 Jun 2023
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Yizhong Wang
Hamish Ivison
Pradeep Dasigi
Jack Hessel
Tushar Khot
...
David Wadden
Kelsey MacMillan
Noah A. Smith
Iz Beltagy
Hannaneh Hajishirzi
ALM
ELM
13
369
0
07 Jun 2023
Certified Deductive Reasoning with Language Models
Gabriel Poesia
Kanishk Gandhi
E. Zelikman
Noah D. Goodman
ELM
ReLM
LRM
32
0
0
06 Jun 2023
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Subhabrata Mukherjee
Arindam Mitra
Ganesh Jawahar
Sahaj Agarwal
Hamid Palangi
Ahmed Hassan Awadallah
ELM
ALM
LRM
38
262
0
05 Jun 2023
Multilingual Conceptual Coverage in Text-to-Image Models
Michael Stephen Saxon
William Yang Wang
EGVM
24
8
0
02 Jun 2023
Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation
Adithya V Ganesan
Yash Kumar Lal
August Håkan Nilsson
H. A. Schwartz
27
22
0
01 Jun 2023
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft
Shalev Lifshitz
Keiran Paster
Harris Chan
Jimmy Ba
Sheila A. McIlraith
LM&Ro
24
67
0
01 Jun 2023
Grammar Prompting for Domain-Specific Language Generation with Large Language Models
Bailin Wang
Zi Wang
Xuezhi Wang
Yuan Cao
Rif A. Saurous
Yoon Kim
ReLM
LRM
35
52
0
30 May 2023
Strategic Reasoning with Language Models
Kanishk Gandhi
Dorsa Sadigh
Noah D. Goodman
LM&Ro
LRM
42
36
0
30 May 2023
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Tian Liang
Zhiwei He
Wenxiang Jiao
Xing Wang
Rui Wang
Yujiu Yang
Zhaopeng Tu
Shuming Shi
LLMAG
LRM
37
401
0
30 May 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq R. Joty
J. Huang
LM&MA
ELM
ALM
46
179
0
29 May 2023
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
Yao Fu
Litu Ou
Mingyu Chen
Yuhao Wan
Hao-Chun Peng
Tushar Khot
LLMAG
ELM
LRM
ReLM
33
109
0
26 May 2023
Large Language Models as Tool Makers
Tianle Cai
Xuezhi Wang
Tengyu Ma
Xinyun Chen
Denny Zhou
LLMAG
32
186
0
26 May 2023
Passive learning of active causal strategies in agents and language models
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Ishita Dasgupta
A. Nam
Jane X. Wang
29
15
0
25 May 2023
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng
Bohang Zhang
Yuntian Gu
Haotian Ye
Di He
Liwei Wang
LRM
30
218
0
24 May 2023
Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations
Wei-Lin Chen
Cheng-Kuang Wu
Yun-Nung Chen
Hsin-Hsi Chen
21
27
0
24 May 2023