ResearchTrend.AI
arXiv: 2210.09261
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Schärli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
    ALM
    ELM
    LRM
    ReLM

Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"

50 / 797 papers shown
Title
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
Pranshu Pandya
Agney S Talwarr
Vatsal Gupta
Tushar Kataria
Dan Roth
Vivek Gupta
LRM
67
2
0
15 Jul 2024
SoupLM: Model Integration in Large Language and Multi-Modal Models
Yue Bai
Zichen Zhang
Jiasen Lu
Yun Fu
MoMe
35
1
0
11 Jul 2024
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
71
7
1
10 Jul 2024
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Catherine Tony
Nicolás E. Díaz Ferreyra
Markus Mutas
Salem Dhiff
Riccardo Scandariato
SILM
79
9
0
09 Jul 2024
Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning
Yadong Zhang
Shaoguang Mao
Wenshan Wu
Yan Xia
Tao Ge
Man Lan
Furu Wei
48
3
0
08 Jul 2024
Training Task Experts through Retrieval Based Distillation
Jiaxin Ge
Xueying Jia
Vijay Viswanathan
Hongyin Luo
Graham Neubig
40
3
0
07 Jul 2024
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Tiancheng Zhao
Qianqian Zhang
Kyusong Lee
Peng Liu
Lu Zhang
Chunxin Fang
Jiajia Liao
Kelei Jiang
Yibo Ma
Ruochen Xu
MLLM
VLM
54
5
0
06 Jul 2024
AgentInstruct: Toward Generative Teaching with Agentic Flows
Arindam Mitra
Luciano Del Corro
Guoqing Zheng
Shweti Mahajan
Dany Rouhana
...
Corby Rosset
Fillipe Silva
Hamed Khanpour
Yash Lara
Ahmed Awadallah
SyDa
40
25
0
03 Jul 2024
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models
Haritz Puerto
Tilek Chubakov
Xiaodan Zhu
Harish Tayyar Madabushi
Iryna Gurevych
ReLM
LRM
52
9
1
03 Jul 2024
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
Yifang Chen
Shuohang Wang
Ziyi Yang
Hiteshi Sharma
Nikos Karampatziakis
Donghan Yu
Kevin G. Jamieson
Simon Shaolei Du
Yelong Shen
OffRL
51
4
0
02 Jul 2024
The Art of Saying No: Contextual Noncompliance in Language Models
Faeze Brahman
Sachin Kumar
Vidhisha Balachandran
Pradeep Dasigi
Valentina Pyatkin
...
Jack Hessel
Yulia Tsvetkov
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
75
21
0
02 Jul 2024
Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning
Akshara Prabhakar
Thomas L. Griffiths
R. Thomas McCoy
LRM
42
17
0
01 Jul 2024
Collaborative Performance Prediction for Large Language Models
Qiyuan Zhang
Fuyuan Lyu
Xue Liu
Chen Ma
32
3
0
01 Jul 2024
Exploring Advanced Large Language Models with LLMsuite
Giorgio Roffo
LLMAG
24
0
0
01 Jul 2024
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs
Yifei Zhang
Xintao Wang
Jiaqing Liang
Sirui Xia
Lida Chen
Yanghua Xiao
LRM
32
1
0
30 Jun 2024
LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement
Jiahao Ying
Mingbao Lin
Yixin Cao
Wei Tang
Bo Wang
Qianru Sun
Xuanjing Huang
Shuicheng Yan
LRM
38
8
0
29 Jun 2024
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization
Bingdong Li
Zixiang Di
Yanting Yang
Hong Qian
Peng Yang
Hao Hao
Ke Tang
Aimin Zhou
MoMe
19
5
0
29 Jun 2024
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
Ruochen Wang
Sohyun An
Minhao Cheng
Tianyi Zhou
Sung Ju Hwang
Cho-Jui Hsieh
42
7
0
28 Jun 2024
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park
Maya Okawa
Andrew Lee
Ekdeep Singh Lubana
Hidenori Tanaka
62
7
0
27 Jun 2024
Aligning Teacher with Student Preferences for Tailored Training Data Generation
Yantao Liu
Zhao Zhang
Zijun Yao
S. Cao
Lei Hou
Juanzi Li
49
1
0
27 Jun 2024
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models
Siyuan Wu
Yue Huang
Chujie Gao
Dongping Chen
Qihui Zhang
...
Tianyi Zhou
Xiangliang Zhang
Jianfeng Gao
Chaowei Xiao
Lichao Sun
SyDa
38
22
0
27 Jun 2024
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White
Samuel Dooley
Manley Roberts
Arka Pal
Ben Feuer
...
Tom Goldstein
Willie Neiswanger
Micah Goldblum
ELM
50
7
0
27 Jun 2024
Autonomous Prompt Engineering in Large Language Models
Daan Kepel
Konstantina Valogianni
LLMAG
48
7
0
25 Jun 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
37
8
0
25 Jun 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
62
14
0
24 Jun 2024
Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows
Ching-An Cheng
Allen Nie
Adith Swaminathan
AI4CE
39
12
0
23 Jun 2024
SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior
Avi Caciularu
Arie Cattan
Shahar Levy
Ori Shapira
Gabriel Stanovsky
RALM
40
4
0
23 Jun 2024
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Zezhong Wang
Xingshan Zeng
Weiwen Liu
Yufei Wang
Liangyou Li
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
64
3
0
23 Jun 2024
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
Xinrong Zhang
Yingfa Chen
Shengding Hu
Xu Han
Zihang Xu
Yuanwei Xu
Weilin Zhao
Maosong Sun
Zhiyuan Liu
40
10
0
22 Jun 2024
Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens
M. Chartier
Nabil Dakkoune
G. Bourgeois
Stéphane Jean
KELM
ELM
31
1
0
21 Jun 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Sachit Menon
Richard Zemel
Carl Vondrick
LRM
45
2
0
20 Jun 2024
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak
Anej Svete
Alexandra Butoi
Ryan Cotterell
ReLM
LRM
54
13
0
20 Jun 2024
CityGPT: Empowering Urban Spatial Cognition of Large Language Models
Jie Feng
Yuwei Du
Tianhui Liu
Siqi Guo
Yuming Lin
Yong Li
45
13
0
20 Jun 2024
CityBench: Evaluating the Capabilities of Large Language Model as World Model
Jie Feng
Jun Zhang
Junbo Yan
Xin Zhang
Tianjian Ouyang
Tianhui Liu
Yuwei Du
Siqi Guo
Yong Li
ELM
56
0
0
20 Jun 2024
Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization
Mert Esencan
Tarun Advaith Kumar
A. A. Asanjan
P. A. Lott
Masoud Mohseni
Can Unlu
Davide Venturelli
A. Ho
LRM
32
0
0
19 Jun 2024
Dual-Phase Accelerated Prompt Optimization
Muchen Yang
Moxin Li
Yongle Li
Zijun Chen
Chongming Gao
Junqi Zhang
Yangyang Li
Fuli Feng
32
0
0
19 Jun 2024
BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern
Zhulin Hu
Yuqing Yang
Ethan Chern
Yuan Guo
Jiahe Jin
Binjie Wang
Pengfei Liu
HILM
ALM
86
3
0
19 Jun 2024
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Jinhyuk Lee
Anthony Chen
Zhuyun Dai
Dheeru Dua
Devendra Singh Sachan
...
Jeremy R. Cole
Sebastian Riedel
Iftekhar Naim
Ming-Wei Chang
Kelvin Guu
RALM
LRM
58
30
0
19 Jun 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
76
500
0
18 Jun 2024
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling
Yao-Ching Yu
Chun-Chih Kuo
Ziqi Ye
Yu-Cheng Chang
Yueh-Se Li
56
9
0
18 Jun 2024
Abstraction-of-Thought Makes Language Models Better Reasoners
Ruixin Hong
Hongming Zhang
Xiaoman Pan
Dong Yu
Changshui Zhang
LRM
53
4
0
18 Jun 2024
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Junmo Kang
Leonid Karlinsky
Hongyin Luo
Zhen Wang
Jacob A. Hansen
James Glass
David D. Cox
Yikang Shen
Rogerio Feris
Alan Ritter
MoMe
MoE
42
8
0
17 Jun 2024
Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways
Shubham Atreja
Joshua Ashkinaze
Lingyao Li
Julia Mendelsohn
Libby Hemphill
50
13
0
17 Jun 2024
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
DeepSeek-AI
Qihao Zhu
Daya Guo
Zhihong Shao
Dejian Yang
...
Jiashi Li
Chenggang Zhao
Chong Ruan
Fuli Luo
Wenfeng Liang
MoE
LRM
ELM
VLM
48
157
0
17 Jun 2024
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Tong Zhu
Daize Dong
Xiaoye Qu
Jiacheng Ruan
Wenliang Chen
Yu Cheng
MoE
40
8
0
17 Jun 2024
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang
Kehai Chen
Xuefeng Bai
Zhixuan He
Juntao Li
Muyun Yang
Tiejun Zhao
Liqiang Nie
Min Zhang
49
8
0
17 Jun 2024
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
Zhuoran Jin
Pengfei Cao
Chenhao Wang
Zhitao He
Hongbang Yuan
Jiachun Li
Yubo Chen
Kang Liu
Jun Zhao
KELM
MU
42
12
0
16 Jun 2024
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
Yu Zhang
Xiusi Chen
Bowen Jin
Sheng Wang
Shuiwang Ji
Wei Wang
Jiawei Han
44
27
0
16 Jun 2024
StrucText-Eval: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding
Zhouhong Gu
Haoning Ye
Zeyang Zhou
Hongwei Feng
Yanghua Xiao
ELM
52
1
0
15 Jun 2024
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
TJ Dunham
Henry Syahputra
32
1
0
15 Jun 2024