Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.09261
Cited By
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"
50 / 798 papers shown
Title
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
TJ Dunham
Henry Syahputra
37
1
0
15 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
51
10
0
14 Jun 2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Hamish Ivison
Yizhong Wang
Jiacheng Liu
Zeqiu Wu
Valentina Pyatkin
Nathan Lambert
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
46
41
0
13 Jun 2024
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents
Cheng-Kuang Wu
Zhi Rui Tam
Chieh-Yen Lin
Yun-Nung Chen
Hung-yi Lee
LLMAG
42
7
0
13 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
55
11
0
12 Jun 2024
TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul
Federico Bianchi
Joseph Boen
Sheng Liu
Zhi Huang
Carlos Guestrin
James Zou
LLMAG
OOD
AI4CE
46
33
0
11 Jun 2024
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You
Yichao Fu
Zheng Wang
Amir Yazdanbakhsh
Yingyan Celine Lin
45
2
0
11 Jun 2024
Teaching Language Models to Self-Improve by Learning from Language Feedback
Chi Hu
Yimin Hu
Hang Cao
Tong Xiao
Jingbo Zhu
LRM
VLM
35
4
0
11 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
48
1
0
08 Jun 2024
A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
Megh Thakkar
Quentin Fournier
Matthew D Riemer
Pin-Yu Chen
Amal Zouaq
Payel Das
Sarath Chandar
ALM
44
8
0
07 Jun 2024
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Ling Yang
Zhaochen Yu
Tianjun Zhang
Shiyi Cao
Minkai Xu
Wentao Zhang
Joseph E. Gonzalez
Bin Cui
LLMAG
LM&Ro
LRM
KELM
45
35
0
06 Jun 2024
Uncovering Limitations of Large Language Models in Information Seeking from Tables
Chaoxu Pang
Yixuan Cao
Chunhao Yang
Ping Luo
RALM
LMTD
41
3
0
06 Jun 2024
Evaluating the World Model Implicit in a Generative Model
Keyon Vafa
Justin Y. Chen
Jon M. Kleinberg
S. Mullainathan
Ashesh Rambachan
90
29
0
06 Jun 2024
Xmodel-LM Technical Report
Yichuan Wang
Yang Liu
Yu Yan
Qun Wang
Xucheng Huang
Ling Jiang
OSLM
ALM
35
1
0
05 Jun 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
...
Kai Wang
Alex Zhuang
Rongqi Fan
Xiang Yue
Wenhu Chen
LRM
ELM
64
308
0
03 Jun 2024
SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
Quandong Wang
Yuxuan Yuan
Xiaoyu Yang
Ruike Zhang
Kang Zhao
Wei Liu
Jian Luan
Daniel Povey
Bin Wang
49
0
0
03 Jun 2024
Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
Keyon Vafa
Ashesh Rambachan
S. Mullainathan
ELM
ALM
24
13
0
03 Jun 2024
EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs
Zixuan Dong
Baoyun Peng
Yufei Wang
Jia Fu
Xiaodong Wang
Yongxue Shan
Xin Zhou
42
1
0
03 Jun 2024
Demonstration Augmentation for Zero-shot In-context Learning
Yi Su
Yunpeng Tai
Yixin Ji
Juntao Li
Bowen Yan
Min Zhang
RALM
46
6
0
03 Jun 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
Jinjie Ni
Fuzhao Xue
Xiang Yue
Yuntian Deng
Mahir Shah
Kabir Jain
Graham Neubig
Yang You
ELM
32
38
0
03 Jun 2024
A Survey of Useful LLM Evaluation
Ji-Lun Peng
Sijia Cheng
Egil Diau
Yung-Yu Shih
Po-Heng Chen
Yen-Ting Lin
Yun-Nung Chen
LLMAG
ELM
34
12
0
03 Jun 2024
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Xiaoyuan Li
Wenjie Wang
Moxin Li
Junrong Guo
Yang Zhang
Fuli Feng
ELM
LRM
42
15
0
02 Jun 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
27
66
0
30 May 2024
TAIA: Large Language Models are Out-of-Distribution Data Learners
Shuyang Jiang
Yusheng Liao
Ya Zhang
Yu Wang
Yanfeng Wang
29
3
0
30 May 2024
Improve Student's Reasoning Generalizability through Cascading Decomposed CoTs Distillation
Chengwei Dai
Kun Li
Wei Zhou
Song Hu
LRM
52
3
0
30 May 2024
Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation
Chengwei Dai
Kun Li
Wei Zhou
Song Hu
LRM
43
5
0
30 May 2024
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data
Zifan Song
Yudong Wang
Wenwei Zhang
Kuikun Liu
Chengqi Lyu
...
Qipeng Guo
Hang Yan
Dahua Lin
Kai-xiang Chen
Cairong Zhao
SyDa
46
2
0
29 May 2024
Towards Dialogues for Joint Human-AI Reasoning and Value Alignment
Elfia Bezou-Vrakatseli
O. Cocarascu
Sanjay Modgil
30
0
0
28 May 2024
Self-Guiding Exploration for Combinatorial Problems
Zangir Iklassov
Yali Du
Farkhad Akimov
Martin Takáč
LRM
32
2
0
28 May 2024
Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo
Ronald Xu
Lucas Weber
Mírian Silva
Onkar Bhardwaj
Leshem Choshen
Allysson Flavio Melo de Oliveira
Yuekai Sun
Mikhail Yurochkin
45
20
0
27 May 2024
BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation
Chengxing Jia
Pengyuan Wang
Ziniu Li
Yi-Chen Li
Zhilong Zhang
Nan Tang
Yang Yu
OffRL
42
1
0
27 May 2024
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubić
Federico Soldá
Aurelio Sulser
Davide Scaramuzza
LRM
BDL
52
5
0
26 May 2024
STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making
Chuanhao Li
Runhan Yang
Tiankai Li
Milad Bafarassat
Kourosh Sharifi
Dirk Bergemann
Zhuoran Yang
LLMAG
39
5
0
25 May 2024
Learning to Reason via Program Generation, Emulation, and Search
Nathaniel Weir
Muhammad Khalifa
Linlu Qiu
Orion Weller
Peter Clark
SyDa
ReLM
LRM
90
5
0
25 May 2024
CulturePark: Boosting Cross-cultural Understanding in Large Language Models
Cheng-rong Li
Damien Teney
Linyi Yang
Qingsong Wen
Xing Xie
Jindong Wang
46
4
0
24 May 2024
Instruction Tuning With Loss Over Instructions
Zhengyan Shi
Adam X. Yang
Bin Wu
Laurence Aitchison
Emine Yilmaz
Aldo Lipani
ALM
24
20
0
23 May 2024
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction
Tingchen Fu
Deng Cai
Lemao Liu
Shuming Shi
Rui Yan
MoMe
62
13
0
22 May 2024
360Zhinao Technical Report
360Zhinao Team
40
0
0
22 May 2024
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Zhiyuan Zeng
Xiaonan Li
...
Qinyuan Cheng
Ding Wang
Xiaofeng Mou
Xipeng Qiu
XuanJing Huang
LRM
46
4
0
21 May 2024
Towards Modular LLMs by Building and Reusing a Library of LoRAs
O. Ostapenko
Zhan Su
E. Ponti
Laurent Charlin
Nicolas Le Roux
Matheus Pereira
Lucas Caccia
Alessandro Sordoni
MoMe
44
31
0
18 May 2024
Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities
Hao Zhou
Chengming Hu
Ye Yuan
Yufei Cui
Yili Jin
...
Di Wu
Xue Liu
Charlie Zhang
Xianbin Wang
Jiangchuan Liu
35
59
0
17 May 2024
METAREFLECTION: Learning Instructions for Language Agents using Past Reflections
Priyanshu Gupta
Shashank Kirtania
Ananya Singha
Sumit Gulwani
Arjun Radhakrishna
Sherry Shi
Gustavo Soares
LLMAG
40
4
0
13 May 2024
COBias and Debias: Balancing Class Accuracies for Language Models in Inference Time via Nonlinear Integer Programming
Ruixi Lin
Yang You
35
1
0
13 May 2024
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
Dan Qiao
Yi Su
Pinzheng Wang
Jing Ye
Wen Xie
...
Wenliang Chen
Guohong Fu
Guodong Zhou
Qiaoming Zhu
Min Zhang
MQ
35
0
0
09 May 2024
ADELIE: Aligning Large Language Models on Information Extraction
Y. Qi
Hao Peng
Xiaozhi Wang
Bin Xu
Lei Hou
Juanzi Li
44
7
0
08 May 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning
Kaya Stechly
Karthik Valmeekam
Subbarao Kambhampati
LRM
LM&Ro
75
40
0
08 May 2024
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI
Aixin Liu
Bei Feng
Bin Wang
Bingxuan Wang
...
Zhuoshu Li
Zihan Wang
Zihui Gu
Zilin Li
Ziwei Xie
MoE
49
393
0
07 May 2024
Long Context Alignment with Short Instructions and Synthesized Positions
Wenhao Wu
Yizhong Wang
Yao Fu
Xiang Yue
Dawei Zhu
Sujian Li
SyDa
46
18
0
07 May 2024
MAmmoTH2: Scaling Instructions from the Web
Xiang Yue
Tuney Zheng
Ge Zhang
Wenhu Chen
ALM
LRM
57
87
0
06 May 2024
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
Guanhua Zhang
Moritz Hardt
42
8
0
02 May 2024
Previous
1
2
3
...
7
8
9
...
14
15
16
Next