ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.09261
  4. Cited By
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

17 October 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
Hyung Won Chung
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
    ALM
    ELM
    LRM
    ReLM
ArXivPDFHTML

Papers citing "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"

50 / 797 papers shown
Title
CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging
Zongzhen Yang
Binhang Qi
Hailong Sun
Wenrui Long
Ruobing Zhao
Xiang Gao
MoMe
48
0
0
26 Feb 2025
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura
Takuya Akiba
Kazuki Fujii
Yusuke Oda
Rio Yokota
Jun Suzuki
MoMe
MoE
94
1
0
26 Feb 2025
Automatic Prompt Optimization via Heuristic Search: A Survey
Automatic Prompt Optimization via Heuristic Search: A Survey
Wendi Cui
Jiaxin Zhang
ZeLin Li
Hao Sun
Damien Lopez
Kamalika Das
Bradley Malin
Sricharan Kumar
34
1
0
26 Feb 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang
Tianyang Liu
Daoan Zhang
Antoine Simoulin
Xiaoyi Liu
...
Zhaopu Teng
Xin Qian
Grey Yang
Jiebo Luo
Julian McAuley
ReLM
OffRL
LRM
89
3
0
26 Feb 2025
RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction
RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction
Jianhao Yan
Yun Luo
Yue Zhang
LLMAG
62
1
0
25 Feb 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
87
1
0
25 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
71
0
0
24 Feb 2025
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Chengyin Xu
Kaiyuan Chen
Xiao Li
Ke Shen
Chenggang Li
OffRL
46
0
0
24 Feb 2025
Selective Prompt Anchoring for Code Generation
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
91
3
0
24 Feb 2025
LightThinker: Thinking Step-by-Step Compression
LightThinker: Thinking Step-by-Step Compression
Jintian Zhang
Yuqi Zhu
Mengshu Sun
Yujie Luo
Shuofei Qiao
Lun Du
Da Zheng
H. Chen
N. Zhang
LRM
LLMAG
49
10
0
24 Feb 2025
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models
Qin Zhu
Fei Huang
Runyu Peng
K. Lu
Bowen Yu
Qinyuan Cheng
Xipeng Qiu
Xuanjing Huang
Junyang Lin
ReLM
ELM
LRM
51
1
0
24 Feb 2025
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
Hao Li
Tao Xie
Baishakhi Ray
51
3
0
23 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
145
1
0
22 Feb 2025
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao
Yige Yuan
Z. Chen
Mingxiao Li
Shangsong Liang
Z. Ren
V. Honavar
95
5
0
21 Feb 2025
Forecasting Frontier Language Model Agent Capabilities
Forecasting Frontier Language Model Agent Capabilities
Govind Pimpale
Axel Højmark
Jérémy Scheurer
Marius Hobbhahn
LLMAG
ELM
46
1
0
21 Feb 2025
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du
Zehan Li
Pengyu Cheng
Zhihong Chen
Yuejiao Xie
Xiang Wan
Anningzhe Gao
38
1
0
20 Feb 2025
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
Zican Dong
Junyi Li
Jinhao Jiang
Mingyu Xu
Wayne Xin Zhao
Bin Wang
Xin Wu
VLM
213
3
0
20 Feb 2025
Improve LLM-as-a-Judge Ability as a General Ability
Improve LLM-as-a-Judge Ability as a General Ability
Jiachen Yu
Shaoning Sun
Xiaohui Hu
Jiaxu Yan
Kaidong Yu
Xuelong Li
ELM
82
2
0
17 Feb 2025
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan
Yilei Jiang
Heng Chang
Jiaheng Liu
Xingyuan Bu
Wenbo Su
Xiangyu Yue
Xiaoyong Zhu
Bo Zheng
ALM
80
1
0
17 Feb 2025
Atom of Thoughts for Markov LLM Test-Time Scaling
Atom of Thoughts for Markov LLM Test-Time Scaling
Fengwei Teng
Zhaoyang Yu
Quan Shi
Jiayi Zhang
Chenglin Wu
Yuyu Luo
MU
LRM
54
13
0
17 Feb 2025
System Message Generation for User Preferences using Open-Source Models
System Message Generation for User Preferences using Open-Source Models
Minbyul Jeong
Jungho Cho
Minsoo Khang
Dawoon Jung
Teakgyu Hong
41
0
0
17 Feb 2025
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Jafar Isbarov
Arofat Akhundjanova
Mammad Hajili
Kavsar Huseynova
Dmitry Gaynullin
...
Amina Alisheva
Aizirek Turdubaeva
Abdullatif Köksal
Samir Rustamov
Duygu Ataman
ELM
45
0
0
16 Feb 2025
Large Language Diffusion Models
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Ji-Rong Wen
Chongxuan Li
112
15
0
14 Feb 2025
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Anna Arias-Duart
Pablo A. Martin-Torres
Daniel Hinjos
Pablo Bernabeu Perez
Lucia Urcelay-Ganzabal
Marta Gonzalez-Mallo
Ashwin Kumar Gururajan
Enrique Lopez-Cuena
Sergio Álvarez Napagao
Dario Garcia-Gasulla
LM&MA
ELM
105
1
0
10 Feb 2025
FRAMES: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy
FRAMES: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy
Xuemiao Zhang
Feiyu Duan
Liangyu Xu
Yongwei Zhou
Sirui Wang
Rongxiang Weng
J. Wang
Xunliang Cai
62
0
0
08 Feb 2025
Self-Supervised Prompt Optimization
Self-Supervised Prompt Optimization
Jinyu Xiang
Jiayi Zhang
Zhaoyang Yu
Fengwei Teng
Jinhao Tu
Xinbing Liang
Sirui Hong
Chenglin Wu
Yuyu Luo
OffRL
LRM
74
6
0
07 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
117
3
0
06 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
177
3
0
04 Feb 2025
QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning
QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning
Moses Ananta
Muhammad Farid Adilazuarda
Zayd Muhammad Kawakibi Zuhri
Ayu Purwarianti
Alham Fikri Aji
MQ
57
0
0
03 Feb 2025
TableMaster: A Recipe to Advance Table Understanding with Language Models
TableMaster: A Recipe to Advance Table Understanding with Language Models
Lang Cao
Hanbing Liu
LMTD
RALM
217
0
1
31 Jan 2025
Ensembles of Low-Rank Expert Adapters
Ensembles of Low-Rank Expert Adapters
Yinghao Li
Vianne Gao
Chao Zhang
MohamadAli Torkamani
67
0
0
31 Jan 2025
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers
Xinyu Tang
Xiaolei Wang
Wayne Xin Zhao
Siyuan Lu
Yaliang Li
Ji-Rong Wen
LRM
56
13
0
28 Jan 2025
LCTG Bench: LLM Controlled Text Generation Benchmark
Kemal Kurniawan
Masato Mita
Peinan Zhang
S. Sasaki
Ryosuke Ishigami
Naoaki Okazaki
55
0
0
28 Jan 2025
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Yutong Yin
Zhaoran Wang
LRM
ReLM
143
0
0
27 Jan 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
53
5
0
21 Jan 2025
Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
Qirun Dai
Dylan Zhang
Jiaqi W. Ma
Hao Peng
TDI
55
1
0
21 Jan 2025
TAPO: Task-Referenced Adaptation for Prompt Optimization
TAPO: Task-Referenced Adaptation for Prompt Optimization
Wenxin Luo
Luu Anh Tuan
Xiaopeng Li
Weibo Zhou
Pengyue Jia
Xiangyu Zhao
47
0
0
12 Jan 2025
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Lester James Validad Miranda
Yizhong Wang
Yanai Elazar
Sachin Kumar
Valentina Pyatkin
Faeze Brahman
Noah A. Smith
Hannaneh Hajishirzi
Pradeep Dasigi
47
8
0
08 Jan 2025
Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
Y. Park
Jake Hyun
Hojoon Kim
Jae W. Lee
MQ
43
0
0
31 Dec 2024
A Silver Bullet or a Compromise for Full Attention? A Comprehensive
  Study of Gist Token-based Context Compression
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression
Chenlong Deng
Zhisong Zhang
Kelong Mao
Shuaiyi Li
Xinting Huang
Dong Yu
Zhicheng Dou
38
1
0
23 Dec 2024
Boosting LLM via Learning from Data Iteratively and Selectively
Boosting LLM via Learning from Data Iteratively and Selectively
Qi Jia
Siyu Ren
Ziheng Qin
Fuzhao Xue
Jinjie Ni
Yang You
33
0
0
23 Dec 2024
Revisiting In-Context Learning with Long Context Language Models
Revisiting In-Context Learning with Long Context Language Models
Jinheon Baek
Sun Jae Lee
Prakhar Gupta
Geunseob
Oh
Siddharth Dalmia
182
1
0
22 Dec 2024
NILE: Internal Consistency Alignment in Large Language Models
NILE: Internal Consistency Alignment in Large Language Models
Minda Hu
Qiyuan Zhang
Yufei Wang
Bowei He
Hongru Wang
Jingyan Zhou
Liangyou Li
Yasheng Wang
Chen-li Ma
Irwin King
86
0
0
21 Dec 2024
Formal Mathematical Reasoning: A New Frontier in AI
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM
AI4CE
82
21
0
20 Dec 2024
Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative
  Querying
Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying
Federico Castagna
I. Sassoon
Simon Parsons
LRM
85
0
0
19 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small
  LLMs
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
88
6
0
17 Dec 2024
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
Siyuan Wang
Dianyi Wang
Chengxing Zhou
Zejun Li
Zhihao Fan
Xuanjing Huang
Zhongyu Wei
VLM
185
0
0
17 Dec 2024
C3oT: Generating Shorter Chain-of-Thought without Compromising
  Effectiveness
C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness
Yu Kang
Xianghui Sun
Liangyu Chen
Wei Zou
LRM
80
20
0
16 Dec 2024
Codenames as a Benchmark for Large Language Models
Codenames as a Benchmark for Large Language Models
Matthew Stephenson
Matthew Sidji
Benoît Ronval
LLMAG
LRM
ELM
108
1
0
16 Dec 2024
Predictable Emergent Abilities of LLMs: Proxy Tasks Are All You Need
Predictable Emergent Abilities of LLMs: Proxy Tasks Are All You Need
Bo Zhang
Yan Yan
Boxiang Yang
Yifei Xue
Guang Liu
LRM
74
1
0
10 Dec 2024
Previous
123456...141516
Next