ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.06766
  4. Cited By
Mind Your Format: Towards Consistent Evaluation of In-Context Learning
  Improvements

Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements

12 January 2024
Anton Voronov
Lena Wolf
Max Ryabinin
ArXivPDFHTML

Papers citing "Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements"

41 / 41 papers shown
Title
Cooking Up Creativity: A Cognitively-Inspired Approach for Enhancing LLM Creativity through Structured Representations
Cooking Up Creativity: A Cognitively-Inspired Approach for Enhancing LLM Creativity through Structured Representations
Moran Mizrahi
Chen Shani
Gabriel Stanovsky
Dan Jurafsky
Dafna Shahaf
29
0
0
29 Apr 2025
NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark
NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark
Vladislav Mikhailov
Tita Ranveig Enstad
David Samuel
Hans Christian Farsethås
Andrey Kutuzov
Erik Velldal
Lilja Øvrelid
ELM
45
0
0
10 Apr 2025
Towards LLMs Robustness to Changes in Prompt Format Styles
Towards LLMs Robustness to Changes in Prompt Format Styles
Lilian Ngweta
Kiran Kate
Jason Tsay
Yara Rizk
AAML
VLM
35
0
0
09 Apr 2025
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
Eliya Habba
Ofir Arviv
Itay Itzhak
Yotam Perlitz
Elron Bandel
Leshem Choshen
Michal Shmueli-Scheuer
Gabriel Stanovsky
77
2
0
03 Mar 2025
Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
Tingchen Fu
Fazl Barez
AAML
65
0
0
03 Mar 2025
Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction
Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction
Sarah Ball
Simeon Allmendinger
Frauke Kreuter
Niklas Kühl
57
0
0
22 Feb 2025
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
Yuanye Liu
Jiahang Xu
Li Lyna Zhang
Qi Chen
Xuan Feng
Yang Chen
Zhongxin Guo
Yuqing Yang
Cheng Peng
84
2
0
06 Feb 2025
The Curious Case of Arbitrariness in Machine Learning
Prakhar Ganesh
Afaf Taik
G. Farnadi
59
2
0
28 Jan 2025
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles
Samia Touileb
Vladislav Mikhailov
Marie Kroka
Lilja Øvrelid
Erik Velldal
44
3
0
13 Jan 2025
What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning
What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning
Jelena Bratulić
Sudhanshu Mittal
Christian Rupprecht
Thomas Brox
41
1
0
09 Jan 2025
SelfPrompt: Autonomously Evaluating LLM Robustness via
  Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
AAML
80
2
0
01 Dec 2024
Does Prompt Formatting Have Any Impact on LLM Performance?
Does Prompt Formatting Have Any Impact on LLM Performance?
Jia He
Mukund Rungta
David Koleczek
Arshdeep Sekhon
Franklin X Wang
Sadid Hasan
LLMAG
LRM
27
36
0
15 Nov 2024
Beemo: Benchmark of Expert-edited Machine-generated Outputs
Beemo: Benchmark of Expert-edited Machine-generated Outputs
Ekaterina Artemova
Jason Samuel Lucas
Saranya Venkatraman
Jooyoung Lee
Sergei Tilga
Adaku Uchendu
Vladislav Mikhailov
DeLMO
MoE
68
4
0
06 Nov 2024
Mixtures of In-Context Learners
Mixtures of In-Context Learners
Giwon Hong
Emile van Krieken
E. Ponti
Nikolay Malkin
Pasquale Minervini
38
0
0
05 Nov 2024
CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing
CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing
Chen Yang
Chenyang Zhao
Q. Gu
Dongruo Zhou
LRM
38
0
0
22 Oct 2024
POSIX: A Prompt Sensitivity Index For Large Language Models
POSIX: A Prompt Sensitivity Index For Large Language Models
Anwoy Chatterjee
H. S. V. N. S. K. Renduchintala
S. Bhatia
Tanmoy Chakraborty
AAML
39
6
0
03 Oct 2024
SSE: Multimodal Semantic Data Selection and Enrichment for
  Industrial-scale Data Assimilation
SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation
Maying Shen
Nadine Chang
Sifei Liu
Jose M. Alvarez
36
0
0
20 Sep 2024
A Novel Metric for Measuring the Robustness of Large Language Models in
  Non-adversarial Scenarios
A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios
Samuel Ackerman
Ella Rabinovich
E. Farchi
Ateret Anaby-Tavor
28
1
0
04 Aug 2024
SSP: Self-Supervised Prompting for Cross-Lingual Transfer to
  Low-Resource Languages using Large Language Models
SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models
Vipul Rathore
Aniruddha Deb
Ankish Chandresh
Parag Singla
Mausam
LRM
52
0
0
27 Jun 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine
  Translation and Summarization Evaluation
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
Christoph Leiter
Steffen Eger
34
8
0
26 Jun 2024
Token-based Decision Criteria Are Suboptimal in In-context Learning
Token-based Decision Criteria Are Suboptimal in In-context Learning
Hakaze Cho
Yoshihiro Sakai
Mariko Kato
Kenshiro Tanaka
Akira Ishii
Naoya Inoue
46
2
0
24 Jun 2024
SEAM: A Stochastic Benchmark for Multi-Document Tasks
SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior
Avi Caciularu
Arie Cattan
Shahar Levy
Ori Shapira
Gabriel Stanovsky
RALM
40
4
0
23 Jun 2024
When Parts are Greater Than Sums: Individual LLM Components Can
  Outperform Full Models
When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Ting-Yun Chang
Jesse Thomason
Robin Jia
45
4
0
19 Jun 2024
KGPA: Robustness Evaluation for Large Language Models via Cross-Domain
  Knowledge Graphs
KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
Lina Wang
45
1
0
16 Jun 2024
Unraveling the Mechanics of Learning-Based Demonstration Selection for
  In-Context Learning
Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning
Hui Liu
Wenya Wang
Hao Sun
Chris Xing Tian
Chenqi Kong
Xin Dong
Haoliang Li
54
5
0
14 Jun 2024
FLEUR: An Explainable Reference-Free Evaluation Metric for Image
  Captioning Using a Large Multimodal Model
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
Yebin Lee
Imseong Park
Myungjoo Kang
27
11
0
10 Jun 2024
Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language
  Models with Hints
Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints
Aryo Pradipta Gema
Chaeeun Lee
Pasquale Minervini
Luke Daines
T. I. Simpson
Beatrice Alex
34
1
0
28 May 2024
Efficient multi-prompt evaluation of LLMs
Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo
Ronald Xu
Lucas Weber
Mírian Silva
Onkar Bhardwaj
Leshem Choshen
Allysson Flavio Melo de Oliveira
Yuekai Sun
Mikhail Yurochkin
45
19
0
27 May 2024
Examining the robustness of LLM evaluation to the distributional
  assumptions of benchmarks
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
59
14
0
25 Apr 2024
Stronger Random Baselines for In-Context Learning
Stronger Random Baselines for In-Context Learning
Gregory Yauney
David M. Mimno
45
2
0
19 Apr 2024
The Hallucinations Leaderboard -- An Open Effort to Measure
  Hallucinations in Large Language Models
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models
Giwon Hong
Aryo Pradipta Gema
Rohit Saxena
Xiaotang Du
Ping Nie
...
Laura Perez-Beltrachini
Max Ryabinin
Xuanli He
Clémentine Fourrier
Pasquale Minervini
LRM
HILM
38
11
0
08 Apr 2024
tinyBenchmarks: evaluating LLMs with fewer examples
tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo
Lucas Weber
Leshem Choshen
Yuekai Sun
Gongjun Xu
Mikhail Yurochkin
ELM
26
77
0
22 Feb 2024
On Sensitivity of Learning with Limited Labelled Data to the Effects of
  Randomness: Impact of Interactions and Systematic Choices
On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices
Branislav Pecher
Ivan Srba
M. Bieliková
69
3
0
20 Feb 2024
Large Language Models Can Better Understand Knowledge Graphs Than We Thought
Large Language Models Can Better Understand Knowledge Graphs Than We Thought
Xinbang Dai
Yuncheng Hua
Tongtong Wu
Yang Sheng
Qiu Ji
Guilin Qi
82
0
0
18 Feb 2024
State of What Art? A Call for Multi-Prompt LLM Evaluation
State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi
Guy Kaplan
Daniel Malkin
Rotem Dror
Dafna Shahaf
Gabriel Stanovsky
ELM
29
127
0
31 Dec 2023
The crime of being poor
The crime of being poor
Georgina Curto
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
18
2
0
24 Mar 2023
Prototypical Calibration for Few-shot Learning of Language Models
Prototypical Calibration for Few-shot Learning of Language Models
Zhixiong Han
Y. Hao
Li Dong
Yutao Sun
Furu Wei
178
52
0
20 May 2022
Fantastically Ordered Prompts and Where to Find Them: Overcoming
  Few-Shot Prompt Order Sensitivity
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
279
1,124
0
18 Apr 2021
What Makes Good In-Context Examples for GPT-$3$?
What Makes Good In-Context Examples for GPT-333?
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
AAML
RALM
275
1,312
0
17 Jan 2021
Making Pre-trained Language Models Better Few-shot Learners
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
243
1,919
0
31 Dec 2020
Simple and Scalable Predictive Uncertainty Estimation using Deep
  Ensembles
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Balaji Lakshminarayanan
Alexander Pritzel
Charles Blundell
UQCV
BDL
276
5,661
0
05 Dec 2016
1