Use of probabilistic phrases in a coordination game: human versus GPT-4

English speakers use probabilistic phrases such as "likely" to communicate information about the probability or likelihood of events. Communication is successful to the extent that the listener grasps what the speaker means to convey, and, if communication succeeds, two individuals can potentially coordinate their actions based on shared knowledge about uncertainty. We first assessed human ability to estimate the probability and the ambiguity (imprecision) of 23 probabilistic phrases in two different contexts, investment advice and medical advice. We then had GPT-4 (OpenAI), a recent large language model, complete the same tasks as the human participants. We found that the median human participant and GPT-4 assigned probability estimates that were in good agreement (proportions of variance accounted for were close to .90). GPT-4's probability estimates in both the investment and medical contexts were as close to, or closer to, those of the human participants as the human participants' estimates were to one another. Probability estimates for both the human participants and GPT-4 were little affected by context. In contrast, human and GPT-4 estimates of ambiguity were not in as good agreement. We repeated some of the GPT-4 estimates to assess their stability: does GPT-4, if run twice, produce the same or similar estimates? There is some indication that it does not.
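To illustrate the kind of agreement measure reported above, the sketch below computes a proportion of variance accounted for (the squared Pearson correlation) between median human probability estimates and GPT-4's estimates. The phrases and all numeric values are illustrative placeholders, not the study's data.

```python
import numpy as np

# Hypothetical data: median human probability estimates and GPT-4 estimates
# for a handful of probabilistic phrases (values are illustrative only).
phrases = ["almost certain", "likely", "possible", "doubtful", "almost impossible"]
human_median = np.array([0.93, 0.75, 0.45, 0.20, 0.05])
gpt4_estimates = np.array([0.95, 0.70, 0.50, 0.25, 0.03])

# Proportion of variance accounted for: squared Pearson correlation
# between the two sets of estimates.
r = np.corrcoef(human_median, gpt4_estimates)[0, 1]
print(f"Proportion of variance accounted for: {r**2:.2f}")
```

A value near .90, as reported for the probability-estimation task, would indicate that GPT-4's estimates track the median human estimates closely across phrases.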