Take-the-best heuristic
The take-the-best heuristic estimates which of two alternatives has a higher value on a criterion by choosing the alternative favored by the first cue that discriminates between them, where cues are ordered by cue validity. In the original formulation, cues were assumed to be binary-valued or unknown. The logic of the heuristic is that it bases its choice on the best discriminating cue alone and ignores the rest.
Gerd Gigerenzer and Daniel Goldstein discovered that the heuristic did surprisingly well at making accurate inferences in real-world environments, such as inferring which of two cities is larger. The heuristic has since been modified and applied to domains including medicine, artificial intelligence, and political forecasting. It has also been shown that the heuristic can accurately model how experts, such as airport customs officers and professional burglars, make decisions. The heuristic can also predict details of the cognitive process, such as the number of cues used and response times, often better than complex models that integrate all available cues.
One-reason decision-making
Theories of decision making typically assume that all relevant reasons are searched and integrated into a final decision. Yet under uncertainty, the relevant cues are typically not all known, nor are their precise weights and the correlations between cues. In these situations, relying only on the best cue available may be a reasonable alternative that allows for fast, frugal, and accurate decisions. This is the logic of a class of heuristics known as "one-reason decision making," which includes take-the-best. Consider cues with binary values, where 1 indicates the cue value that is associated with a higher criterion value. The task is to infer which of two alternatives has the higher criterion value. An example is which of two NBA teams will win a game, based on cues such as playing at home and having won the last match. The take-the-best heuristic entails three steps to make such an inference:
Search rule: Look through cues in the order of their validity.
Stopping rule: Stop search when the first cue is found where the values of the two alternatives differ.
Decision rule: Predict that the alternative with the higher cue value has the higher value on the outcome variable.
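The three rules above can be sketched in a few lines of code. This is a minimal, illustrative implementation, assuming the cue values are already sorted by descending validity and coded so that 1 is associated with the higher criterion value; the function name and cue examples are not from the original paper.

```python
def take_the_best(cues_a, cues_b):
    """Return 'A', 'B', or None when no cue discriminates.

    cues_a, cues_b: sequences of 0/1 cue values, ordered by cue validity.
    """
    for a, b in zip(cues_a, cues_b):       # search rule: inspect cues in validity order
        if a != b:                         # stopping rule: stop at the first difference
            return "A" if a > b else "B"   # decision rule: higher cue value wins
    return None                            # no cue discriminates; one would have to guess


# Hypothetical NBA example with two cues (plays at home, won last match),
# assuming "plays at home" has the higher validity:
print(take_the_best([1, 0], [0, 1]))  # prints: A
```

Note that the heuristic is noncompensatory: once a cue discriminates, later cues cannot overturn the decision, no matter how many of them favor the other alternative.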
The validity v of a cue is given by v = C/(C + W), where C is the number of correct inferences when a cue discriminates, and W is the number of wrong inferences, both estimated from samples.
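The validity estimate can be computed from a sample of paired comparisons; the sketch below is illustrative, with hypothetical variable names, and counts only the pairs on which the cue discriminates.

```python
def cue_validity(pairs):
    """Estimate v = C / (C + W) for one cue from sample pairs.

    pairs: iterable of (cue_a, cue_b, a_is_larger) triples, where cue_a and
    cue_b are 0/1 cue values and a_is_larger is True when the first object
    has the higher criterion value.
    """
    correct = wrong = 0
    for cue_a, cue_b, a_is_larger in pairs:
        if cue_a == cue_b:
            continue                       # cue does not discriminate; pair is ignored
        if (cue_a > cue_b) == a_is_larger:
            correct += 1                   # C: the cue points to the larger object
        else:
            wrong += 1                     # W: the cue points to the smaller object
    return correct / (correct + wrong)


# Three discriminating pairs, two judged correctly: v = 2/3
print(cue_validity([(1, 0, True), (0, 1, False), (1, 0, False)]))
```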
Take-the-best for the comparison task
Consider the task of inferring which of two objects, A or B, has a higher value on a numerical criterion. As an example, imagine someone having to judge whether the German city of Cologne has a larger population than the German city of Stuttgart. This judgment or inference has to be based on information provided by binary cues, like "Is the city a state capital?" From a formal point of view, the task is a categorization: a pair is to be categorized as XA > XB or XB > XA, based on cue information.
Cues are binary; this means they assume two values and can be modeled, for instance, as having the values 0 and 1. They are ranked according to their cue validity, defined as the proportion of correct comparisons among the pairs A and B for which the cue has different values, i.e., for which it discriminates between A and B. Take-the-best inspects the cues one after the other, in the order of the validity ranking, stops the first time a cue discriminates between the items, and concludes that the item with the larger cue value also has the larger value on the criterion.
The matrix of all objects of the reference class, from which A and B have been taken, and of the cue values that describe these objects constitutes a so-called environment. Gigerenzer and Goldstein, who introduced take-the-best, considered, as a walk-through example, precisely pairs of German cities, though only those with more than 100,000 inhabitants. The comparison task for a given pair of German cities in the reference class consisted in establishing which one has the larger population, based on nine cues. Cues were binary-valued, such as whether the city is a state capital or whether it has a soccer team in the national league.
The cue values can be modeled by 1s and 0s, so that each city can be identified with its "cue profile", i.e., a vector of 1s and 0s, ordered according to the ranking of cues.
The question was: How can one infer which of two objects, for example, city A with cue profile (1, 0, 0, 1, …) and city B with cue profile (1, 0, 0, 0, …), scores higher on the established criterion, i.e., population size?
The take-the-best heuristic simply compares the profiles lexicographically, just as numbers written in base two are compared: the first cue value is 1 for both, which means that the first cue does not discriminate between A and B. The second cue value is 0 for both, again with no discrimination. The same happens for the third cue value, while the fourth cue value is 1 for A and 0 for B, implying that A is judged as having a higher value on the criterion.
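This lexicographic comparison maps directly onto tuple comparison in Python, which also proceeds element by element and stops at the first difference. The profiles below are illustrative, reproducing only the four cue values discussed in the walk-through (both start 1, 0, 0; the fourth cue discriminates).

```python
profile_a = (1, 0, 0, 1)   # first four cue values of city A (illustrative)
profile_b = (1, 0, 0, 0)   # first four cue values of city B (illustrative)

# Tuple comparison is lexicographic: the first three positions tie, the
# fourth decides, and any remaining cues would never be inspected.
print(profile_a > profile_b)  # prints: True, i.e. A is inferred to be larger
```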
In other words, XA > XB is inferred if and only if the cue profile of A is lexicographically greater than the cue profile of B.
Mathematically, this means that the cues found for the comparison allow a quasi-order isomorphism between the objects compared on the criterion (in this case, cities ordered by population) and their corresponding binary vectors. Here "quasi" means that the isomorphism is, in general, not perfect, because the set of cues is not perfect.
What is surprising is how well this simple heuristic performs compared with other strategies. One obvious measure of the performance of an inference mechanism is the percentage of correct judgments. Furthermore, what matters most is not the performance of the heuristic when fitting known data, but its performance when generalizing from a known training set to new items.
Czerlinski, Goldstein, and Gigerenzer compared several strategies with take-the-best: tallying (a unit-weight linear model), a linear model with cues weighted by their validities, linear regression, and minimalist (which inspects cues in random order). Their results show the robustness of take-the-best in generalization.
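For contrast with take-the-best, here is a minimal sketch of the tallying strategy mentioned above, under the usual assumption of 0/1 cue coding; the function name is illustrative. Tallying is compensatory: it counts positive cues for each alternative instead of stopping at the first discriminating cue.

```python
def tallying(cues_a, cues_b):
    """Unit-weight model: pick the alternative with more positive cues."""
    score_a, score_b = sum(cues_a), sum(cues_b)
    if score_a == score_b:
        return None                    # tie: one would have to guess
    return "A" if score_a > score_b else "B"


# Take-the-best would pick A here (the first, most valid cue discriminates),
# but tallying picks B, because B has more positive cues overall.
print(tallying([1, 0, 0, 0], [0, 1, 1, 1]))  # prints: B
```

The example shows where the two strategies part ways: they disagree exactly when lower-validity cues collectively outvote the best discriminating cue.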
[Figure: Heuristic performance on the German city data set, generated with ggplot2.]
For example, consider the fitting task of selecting the bigger of two cities when:
- Models are fit to a data set of 83 German cities.
- Models then select the bigger of each pair of cities, for all 83*82/2 pairs.
However, the paper also considered generalization:
- Models are fit to a randomly selected half of the 83 German cities.
- Models select the bigger of each pair of cities drawn from the *other* half.
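The pair counts involved in this protocol are easy to check. The sketch below uses integers as stand-ins for the 83 cities and only illustrates the all-pairs enumeration and the random half-split; it does not reproduce the models or results of the study.

```python
import random
from itertools import combinations

cities = list(range(83))                 # stand-ins for the 83 German cities

# Fitting: every unordered pair of the 83 cities.
pairs = list(combinations(cities, 2))
print(len(pairs))                        # prints: 3403, i.e. 83*82/2

# Generalization: fit on a random half, test on pairs from the other half.
random.seed(1)                           # arbitrary seed, for reproducibility
random.shuffle(cities)
train, test = cities[:41], cities[41:]   # 83 splits into halves of 41 and 42
test_pairs = list(combinations(test, 2))
print(len(test_pairs))                   # prints: 861, i.e. 42*41/2
```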