Brier score

The Brier score is a proper score function that measures the accuracy of probabilistic predictions. It is applicable to tasks in which predictions must assign probabilities to a set of mutually exclusive discrete outcomes. The set of possible outcomes can be either binary or categorical in nature, and the probabilities assigned to this set of outcomes must sum to one. It was proposed by Glenn W. Brier in 1950.
The Brier score can be thought of as a "cost function". More precisely, across all items in a set of N predictions, the Brier score measures the mean squared difference between:

The predicted probability assigned to the possible outcomes for item i
The actual outcome

Therefore, the lower the Brier score is for a set of predictions, the better the predictions are calibrated. Note that the Brier score, in its most common formulation, takes on a value between zero and one, since this is the square of the largest possible difference between a predicted probability and the actual outcome. In the original formulation of the Brier score, the range is double, from zero to two.
The Brier score is appropriate for binary and categorical outcomes that can be structured as true or false, but is inappropriate for ordinal variables which can take on three or more values.

Definition

The most common formulation of the Brier score is

$$BS = \frac{1}{N}\sum_{t=1}^{N}(f_t - o_t)^2$$

in which $f_t$ is the probability that was forecast, $o_t$ the actual outcome of the event at instance $t$ (0 if it does not happen and 1 if it happens) and $N$ is the number of forecasting instances. In effect, it is the mean squared error of the forecast. This formulation is mostly used for binary events. The above equation is a proper scoring rule only for binary events; if a multi-category forecast is to be evaluated, then the original definition given by Brier below should be used.
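
The binary formulation can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation; the function name `brier_score` and the sample data are assumptions made for the example.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: four forecasts of an event and whether it occurred (1) or not (0).
forecasts = [0.9, 0.1, 0.8, 0.3]
outcomes = [1, 0, 1, 1]
score = brier_score(forecasts, outcomes)
print(round(score, 4))
```

With these made-up values the squared errors are 0.01, 0.01, 0.04 and 0.49, so the mean is about 0.1375.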

Example

Suppose that one is forecasting the probability that it will rain on a given day. Then the Brier score is calculated as follows:

If the forecast is 100% and it rains, then the Brier Score is 0, the best score achievable.
If the forecast is 100% and it does not rain, then the Brier Score is 1, the worst score achievable.
If the forecast is 70% and it rains, then the Brier Score is (0.70 − 1)² = 0.09.
In contrast, if the forecast is 70% and it does not rain, then the Brier Score is (0.70 − 0)² = 0.49.
Similarly, if the forecast is 30% and it rains, then the Brier Score is (0.30 − 1)² = 0.49.
If the forecast is 50%, then the Brier score is (0.50 − 1)² = (0.50 − 0)² = 0.25, regardless of whether it rains.
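
The cases listed above can be reproduced with a short script. This is only a sketch for a single forecast; the helper name `single_forecast_brier` is hypothetical.

```python
def single_forecast_brier(probability_of_rain, rained):
    """Squared difference between one forecast probability and the 0/1 outcome."""
    outcome = 1.0 if rained else 0.0
    return (probability_of_rain - outcome) ** 2


# (forecast probability, whether it rained) for each case in the example.
cases = [
    (1.0, True),
    (1.0, False),
    (0.7, True),
    (0.7, False),
    (0.3, True),
    (0.5, True),
    (0.5, False),
]
for p, rained in cases:
    print(f"forecast={p:.0%} rained={rained}: {single_forecast_brier(p, rained):.2f}")
```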

Original definition by Brier

Although the above formulation is the most widely used, the original definition by Brier also applies to multi-category forecasts and remains a proper scoring rule there, whereas the binary form is proper only for binary events. For binary forecasts, the original formulation of Brier's "probability score" has twice the value of the score currently known as the Brier score.

$$BS = \frac{1}{N}\sum_{t=1}^{N}\sum_{i=1}^{R}(f_{ti} - o_{ti})^2$$

in which $R$ is the number of possible classes in which the event can fall, and $N$ the overall number of instances of all classes. Here $f_{ti}$ is the predicted probability of class $i$ at instance $t$, and $o_{ti}$ is 1 if class $i$ occurred at instance $t$ and 0 otherwise. For the case Rain / No rain, $R = 2$, while for the forecast Cold / Normal / Warm, $R = 3$.
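
The multi-category form can be sketched as follows, assuming each forecast is a list of class probabilities and each outcome is the index of the class that occurred. The function name `multiclass_brier_score` and the data are illustrative assumptions.

```python
def multiclass_brier_score(forecasts, outcomes):
    """Original Brier score: squared errors summed over classes, averaged over instances.

    forecasts: list of probability vectors (one probability per class)
    outcomes:  list of indices of the class that actually occurred
    """
    total = 0.0
    for probabilities, observed_class in zip(forecasts, outcomes):
        for i, p in enumerate(probabilities):
            o = 1.0 if i == observed_class else 0.0
            total += (p - o) ** 2
    return total / len(forecasts)


# Hypothetical Cold / Normal / Warm forecasts (three classes) for two days.
forecasts = [[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
outcomes = [1, 2]  # day 1 was Normal, day 2 was Warm
three_class_score = multiclass_brier_score(forecasts, outcomes)

# Binary case written as two classes (No rain, Rain): 70% chance of rain, and it rains.
two_class_score = multiclass_brier_score([[0.3, 0.7]], [1])
print(three_class_score, two_class_score)
```

The two-class result is about 0.18, twice the 0.09 obtained from the common binary formulation for the same forecast, which matches the factor of two noted above.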

Decompositions

There are several decompositions of the Brier score which provide deeper insight into the behavior of a binary classifier.

3-component decomposition

The Brier score can be decomposed into 3 additive components: Uncertainty, Reliability, and Resolution.

$$BS = REL - RES + UNC$$

Each of these components can be decomposed further according to the number of possible classes in which the event can fall. Abusing the equality sign:

$$BS = \frac{1}{N}\sum_{k=1}^{K} n_k(\mathbf{f}_k - \mathbf{\bar{o}}_k)^2 - \frac{1}{N}\sum_{k=1}^{K} n_k(\mathbf{\bar{o}}_k - \mathbf{\bar{o}})^2 + \mathbf{\bar{o}}(1 - \mathbf{\bar{o}})$$

With $N$ being the total number of forecasts issued, $K$ the number of unique forecasts issued, $\mathbf{\bar{o}} = \sum_{t=1}^{N} \mathbf{o}_t / N$ the observed climatological base rate for the event to occur, $n_k$ the number of forecasts with the same probability category and $\mathbf{\bar{o}}_k$ the observed frequency, given forecasts of probability $\mathbf{f}_k$. The bold notation in the above formula indicates vectors, which is another way of denoting the original definition of the score and decomposing it according to the number of possible classes in which the event can fall. For example, a 70% chance of rain and an occurrence of no rain are denoted as $\mathbf{f} = (0.3, 0.7)$ and $\mathbf{o} = (1, 0)$ respectively, listing the no-rain class first. Operations like the square and multiplication on these vectors are understood to be component-wise. The Brier Score is then the sum of the resulting vector on the right-hand side.
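
The three-component decomposition can be checked numerically in the scalar binary case. The sketch below groups forecasts by their exact probability value and assumes the usual binary forms of the three terms; the function name `brier_decomposition` and the data are made up for illustration.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary 0/1 outcomes."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n  # observed climatological base rate

    # Group outcomes by the distinct forecast probability that preceded them.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f_k, observed in groups.items():
        n_k = len(observed)
        o_bar_k = sum(observed) / n_k  # observed frequency given forecast f_k
        reliability += n_k * (f_k - o_bar_k) ** 2
        resolution += n_k * (o_bar_k - base_rate) ** 2

    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Hypothetical data: five forecasts of 70% and five forecasts of 20%.
forecasts = [0.7] * 5 + [0.2] * 5
outcomes = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

rel, res, unc = brier_decomposition(forecasts, outcomes)
direct = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
print(rel, res, unc, direct)
```

For this data the terms come out at roughly 0.005, 0.09 and 0.25, and reliability minus resolution plus uncertainty agrees with the directly computed score of about 0.165.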

Uncertainty

The uncertainty term measures the inherent uncertainty in the outcomes of the event. For binary events, it is at a maximum when each outcome occurs 50% of the time, and is minimal if an outcome always occurs or never occurs.
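
As a worked check of this claim, assume the binary uncertainty term takes the usual form $\bar{o}(1 - \bar{o})$, where $\bar{o}$ denotes the observed base rate of the event (notation assumed here):

$$UNC = \bar{o}(1 - \bar{o}), \qquad \frac{d\,UNC}{d\bar{o}} = 1 - 2\bar{o} = 0 \;\Rightarrow\; \bar{o} = \tfrac{1}{2}, \qquad UNC_{\max} = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4}$$

At $\bar{o} = 0$ or $\bar{o} = 1$ the product is zero, which corresponds to an event that never or always occurs.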

Reliability

The reliability term measures how close the forecast probabilities are to the true probabilities, given that forecast. Note that reliability is defined in the opposite sense to everyday English usage: a reliability of 0 means the forecast is perfectly reliable. For example, if we group all forecast instances where an 80% chance of rain was forecast, we get perfect reliability only if it rained 4 out of 5 times after such a forecast was issued.
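
The 80% example can be worked through by hand, assuming a group of $n_k = 5$ forecasts with common probability $f_k$ and observed frequency $\bar{o}_k$ contributes $n_k (f_k - \bar{o}_k)^2$ to the reliability sum before division by the total number of forecasts (symbols assumed here for illustration):

$$f_k = 0.8,\ \bar{o}_k = \tfrac{4}{5} = 0.8: \quad n_k (f_k - \bar{o}_k)^2 = 5\,(0.8 - 0.8)^2 = 0$$

$$f_k = 0.8,\ \bar{o}_k = \tfrac{3}{5} = 0.6: \quad n_k (f_k - \bar{o}_k)^2 = 5\,(0.8 - 0.6)^2 = 0.2$$

In the first case the group contributes nothing to the reliability term; in the second, hypothetical, case, where it rained only 3 times out of 5, the contribution is positive and the forecast is less reliable.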

Resolution

The resolution term measures how much the conditional probabilities given the different forecasts differ from the climatic average. The higher this term is the better. In the worst case, when the climatic probability is always forecast, the resolution is zero. In the best case, when the conditional probabilities are zero and one, the resolution is equal to the uncertainty.
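
Both extreme cases can be illustrated numerically. The sketch below assumes the scalar binary forms of the resolution and uncertainty terms and groups forecasts by exact probability value; the function name and the data are hypothetical.

```python
from collections import defaultdict


def resolution_and_uncertainty(forecasts, outcomes):
    """Return (resolution, uncertainty) for binary 0/1 outcomes."""
    n = len(outcomes)
    base_rate = sum(outcomes) / n

    # Collect the outcomes that followed each distinct forecast value.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    resolution = sum(
        len(observed) * (sum(observed) / len(observed) - base_rate) ** 2
        for observed in groups.values()
    ) / n
    uncertainty = base_rate * (1.0 - base_rate)
    return resolution, uncertainty


outcomes = [1, 0, 0, 1, 0]  # the event occurs 2 times out of 5

# Worst case: the climatic probability (0.4) is always forecast.
res_climatology, unc = resolution_and_uncertainty([0.4] * 5, outcomes)

# Best case: forecasts of exactly 0 or 1 that always match the outcome.
res_perfect, _ = resolution_and_uncertainty([1.0, 0.0, 0.0, 1.0, 0.0], outcomes)
print(res_climatology, res_perfect, unc)
```

The constant climatological forecast yields a resolution of zero, while the perfect forecasts yield a resolution equal to the uncertainty (about 0.24 for this data).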

Two-component decomposition

An alternative decomposition generates two terms instead of three.

$$BS = CAL + REF$$

The first term is known as calibration, and is equal to reliability. The second term is known as refinement; it is an aggregation of resolution and uncertainty (comparing with the three-component form, $REF = UNC - RES$), and is related to the area under the ROC Curve.
The Brier Score, and the CAL + REF decomposition, can be represented graphically through the so-called Brier Curves, where the expected loss is shown for each operating condition. This makes the Brier Score a measure of aggregated performance under a uniform distribution of class asymmetries.

Shortcomings

The Brier score becomes inadequate for very rare events, because it does not sufficiently discriminate between small changes in forecast that are significant for rare events. Wilks has found that "[q]uite large sample sizes, i.e. n > 1000, are required for higher-skill forecasts of relatively rare events, whereas only quite modest sample sizes are needed for low-skill forecasts of common events."
