Risk score


Risk score is the name given to a general practice in applied statistics, biostatistics, econometrics and other related disciplines of creating an easily calculated number that reflects the level of risk in the presence of some risk factors.
Risk scores are designed to be simple to calculate and easy to interpret.
A typical scoring method is composed of 3 components:
  1. A set of consistent rules that assigns a numerical value to each risk factor, reflecting our estimate of the underlying risk.
  2. A formula that calculates the score.
  3. A set of thresholds that helps to translate the calculated score into a level of risk, or an equivalent formula or set of rules to translate the calculated score back into probabilities.
Items 1 and 2 can be achieved by using some form of regression, which provides both the risk estimate and the formula to calculate the score. Item 3 requires setting an arbitrary set of thresholds and will usually involve expert opinion.

Estimating risk with GLM

Risk scores are designed to represent an underlying probability of an adverse event, denoted $\{Y = 1\}$, given a vector of $P$ explanatory variables $\mathbf{X}$ containing measurements of the relevant risk factors. In order to establish the connection between the risk factors and the probability, a set of weights $\beta$ is estimated using a generalized linear model (GLM):
$$\operatorname{E}(Y \mid \mathbf{X}) = \operatorname{P}(Y = 1 \mid \mathbf{X}) = g^{-1}(\mathbf{X}\beta)$$
where $g^{-1}: \mathbb{R} \rightarrow [0,1]$ is a real-valued, monotonically increasing function that maps the values of the linear predictor $\mathbf{X}\beta$ to the interval $[0,1]$. GLM methods typically use the logit or probit as the link function $g$.
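As an illustration of the estimation step, the following is a minimal sketch of fitting a logistic (logit-link) model by plain gradient ascent, using only the Python standard library. The data, the function names `inv_logit` and `fit_logistic`, the learning rate and the iteration count are all assumptions made for this example; in practice a statistical package would normally be used instead.

```python
import math

def inv_logit(z):
    """Inverse logit link: maps a real linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Estimate the weights by gradient ascent on the mean log-likelihood."""
    n, n_features = len(X), len(X[0])
    beta = [0.0] * n_features
    for _ in range(n_iter):
        grad = [0.0] * n_features
        for xi, yi in zip(X, y):
            p_i = inv_logit(sum(a * b for a, b in zip(xi, beta)))
            for j in range(n_features):
                grad[j] += (yi - p_i) * xi[j]  # gradient term (y_i - p_i) * x_ij
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Made-up data: the first column is the intercept, the second is one risk factor.
X = [[1.0, float(v)] for v in range(8)]
y = [0, 0, 0, 1, 0, 1, 1, 1]

beta = fit_logistic(X, y)
p_low = inv_logit(beta[0] + beta[1] * 0.0)   # estimated risk at the lowest value
p_high = inv_logit(beta[0] + beta[1] * 7.0)  # estimated risk at the highest value
```

Because the adverse events in this toy data set become more frequent as the risk factor grows, the estimated weight on the risk factor is positive and the estimated probability rises with it.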

Estimating risk with other methods

While it is possible to estimate the underlying probability using other statistical or machine learning methods, the requirements of simplicity and easy interpretation make most of these methods difficult to use for scoring in this context.
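When using a GLM, the set of estimated weights $\beta$ can be used to assign points to the different values of the risk factors in $\mathbf{X}$. The score can then be expressed as a weighted sum:
$$\text{Score} = \mathbf{X}\beta = \sum_{j=1}^{P} X_j \beta_j$$
where $P$ is the number of risk factors.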
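The weighted sum can be sketched in a few lines of Python. The weights, the risk-factor names and the rounding of weights to whole "points" are hypothetical choices made for this example; rounding is one possible simplification that keeps the score easy to add up by hand, not a required step.

```python
# Hypothetical GLM weights for three binary risk factors (not from a real model).
beta = {"age_over_65": 0.92, "smoker": 0.41, "diabetes": 0.55}

# Assumed simplification: express each weight in units of the smallest one
# and round to the nearest integer to obtain points.
base = min(beta.values())
points = {name: round(w / base) for name, w in beta.items()}

def risk_score(x, weights):
    """Weighted sum of risk-factor values: sum over j of x_j * weight_j."""
    return sum(weights[name] * value for name, value in x.items())

# Indicator values for one hypothetical individual.
patient = {"age_over_65": 1, "smoker": 0, "diabetes": 1}
score = risk_score(patient, points)
```

Here the individual collects 2 points for age and 1 for diabetes, giving a score of 3. Calling `risk_score(patient, beta)` with the unrounded weights gives the linear predictor itself.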
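Let $\mathbf{A} = \{a_1, \ldots, a_m\}$ denote a set of $m \geq 2$ "escalating" actions available to the decision maker. In order to define a decision rule, we want to define a map between different values of the score and the possible decisions in $\mathbf{A}$. Let $\tau = \{\tau_1, \ldots, \tau_{m-1}\}$ be a set of thresholds that partitions $\mathbb{R}$ into $m$ consecutive, non-overlapping intervals, such that $\tau_1 < \tau_2 < \ldots < \tau_{m-1}$.
The map is defined as follows:
$$\text{Score} \in [\tau_{j-1}, \tau_j) \;\Rightarrow\; \text{take action } a_j, \qquad j = 1, \ldots, m,$$
with the conventions $\tau_0 = -\infty$ and $\tau_m = +\infty$.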
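A threshold-based decision rule of this kind, with each score interval closed on the left and open on the right, can be sketched with a binary search over the sorted thresholds. The threshold values, the action names and the helper `decide` are hypothetical, chosen only to illustrate the mapping.

```python
from bisect import bisect_right

def decide(score, thresholds, actions):
    """Return the action whose interval [lower, upper) contains the score."""
    if len(actions) != len(thresholds) + 1:
        raise ValueError("need exactly one more action than thresholds")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_right counts the thresholds that are <= score, which is the
    # zero-based index of the interval the score falls into.
    return actions[bisect_right(thresholds, score)]

# Hypothetical credit-risk example: two thresholds define three escalating actions.
thresholds = [2, 5]
actions = ["approve automatically", "review manually", "decline automatically"]

decisions = [decide(s, thresholds, actions) for s in (0, 2, 4.9, 5, 9)]
```

A score exactly equal to a threshold falls into the higher interval, so a score of 2 is reviewed manually and a score of 5 is declined. Using `bisect_left` instead would give the equivalent rule with intervals open on the left and closed on the right.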

Biostatistics

The primary use of scores in the financial sector is for credit scorecards, or credit scores: