Cronbach's alpha


Tau-equivalent reliability (\rho_T) is a single-administration test score reliability coefficient, commonly referred to as Cronbach's alpha or coefficient alpha. \rho_T is the most famous and commonly used among reliability coefficients, but recent studies recommend not using it unconditionally. Reliability coefficients based on structural equation modeling (SEM) are often recommended as its alternative.

Formula and calculation

Systematic and conventional formula

Let X_i denote the observed score of item i and X = X_1 + X_2 + \cdots + X_k denote the sum of all items in a test consisting of k items. Let \sigma_{ij} denote the covariance between X_i and X_j, \sigma_{ii} denote the variance of X_i, and \sigma_X^2 denote the variance of X. \sigma_X^2 consists of item variances and inter-item covariances. That is, \sigma_X^2 = \sum_{i=1}^{k} \sigma_{ii} + \sum_{i \ne j} \sigma_{ij}. Let \bar{\sigma}_{ij} denote the average of the inter-item covariances. That is, \bar{\sigma}_{ij} = \frac{1}{k(k-1)} \sum_{i \ne j} \sigma_{ij}.
\rho_T's "systematic" formula is

\rho_T = \frac{k^2 \bar{\sigma}_{ij}}{\sigma_X^2}.

The more frequently used but more difficult to understand version of the formula is

\rho_T = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma_{ii}}{\sigma_X^2} \right).
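As a sketch, both versions of the formula can be computed from a covariance matrix; the function names and the covariance matrix below are hypothetical, chosen only so the two formulas can be compared.

```python
import numpy as np

def alpha_systematic(cov):
    """Tau-equivalent reliability via the systematic formula:
    rho_T = k^2 * mean(inter-item covariances) / var(total score)."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    off_diag = cov[~np.eye(k, dtype=bool)]   # the k(k-1) inter-item covariances
    return k**2 * off_diag.mean() / cov.sum()  # cov.sum() is the variance of the total

def alpha_conventional(cov):
    """The more familiar formula: rho_T = k/(k-1) * (1 - sum(item variances)/var(total))."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())

# Hypothetical tau-equivalent covariance matrix: equal covariances, unequal variances.
cov = [[10, 6, 6],
       [ 6, 11, 6],
       [ 6, 6, 12]]
print(alpha_systematic(cov))    # both formulas give the same value
print(alpha_conventional(cov))
```

Although the two expressions look different, they are algebraically identical for any covariance matrix.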

Calculation example

When applied to appropriate data

When \rho_T is applied to data that satisfy the condition of being tau-equivalent, it equals the reliability of the test score.

When applied to inappropriate data

When \rho_T is applied to data that do not satisfy the condition of being tau-equivalent, it is smaller than the reliability of the test score. Compare this value with the value obtained by applying congeneric reliability \rho_C to the same data: \rho_C is larger than \rho_T.
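The contrast can be sketched numerically with a hypothetical congeneric covariance matrix built from unequal factor loadings; the loadings and error variances below are illustrative, not from the source.

```python
import numpy as np

# Hypothetical congeneric (unidimensional, unequal-loading) model:
# Sigma = lam lam' + diag(theta).
lam = np.array([1.0, 2.0, 3.0])      # factor loadings (unequal => not tau-equivalent)
theta = np.array([1.0, 1.0, 1.0])    # error variances
cov = np.outer(lam, lam) + np.diag(theta)

k = len(lam)
alpha = k / (k - 1) * (1 - np.trace(cov) / cov.sum())   # rho_T
rho_c = lam.sum()**2 / (lam.sum()**2 + theta.sum())     # congeneric reliability

print(round(alpha, 4), round(rho_c, 4))   # 0.8462 0.9231: alpha underestimates
```

Because the loadings are unequal, \rho_T falls below \rho_C, illustrating the underestimation described above.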

Prerequisites for using tau-equivalent reliability

In order to use \rho_T as a reliability coefficient, the data must satisfy the following conditions.
1) Unidimensionality
2) Tau-equivalence
3) Independence between errors

The conditions of being parallel, tau-equivalent, and congeneric

Parallel condition

At the population level, parallel data have equal inter-item covariances and equal variances. In parallel data, even if a correlation matrix is used instead of a covariance matrix, there is no loss of information. All parallel data are also tau-equivalent, but the reverse is not true. That is, among the three conditions, the parallel condition is the most difficult to meet.

Tau-equivalent condition

At the population level, tau-equivalent data have equal inter-item covariances, but their variances may have different values. All items in tau-equivalent data have equal discrimination or importance. All tau-equivalent data are also congeneric, but the reverse is not true.

Congeneric condition

At the population level, congeneric data need not have equal variances or covariances, provided they are unidimensional. All items in congeneric data can have different discrimination or importance.
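Under a one-factor model, Sigma = lam lam' + diag(theta), the three conditions correspond to constraints on the loadings and error variances. The matrices below are hypothetical, constructed only to illustrate each condition.

```python
import numpy as np

def one_factor_cov(loadings, error_vars):
    """Population covariance matrix implied by a one-factor model."""
    lam = np.asarray(loadings, dtype=float)
    return np.outer(lam, lam) + np.diag(error_vars)

parallel   = one_factor_cov([2, 2, 2], [1, 1, 1])   # equal covariances AND equal variances
tau_equiv  = one_factor_cov([2, 2, 2], [1, 2, 3])   # equal covariances, unequal variances
congeneric = one_factor_cov([1, 2, 3], [1, 1, 1])   # covariances may differ

def off_diag(cov):
    """The inter-item covariances of a covariance matrix."""
    k = cov.shape[0]
    return cov[~np.eye(k, dtype=bool)]

print(off_diag(parallel))    # all 4.0
print(off_diag(tau_equiv))   # all 4.0, but the diagonal is 5, 6, 7
print(off_diag(congeneric))  # unequal: 2, 3, 6 appear
```

Equal loadings force equal inter-item covariances (tau-equivalence); additionally equal error variances force equal item variances (parallelism).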

Relationship with other reliability coefficients

Classification of single-administration reliability coefficients

Conventional names

There are numerous reliability coefficients. Among them, the conventional names of reliability coefficients that are related and frequently used are summarized as follows:
The columns distinguish split-half, unidimensional, and multidimensional coefficients; the rows distinguish the parallel, tau-equivalent, and congeneric conditions.
Parallel — split-half: Spearman-Brown formula; unidimensional: standardized alpha.
Tau-equivalent — split-half: Flanagan formula, Rulon formula, Flanagan-Rulon formula, Guttman's \lambda_4; unidimensional: Cronbach's \alpha, coefficient \alpha, Guttman's \lambda_3, KR-20, Hoyt reliability; multidimensional: stratified \alpha.
Congeneric — split-half: Angoff-Feldt coefficient, Raju coefficient; unidimensional: composite reliability, construct reliability, congeneric reliability, coefficient \omega, unidimensional \omega, Raju coefficient; multidimensional: coefficient \omega_{total}, McDonald's \omega, multidimensional \omega.

Combining row and column names gives the prerequisites for the corresponding reliability coefficient. For example, Cronbach's \alpha and Guttman's \lambda_3 are reliability coefficients derived under the condition of being unidimensional and tau-equivalent.

Systematic names

Conventional names are disordered and unsystematic. Conventional names give no information about the nature of each coefficient, or give misleading information. Conventional names are inconsistent. Some are formulas, and others are coefficients. Some are named after the original developer, some are named after someone who is not the original developer, and others do not include the name of any person. While one formula is referred to by multiple names, multiple formulas are referred to by one notation. The proposed systematic names and their notation for these reliability coefficients are as follows:
Parallel — split-half: split-half parallel reliability; unidimensional: parallel reliability; multidimensional: multidimensional parallel reliability.
Tau-equivalent — split-half: split-half tau-equivalent reliability; unidimensional: tau-equivalent reliability; multidimensional: multidimensional tau-equivalent reliability.
Congeneric — split-half: split-half congeneric reliability; unidimensional: congeneric reliability; multidimensional: bifactor reliability (bifactor model), second-order factor reliability (second-order factor model), correlated factor reliability (correlated factor model).

Relationship with parallel reliability

\rho_T is often referred to as coefficient alpha, and \rho_P is often referred to as standardized alpha.
Because of the standardized modifier, \rho_P is often mistaken for a more standard version than \rho_T.
There is no historical basis for referring to \rho_P as standardized alpha.
Cronbach (1951) did not refer to this coefficient as alpha, nor did he recommend using it. \rho_P was rarely used before the 1970s. As SPSS began to provide \rho_P under the name of standardized alpha, this coefficient came into occasional use (Cho, E. and Chun, S. (2018). Fixing a broken clock: A historical review of the originators of reliability coefficients including Cronbach's alpha. Survey Research, 19, 23–54). The use of \rho_P is not recommended because the parallel condition is difficult to meet in real-world data.

Relationship with split-half tau-equivalent reliability

\rho_T equals the average of the split-half tau-equivalent reliability (\rho_{ST}) values obtained for all possible split-halves. This relationship, proved by Cronbach (1951), is often used to explain the intuitive meaning of \rho_T. However, this interpretation overlooks the fact that \rho_T underestimates reliability when applied to data that are not tau-equivalent. At the population level, the maximum of all possible \rho_{ST} values is closer to reliability than the average of all possible \rho_{ST} values. This mathematical fact was already known even before the publication of Cronbach (1951). A comparative study reports that the maximum of \rho_{ST} is the most accurate reliability coefficient.
Revelle refers to the minimum of all possible \rho_{ST} values as coefficient \beta, and argues that \beta provides complementary information that \rho_T does not.
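Cronbach's identity can be checked numerically for a four-item test: \rho_T equals the mean of the Flanagan-Rulon split-half values over the three possible 2-2 splits. The covariance matrix below is randomly generated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
cov = A @ A.T                      # an arbitrary positive semidefinite 4x4 covariance matrix
k = cov.shape[0]

alpha = k / (k - 1) * (1 - np.trace(cov) / cov.sum())

def split_half(cov, half):
    """Flanagan-Rulon split-half reliability: 2 * (1 - (var_A + var_B) / var_X)."""
    other = [i for i in range(cov.shape[0]) if i not in half]
    var_a = cov[np.ix_(half, half)].sum()    # variance of the first half's sum
    var_b = cov[np.ix_(other, other)].sum()  # variance of the second half's sum
    return 2 * (1 - (var_a + var_b) / cov.sum())

splits = [(0, 1), (0, 2), (0, 3)]            # the three distinct 2-2 splits of four items
values = [split_half(cov, s) for s in splits]

print(np.isclose(alpha, np.mean(values)))    # True: alpha is the average split-half value
print(max(values) >= alpha)                  # the best split is at least as large
```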

Relationship with congeneric reliability

If the assumptions of unidimensionality and tau-equivalence are satisfied, \rho_T equals \rho_C.
If unidimensionality is satisfied but tau-equivalence is not, \rho_T is smaller than \rho_C.
\rho_C is the most commonly used reliability coefficient after \rho_T. Users tend to present both, rather than replacing \rho_T with \rho_C.
A study investigating studies that presented both coefficients reports that \rho_T is .02 smaller than \rho_C on average.

Relationship with multidimensional reliability coefficients and \omega_{T}

If \rho_T is applied to multidimensional data, its value is smaller than that of multidimensional reliability coefficients such as \omega_T, and larger than that of \omega_H.

Relationship with Intraclass correlation

\rho_T is said to be equal to the stepped-up consistency version of the intraclass correlation coefficient, which is commonly used in observational studies. But this is only conditionally true. In terms of variance components, this condition is, for item sampling: if and only if the value of the item variance component equals zero. If this variance component is negative, \rho_T will underestimate the stepped-up intraclass correlation coefficient; if this variance component is positive, \rho_T will overestimate it.

History

Before 1937

The split-half reliability coefficient was the only known type of reliability coefficient. The problem was that the reliability estimates depended on how the items were split in half. Criticism was raised against this unreliability, but for more than 20 years no fundamental solution was found.

Kuder and Richardson (1937)

Kuder and Richardson developed several reliability coefficients that could overcome the problem of the split-half technique. They did not give the reliability coefficients particular names. Equation 20 in their article is \rho_T. This formula is often referred to as Kuder-Richardson Formula 20, or KR-20. They dealt with cases where the observed scores were dichotomous, so the expression of KR-20 is slightly different from the conventional formula of \rho_T. A review of this paper reveals that they did not present a general formula because they did not need to, not because they were not able to. Let p_i denote the correct answer ratio of item i, and q_i = 1 - p_i denote the incorrect answer ratio of item i. The formula of KR-20 is as follows.

KR\text{-}20 = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2} \right)

Since \sigma_{ii} = p_i q_i for dichotomous items, KR-20 and \rho_T have the same meaning.
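On dichotomous data the two expressions coincide, since the population variance of a 0/1 item is p_i q_i. A sketch with made-up responses:

```python
import numpy as np

# Hypothetical dichotomous item responses: 6 examinees x 3 items.
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [0, 0, 0],
              [1, 1, 1],
              [1, 0, 1]])
k = X.shape[1]
p = X.mean(axis=0)                 # proportion correct per item
q = 1 - p
var_total = X.sum(axis=1).var()    # population variance of the total score

kr20 = k / (k - 1) * (1 - (p * q).sum() / var_total)

cov = np.cov(X, rowvar=False, ddof=0)   # population covariances (ddof=0 matches p*q)
alpha = k / (k - 1) * (1 - np.trace(cov) / cov.sum())

print(np.isclose(kr20, alpha))   # True
```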

Between 1937 and 1951

Several studies published the general formula of KR-20.

Kuder and Richardson made unnecessary assumptions to derive \rho_T. Several studies have derived \rho_T in a different way from Kuder and Richardson.
Hoyt derived \rho_T using ANOVA. Cyril Hoyt may be considered the first developer of the general formula of the KR-20, but he did not explicitly present the formula of \rho_T.
The first expressions of the modern formula of \rho_T appeared in this period; Jackson and Ferguson used the same version.

Guttman derived six reliability formulas, denoted \lambda_1 through \lambda_6. Louis Guttman proved that all of these formulas were always less than or equal to reliability, and based on these characteristics, he referred to these formulas as 'lower bounds of reliability'. Guttman's \lambda_3 equals \rho_T, and \lambda_4 is the split-half tau-equivalent reliability. He proved that \lambda_2 is always greater than or equal to \lambda_3. At that time, all calculations were done with paper and pencil, and since the formula of \lambda_3 was simpler to calculate, he mentioned that \lambda_3 was useful under certain conditions.

Gulliksen derived \rho_T with fewer assumptions than previous studies. The assumption he used is essential tau-equivalence in modern terms.
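Guttman's \lambda_2 and \lambda_3 can be computed directly from a covariance matrix, and \lambda_2 never falls below \lambda_3 (= \rho_T); the two coincide under tau-equivalence. The covariance matrix below is hypothetical.

```python
import numpy as np

def guttman_l2_l3(cov):
    """Guttman's lambda_2 and lambda_3 from an inter-item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    total = cov.sum()                        # variance of the total score
    off = cov[~np.eye(k, dtype=bool)]        # inter-item covariances
    l1 = 1 - np.trace(cov) / total
    l2 = l1 + np.sqrt(k / (k - 1) * (off**2).sum()) / total
    l3 = k / (k - 1) * l1                    # identical to Cronbach's alpha
    return l2, l3

# A non-tau-equivalent (unequal covariances) example:
cov = [[2, 1, 1],
       [1, 3, 2],
       [1, 2, 4]]
l2, l3 = guttman_l2_l3(cov)
print(l2 >= l3)   # True; they are equal only when the covariances are all equal
```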

Recognition of KR-20's original formula and general formula at the time

The two formulas were recognized to be exactly identical, and the expression "general formula of KR-20" was not used. Hoyt explained that his method "gives precisely the same result" as KR-20. Jackson and Ferguson stated that the two formulas are "identical". Guttman said \lambda_3 is "algebraically identical" to KR-20. Gulliksen also stated that the two formulas are "identical".
Even studies critical of KR-20 did not point out that the original formula of KR-20 could only be applied to dichotomous data.

Criticism of underestimation of KR-20

The developers of this formula reported that \rho_T consistently underestimates reliability. Hoyt argued that this characteristic alone made \rho_T more recommendable than the traditional split-half technique, for which it was unknown whether reliability was underestimated or overestimated.
Cronbach was critical of the underestimation of \rho_T. He was concerned that it was not known how much \rho_T underestimated reliability. He criticized that the underestimation was likely to be excessively severe, such that \rho_T could sometimes lead to negative values.
Because of these problems, he argued that \rho_T could not be recommended as an alternative to the split-half technique.

Cronbach (1951)

As with previous studies, Cronbach invented another method to derive \rho_T. His interpretation was more intuitively attractive than those of previous studies. That is, he proved that \rho_T equals the average of the split-half reliability values obtained for all possible split-halves. He criticized that the name KR-20 was awkward and suggested a new name, coefficient alpha. His approach was a huge success. However, he not only omitted some key facts, but also gave an incorrect explanation.
First, he positioned coefficient alpha as the general formula of KR-20, but omitted the explanation that existing studies had published the precisely identical formula. Those who read only Cronbach (1951) without background knowledge could misunderstand that he was the first to develop the general formula of KR-20.
Second, he did not explain under what condition \rho_T equals reliability. Non-experts could misunderstand that \rho_T was a general reliability coefficient that could be used for all data regardless of prerequisites.
Third, he did not explain why he changed his attitude toward \rho_T. In particular, he did not provide a clear answer to the underestimation problem of \rho_T, which he himself had criticized.
Fourth, he argued that a high value of \rho_T indicated homogeneity of the data.

After 1951

Novick and Lewis proved the necessary and sufficient condition for \rho_T to be equal to reliability, and named it the condition of being essentially tau-equivalent.
Cronbach later mentioned that the reason his 1951 article received a lot of citations was "mostly because [he] put a brand name on a common-place coefficient". He explained that he had originally planned to name other types of reliability coefficients with consecutive Greek letters, but later changed his mind.
Cronbach and Shavelson encouraged readers to use generalizability theory rather than \rho_T. Cronbach opposed the use of the name Cronbach's alpha. He explicitly denied the existence of existing studies that had published the general formula of KR-20 prior to Cronbach (1951).

Common misconceptions about tau-equivalent reliability

The value of tau-equivalent reliability ranges between zero and one

By definition, reliability cannot be less than zero and cannot be greater than one. Many textbooks mistakenly equate \rho_T with reliability and give an inaccurate explanation of its range. \rho_T can be less than reliability, and even negative, when applied to data that are not tau-equivalent. Suppose that X_2 copies the value of X_1 as it is, and X_3 copies the value of X_1 multiplied by -1. With unit variance for X_1, the covariance matrix between items is

[ 1  1 -1 ]
[ 1  1 -1 ]
[-1 -1  1 ],

so that \sigma_X^2 = 1 and \rho_T = \frac{3}{2}\left(1 - \frac{3}{1}\right) = -3.

A negative \rho_T can occur for reasons such as negative discrimination or mistakes in processing reversely scored items.
Unlike \rho_T, SEM-based reliability coefficients are always greater than or equal to zero.
This anomaly was first pointed out by Cronbach to criticize KR-20, but Cronbach (1951) did not comment on this problem in his article, which discussed all conceivable issues related to \rho_T and which he himself described as being "encyclopedic".
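A reversed item can be reproduced in a few lines; with X_2 = X_1 and X_3 = -X_1 the result is exactly -3 regardless of the variance of X_1 (the simulation is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100_000)
X = np.column_stack([x1, x1, -x1])   # X2 = X1, X3 = -X1: a reversed item

k = X.shape[1]
cov = np.cov(X, rowvar=False)
alpha = k / (k - 1) * (1 - np.trace(cov) / cov.sum())
print(alpha)   # -3.0 (up to floating-point error): far below reliability's lower bound
```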

If there is no measurement error, the value of tau-equivalent reliability is one

This anomaly also originates from the fact that \rho_T underestimates reliability. Suppose that X_2 copies the value of X_1 as it is, and X_3 copies the value of X_1 multiplied by two. With unit variance for X_1, the covariance matrix between items is

[1 1 2]
[1 1 2]
[2 2 4],

so that \sigma_X^2 = 16 and \rho_T = \frac{3}{2}\left(1 - \frac{6}{16}\right) = .9375.

For the above data, both the reliability and the congeneric reliability \rho_C have a value of one.
The above example is presented by Cho and Kim.
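A sketch of the error-free case: with X_2 = X_1 and X_3 = 2X_1 there is no measurement error, yet \rho_T = .9375 while \rho_C = 1. The values follow from the construction; the simulation is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=100_000)
X = np.column_stack([x1, x1, 2 * x1])   # error-free items with unequal "loadings"

k = X.shape[1]
cov = np.cov(X, rowvar=False)
alpha = k / (k - 1) * (1 - np.trace(cov) / cov.sum())

# Congeneric reliability from the known loadings lam = (1, 1, 2), zero error variance:
lam = np.array([1.0, 1.0, 2.0])
rho_c = lam.sum()**2 / (lam.sum()**2 + 0.0)

print(round(alpha, 4), rho_c)   # 0.9375 1.0 despite there being no measurement error
```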

A high value of tau-equivalent reliability indicates homogeneity between the items

Many textbooks refer to \rho_T as an indicator of homogeneity between items. This misconception stems from the inaccurate explanation of Cronbach (1951) that high \rho_T values show homogeneity between the items. Homogeneity is a term that is rarely used in the modern literature, and related studies interpret the term as referring to unidimensionality. Several studies have provided proofs or counterexamples that high \rho_T values do not indicate unidimensionality. See the counterexamples below.

For example, \rho_T = .72 can be obtained from unidimensional data and also from multidimensional data; that is, the same \rho_T value can arise from either. Likewise, data with a high \rho_T can be multidimensional, and data with a low \rho_T can be unidimensional.
Unidimensionality is a prerequisite for \rho_T. You should check unidimensionality before calculating \rho_T, rather than calculating \rho_T to check unidimensionality.

A high value of tau-equivalent reliability indicates internal consistency

The term internal consistency is commonly used in the reliability literature, but its meaning is not clearly defined. The term is sometimes used to refer to a certain kind of reliability, but it is unclear exactly which reliability coefficients are included here, in addition to \rho_T. Cronbach (1951) used the term in several senses without an explicit definition. Cho and Kim showed that \rho_T is not an indicator of any of these.

Removing items using "alpha if item deleted" always increases reliability

Removing an item using "alpha if item deleted" may result in 'alpha inflation', where sample-level reliability is reported to be higher than population-level reliability. It may also reduce population-level reliability. The elimination of less-reliable items should be based not only on a statistical basis, but also on a theoretical and logical basis. It is also recommended that the whole sample be divided in two and cross-validated.

Ideal reliability level and how to increase reliability

Nunnally's recommendations for the level of reliability

The most frequently cited source on how high reliability coefficients should be is Nunnally's book. However, his recommendations are cited contrary to his intentions. What he meant was to apply different criteria depending on the purpose or stage of the study. However, regardless of the nature of the research, such as exploratory research, applied research, or scale development research, a criterion of .7 is used universally. .7 is the criterion he recommended for the early stages of a study, a stage at which most published studies are not. Rather than .7, the criterion of .8, which Nunnally referred to for applied research, is more appropriate for most empirical studies.
                                 1st edition    2nd & 3rd edition
Early stage of research          .5 or .6       .7
Applied research                 .8             .8
When making important decisions  .95            .95

His recommended levels did not imply cutoff points. If a criterion means a cutoff point, it is important whether or not it is met, but it is unimportant by how much it is over or under. He did not mean that reliability should be strictly .8 when referring to the criterion of .8. If the reliability has a value near .8, his recommendation can be considered to have been met.
His idea was that there is a cost to increasing reliability, so there is no need to try to obtain maximum reliability in every situation.

Cost to obtain a high level of reliability

Many textbooks explain that the higher the value of reliability, the better. The potential side effects of high reliability are rarely discussed. However, the principle of sacrificing something to get one also applies to reliability.

Trade-off between reliability and validity

Measurements with perfect reliability lack validity. For example, an examinee who takes a test with a reliability of one will receive either a perfect score or a zero score, because an examinee who answers one item correctly (or incorrectly) will answer all other items correctly (or incorrectly). The phenomenon in which validity is sacrificed to increase reliability is called the attenuation paradox.
A high value of reliability can conflict with content validity. For high content validity, each item should be constructed so as to comprehensively represent the content to be measured. However, a strategy of repeatedly measuring essentially the same question in different ways is often used solely for the purpose of increasing reliability.

Trade-off between reliability and efficiency

When the other conditions are equal, reliability increases as the number of items increases. However, the increase in the number of items hinders the efficiency of measurements.
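The item-count effect is quantified by the Spearman-Brown prophecy formula, a standard classical-test-theory result that predicts reliability when a test is lengthened n-fold with items of equal quality:

```python
def spearman_brown(rho, n):
    """Predicted reliability after lengthening a test n-fold:
    rho_new = n * rho / (1 + (n - 1) * rho)."""
    return n * rho / (1 + (n - 1) * rho)

# Doubling a test with reliability .6 helps a lot; further doubling helps less.
print(round(spearman_brown(0.6, 2), 3))   # 0.75
print(round(spearman_brown(0.6, 4), 3))   # 0.857
```

The diminishing returns illustrate the efficiency trade-off: each additional item buys less reliability than the one before.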

Methods to increase reliability

Despite the costs associated with increasing reliability discussed above, a high level of reliability may be required. The following methods can be considered to increase reliability.

Before data collection

Eliminate the ambiguity of the measurement item.
Do not measure what the respondents do not know.
Increase the number of items. However, care should be taken not to excessively inhibit the efficiency of the measurement.
Use a scale that is known to be highly reliable.
Conduct a pretest. Discover in advance the problem of reliability.
Exclude or modify items that are different in content or form from other items.

After data collection

Remove the problematic items using "alpha if item deleted". However, this deletion should be accompanied by a theoretical rationale.
Use a more accurate reliability coefficient than \rho_T. For example, \rho_C is .02 larger than \rho_T on average.

Which reliability coefficient to use

Should we continue to use tau-equivalent reliability?

\rho_T is used in an overwhelming proportion of studies. One study estimates that approximately 97% of studies use \rho_T as a reliability coefficient.
However, simulation studies comparing the accuracy of several reliability coefficients have led to the common result that \rho_T is an inaccurate reliability coefficient.
Methodological studies are critical of the use of \rho_T. Simplifying and classifying the conclusions of existing studies gives the following.
1) Conditional use: Use \rho_T only when certain conditions are met.
2) Opposition to use: \rho_T is inferior and should not be used.

Alternatives to tau-equivalent reliability

Existing studies are practically unanimous in opposing the widespread practice of using \rho_T unconditionally for all data. However, different opinions are given on which reliability coefficient should be used instead of \rho_T.
Different reliability coefficients ranked first in each simulation study comparing the accuracy of several reliability coefficients.
The majority opinion is to use SEM-based reliability coefficients as an alternative to \rho_T.
However, there is no consensus on which of the several SEM-based reliability coefficients is the best to use.
Some suggest \omega_H as an alternative, but \omega_H shows information that is completely different from reliability. \omega_H is a type of coefficient comparable to Revelle's \beta. Such coefficients do not substitute for, but rather complement, reliability.
Among SEM-based reliability coefficients, multidimensional reliability coefficients are rarely used, and the most commonly used is \rho_C.

Software for SEM-based reliability coefficients

General-purpose statistical software such as SPSS and SAS includes a function to calculate \rho_T. Users who don't know the formula of \rho_T have no problem obtaining the estimate with just a few mouse clicks.
SEM software such as AMOS, LISREL, and Mplus does not have a function to calculate SEM-based reliability coefficients. Users need to calculate the result by inputting the estimates into a formula by hand. To avoid this inconvenience and possible error, even studies reporting the use of SEM rely on \rho_T instead of SEM-based reliability coefficients. A few alternatives exist to automatically calculate SEM-based reliability coefficients.
1) R: The psych package calculates various reliability coefficients.
2) EQS: This SEM software has a function to calculate reliability coefficients.
3) RelCalc: Available with Microsoft Excel. \rho_C can be obtained without the need for SEM software. Various multidimensional SEM reliability coefficients and various other coefficients can be calculated based on the results of SEM software.

Derivation of formula

Assumption 1. The observed score of an item consists of the true score of the item and the error of the item, which is independent of the true score: X_i = T_i + e_i.
Lemma. \sigma_{ii} = \sigma_{T_i}^2 + \sigma_{e_i}^2.
Assumption 2. Errors are independent of each other: \sigma_{e_i e_j} = 0 for i \ne j.
Assumption 3. (Essential tau-equivalence) The true score of an item consists of the true score common to all items and a constant of the item: T_i = t + c_i.
Let T = T_1 + T_2 + \cdots + T_k denote the sum of the item true scores.
The variance of T is called the true score variance: \sigma_T^2 = k^2 \sigma_t^2.
Definition. Reliability is the ratio of true score variance to observed score variance: \rho = \sigma_T^2 / \sigma_X^2.
The following relationship is established from the above assumptions: \sigma_{ij} = \sigma_t^2 for all i \ne j.
Therefore, the covariance matrix between items is as follows:

[ \sigma_{11}  \sigma_t^2   \cdots  \sigma_t^2  ]
[ \sigma_t^2   \sigma_{22}  \cdots  \sigma_t^2  ]
[ \vdots       \vdots       \ddots  \vdots      ]
[ \sigma_t^2   \sigma_t^2   \cdots  \sigma_{kk} ]

You can see that \sigma_t^2 equals the mean of the inter-item covariances. That is, \bar{\sigma}_{ij} = \sigma_t^2.
Let \rho_T denote the reliability when the above assumptions are satisfied. \rho_T is:

\rho_T = \frac{\sigma_T^2}{\sigma_X^2} = \frac{k^2 \sigma_t^2}{\sigma_X^2} = \frac{k^2 \bar{\sigma}_{ij}}{\sigma_X^2}.
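The derivation can be checked numerically by constructing the population covariance matrix implied by an essentially tau-equivalent model; the parameter values below are hypothetical.

```python
import numpy as np

k = 4
sigma_t2 = 2.0                          # variance of the common true score t
err = np.array([1.0, 0.5, 2.0, 1.5])    # error variances (the constants c_i add no variance)

# Implied covariance matrix: sigma_t2 everywhere, plus error variance on the diagonal.
cov = np.full((k, k), sigma_t2) + np.diag(err)

true_var = k**2 * sigma_t2              # Var(sum of item true scores)
reliability = true_var / cov.sum()      # definition: true variance / observed variance

off = cov[~np.eye(k, dtype=bool)]       # inter-item covariances, all equal to sigma_t2
rho_t = k**2 * off.mean() / cov.sum()   # the systematic formula

print(np.isclose(reliability, rho_t))   # True: rho_T equals reliability under the assumptions
```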