Statistical conclusion validity


Statistical conclusion validity is the degree to which conclusions about the relationship among variables based on the data are correct or "reasonable". The concept originally concerned only whether the statistical conclusion about the relationship between the variables was correct, but there is now a movement towards "reasonable" conclusions that draw on quantitative, statistical, and qualitative data.
Fundamentally, two types of errors can occur: type I (concluding that a difference or correlation exists when it does not) and type II (concluding that no difference or correlation exists when it does). Statistical conclusion validity concerns the qualities of the study that make these types of errors more likely.
Statistical conclusion validity involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures.

Common threats

The most common threats to statistical conclusion validity are:

Low statistical power

Power is the probability of correctly rejecting the null hypothesis when it is false. Experiments with low power have a higher probability of failing to reject a false null hypothesis, that is, committing a type II error and concluding that there is no effect when there actually is one. Low power occurs when the sample size of the study is too small given other factors, such as a small effect size or large variability within groups.
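The dependence of power on sample size can be illustrated with a minimal sketch. It assumes a two-sided one-sample z-test with known standard deviation, which is a simplification chosen for illustration; the function name `z_test_power` and its parameters are hypothetical and not part of any standard library.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size: assumed true mean shift divided by the known standard deviation.
    n: sample size.
    alpha: type I error rate of the test.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the test statistic is centred on effect_size * sqrt(n).
    shift = effect_size * n ** 0.5
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Same assumed effect size, increasing sample sizes.
for n in (10, 30, 100):
    print(n, round(z_test_power(0.5, n), 3))
```

With the effect size held fixed, the computed power rises as the sample size grows, and with an effect size of zero the function returns the type I error rate alpha.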

Violated assumptions of the test statistics

Most statistical tests involve assumptions about the data that make the analysis suitable for testing a hypothesis. Violating the assumptions of statistical tests can lead to incorrect inferences about the cause-effect relationship. The robustness of a test indicates how sensitive it is to violations. Violations of assumptions may make tests more or less likely to make type I or II errors.

Dredging and the error rate problem

Each hypothesis test involves a set risk of a type I error. If a researcher searches or "dredges" through their data, testing many different hypotheses to find a significant effect, they are inflating their type I error rate. The more the researcher repeatedly tests the data, the higher the chance of observing a type I error and making an incorrect inference about the existence of a relationship.
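The inflation can be made concrete with a short sketch. It assumes the tests are independent and that every null hypothesis is true, in which case the chance of at least one type I error across m tests is 1 - (1 - alpha)^m. The Bonferroni adjustment shown is one common correction, named here for illustration only; the function names are hypothetical.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one type I error across independent tests,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(alpha, num_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


for m in (1, 5, 20):
    uncorrected = familywise_error_rate(0.05, m)
    corrected = familywise_error_rate(bonferroni_alpha(0.05, m), m)
    print(m, round(uncorrected, 3), round(corrected, 3))
```

Under these assumptions, the uncorrected familywise rate grows quickly with the number of tests, while the Bonferroni-adjusted rate stays at or below the nominal alpha.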

Unreliability of measures

If the dependent and/or independent variables are not measured reliably, that is, with substantial measurement error, incorrect conclusions can be drawn.
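One way to see the effect of unreliable measurement is the attenuation formula from classical test theory, under which the expected observed correlation equals the true correlation multiplied by the square root of the product of the two measures' reliabilities. The sketch below assumes that model holds, with measurement errors uncorrelated with each other and with the true scores; the function name is hypothetical.

```python
def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation under the classical attenuation formula.

    true_r: correlation between the error-free variables.
    reliability_x, reliability_y: reliability coefficients in (0, 1].
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return true_r * (reliability_x * reliability_y) ** 0.5


# Assumed true correlation of 0.5, measured with decreasing reliability.
for rel in (1.0, 0.8, 0.5):
    print(rel, round(attenuated_correlation(0.5, rel, rel), 3))
```

Lower reliability shrinks the observed correlation towards zero, which makes a real relationship harder to detect.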

Restriction of range

Restriction of range, such as floor and ceiling effects or selection effects, reduces the power of the experiment and increases the chance of a type II error. This is because correlations are attenuated (weakened) by reduced variability.
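The attenuation can be illustrated with a small simulation. The sketch assumes bivariate normal data with a chosen population correlation and models a selection effect by keeping only cases whose x value exceeds a cutoff. All names and parameter values are illustrative assumptions.

```python
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_range_restriction(rho=0.6, n=20000, cutoff=1.0, seed=0):
    """Return (full-sample r, r among cases with x above the cutoff)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # y is built so that its population correlation with x equals rho.
        y = rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    kept = [(x, y) for x, y in zip(xs, ys) if x > cutoff]
    kept_x, kept_y = zip(*kept)
    return pearson_r(xs, ys), pearson_r(kept_x, kept_y)


full_r, restricted_r = simulate_range_restriction()
print(round(full_r, 3), round(restricted_r, 3))
```

In the restricted subsample the variability of x is much smaller, so the sample correlation falls well below the full-sample value even though the underlying relationship is unchanged.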

Heterogeneity of the units under study

Greater heterogeneity of individuals participating in the study can also impact interpretations of results by increasing the variance of results or obscuring true relationships (see also sampling error). The more heterogeneous the units, the higher the standard deviation will be. This obscures possible interactions between the characteristics of the units and the cause-effect relationship.
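The cost of a larger standard deviation can be sketched with the textbook sample-size approximation for comparing two group means with a two-sided z-test, n per group = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2. The sketch assumes equal group sizes and a known, common standard deviation; the function name and the example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(mean_difference, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample z-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / mean_difference ** 2
    return ceil(n)


# Same assumed mean difference, increasingly heterogeneous units.
for sd in (10, 20):
    print(sd, required_n_per_group(mean_difference=5, sd=sd))
```

Because the required sample size scales with the variance, doubling the standard deviation roughly quadruples the number of units needed to detect the same difference with the same power.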

Threats to internal validity

Any effect that can impact the internal validity of a research study may bias the results and impact the validity of statistical conclusions reached. These threats to internal validity include unreliability of treatment implementation or failing to control for extraneous variables.