Soft independent modelling by class analogy (SIMCA) is a statistical method for supervised classification of data. The method requires a training data set consisting of samples with a set of attributes and their class membership. The term soft refers to the fact that the classifier can identify a sample as belonging to multiple classes, rather than necessarily assigning every sample to exactly one of a set of non-overlapping classes.
Method
To build the classification models, the samples belonging to each class are analysed using principal component analysis (PCA); only the significant components are retained. For a given class, the resulting model then describes a line (one retained principal component, or PC), a plane (two PCs) or a hyper-plane (more than two PCs). For each modelled class, the mean orthogonal distance of the training data samples from the line, plane or hyper-plane is used to determine a critical distance for classification. This critical distance is based on the F-distribution and is usually calculated at the 95% or 99% confidence level.
New observations are projected into each PC model and their residual distances are calculated. An observation is assigned to a model class when its residual distance from the model is below the statistical limit for that class. An observation may be found to belong to multiple classes, and a measure of the goodness of the model can be obtained from the number of cases in which observations are classified into multiple classes. The classification efficiency is usually indicated by receiver operating characteristics (ROC).
In the original SIMCA method, the ends of the hyper-plane of each class are closed off by setting statistical control limits along the axes of the retained principal components. More recent adaptations of the SIMCA method close off the hyper-plane by constructing ellipsoids. With such modified SIMCA methods, an object is assigned to a class only if neither its orthogonal distance from the model nor its projection within the model is significant.
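The procedure described above can be illustrated with a minimal Python sketch using NumPy. It is not a reference implementation: the function names (fit_class_model, residual_distance, classify), the toy data generator and the dictionary layout are hypothetical, and the number of retained components is passed in by hand rather than chosen by a significance test. As a loudly labelled simplification, the critical distance is taken as an empirical quantile of the training residual distances instead of the F-distribution-based limit, and the model is not closed off along the retained components.

```python
import numpy as np


def residual_distance(model, X):
    """Orthogonal distance of each row of X from a class model."""
    centred = np.atleast_2d(np.asarray(X, dtype=float)) - model["mean"]
    scores = centred @ model["axes"].T        # projection onto the retained PCs
    residual = centred - scores @ model["axes"]  # part not explained by the model
    return np.linalg.norm(residual, axis=1)


def fit_class_model(X, n_components, quantile=0.95):
    """Fit a PCA model (line, plane or hyper-plane) to one class."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes of the centred class data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    model = {"mean": mean, "axes": Vt[:n_components]}
    # Assumption: an empirical quantile of the training residuals stands in
    # for the F-distribution-based critical distance described in the text.
    model["limit"] = np.quantile(residual_distance(model, X), quantile)
    return model


def classify(models, x):
    """Return every class whose model accepts x (none, one or several)."""
    return [label for label, m in models.items()
            if residual_distance(m, x)[0] <= m["limit"]]


def toy_class(axis):
    """Hypothetical toy data: 3-D points scattered around one coordinate axis."""
    others = [i for i in range(3) if i != axis]
    rows = []
    for t in range(-5, 6):
        for j in others:
            for sign in (-1.0, 1.0):
                p = [0.0, 0.0, 0.0]
                p[axis] = float(t)
                p[j] = sign * 0.05 * (1 + abs(t) % 3)  # offsets 0.05 to 0.15
                rows.append(p)
    return np.array(rows)


# Class A lies near the x-axis, class B near the y-axis; each is modelled
# with one retained component, i.e. as a line.
models = {"A": fit_class_model(toy_class(0), 1),
          "B": fit_class_model(toy_class(1), 1)}

for point in ([0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 5.0]):
    print(point, classify(models, point))
```

In this toy setting the two class lines cross at the origin, so the first point is accepted by both models, the second only by class A, and the third by neither, which illustrates the soft character of the classifier. An F-based limit would replace the single np.quantile line, and a modified SIMCA variant would add a second check on the scores.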
Application
As a classification method, SIMCA has gained widespread use, especially in applied statistical fields such as chemometrics and spectroscopic data analysis.