Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable. The application of multivariate statistics is multivariate analysis. Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical application of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the problem being studied. In addition, multivariate statistics is concerned with multivariate probability distributions. Certain types of problems involving multivariate data, for example simple linear regression and multiple regression, are not usually considered to be special cases of multivariate statistics because the analysis is dealt with by considering the conditional distribution of a single outcome variable given the other variables.
Types of analysis
There are many different models, each with its own type of analysis:
Multivariate regression attempts to determine a formula that can describe how elements in a vector of variables respond simultaneously to changes in others. For linear relations, regression analyses here are based on forms of the general linear model. Some suggest that multivariate regression is distinct from multivariable regression; however, that distinction is debated and is not applied consistently across scientific fields.
Principal components analysis (PCA) creates a new set of orthogonal variables that contain the same information as the original set. It rotates the axes of variation to give a new set of orthogonal axes, ordered so that they summarize decreasing proportions of the variation.
Factor analysis is similar to PCA but allows the user to extract a specified number of synthetic variables, fewer than the original set, leaving the remaining unexplained variation as error. The extracted variables are known as latent variables or factors; each one may be supposed to account for covariation in a group of observed variables.
Canonical correlation analysis finds linear relationships between two sets of variables; it is the generalised version of bivariate correlation.
Redundancy analysis is similar to canonical correlation analysis but allows the user to derive a specified number of synthetic variables from one set of variables that explain as much variance as possible in another set. It is a multivariate analogue of regression.
Correspondence analysis, or reciprocal averaging, finds a set of synthetic variables that summarise the original set. The underlying model assumes chi-squared dissimilarities among records.
Canonical correspondence analysis summarises the joint variation in two sets of variables; it is a combination of correspondence analysis and multivariate regression analysis. The underlying model assumes chi-squared dissimilarities among records.
Discriminant analysis, or canonical variate analysis, attempts to establish whether a set of variables can be used to distinguish between two or more groups of cases.
Linear discriminant analysis computes a linear predictor from two sets of normally distributed data to allow for classification of new observations.
Clustering systems assign objects into groups so that objects from the same cluster are more similar to each other than objects from different clusters.
Recursive partitioning creates a decision tree that attempts to correctly classify members of the population based on a dichotomous dependent variable.
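As an illustration of one of the methods listed above, principal components analysis can be sketched in a few lines of NumPy. This is a minimal sketch rather than a reference implementation: it assumes the data are arranged as one row per observation and one column per variable, it obtains the components from a singular value decomposition of the centred data (one of several equivalent routes), and the function name `pca` and the simulated data are hypothetical choices made only for this example.

```python
import numpy as np


def pca(X):
    """Principal components of X (n_observations x n_variables).

    Returns the component axes (one per row), the proportion of
    variance each axis summarises, and the data expressed on those axes.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                      # centre each variable
    # Rows of Vt are orthonormal axes; singular values come back
    # in decreasing order, so the axes are already ranked.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s**2 / (X.shape[0] - 1)
    ratio = variance / variance.sum()
    scores = Xc @ Vt.T                 # rotate data onto the new axes
    return Vt, ratio, scores


# Hypothetical data: two variables driven by a shared signal plus one
# independent variable (all simulated purely for illustration).
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))
X = np.hstack([
    shared + 0.1 * rng.normal(size=(200, 1)),
    2 * shared + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

components, ratio, scores = pca(X)

# Keeping every component loses nothing: the rotation can be undone.
X_restored = scores @ components + X.mean(axis=0)
```

Here `ratio` lists the share of total variance along each new axis in decreasing order, and `X_restored` matches `X` up to floating-point error, which reflects the statement that the full set of components contains the same information as the original variables. Dropping the trailing columns of `scores` gives a lower-dimensional summary.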