Univariate (statistics)


Univariate is a term commonly used in statistics to describe data that consist of observations on only a single characteristic or attribute. A simple example of univariate data would be the salaries of workers in an industry. Like other data, univariate data can be visualized using graphs, images, or other analysis tools after the data have been measured, collected, reported, and analyzed.

Univariate data types

Some univariate data consist of numbers, while other univariate data are non-numerical. The terms categorical univariate data and numerical univariate data are generally used to distinguish between these types.

Categorical univariate data

Categorical univariate data consist of non-numerical observations that may be placed in categories. They include labels or names used to identify an attribute of each element. Categorical univariate data usually use either a nominal or an ordinal scale of measurement.

Numerical univariate data

Numerical univariate data consist of observations that are numbers. They are obtained using either an interval or a ratio scale of measurement. This type of univariate data can be classified further into two subcategories: discrete and continuous. Numerical univariate data are discrete if the set of all possible values is finite or countably infinite; discrete data are usually associated with counting. Numerical univariate data are continuous if the set of all possible values is an interval of numbers; continuous data are usually associated with measuring.

Data analysis and applications

Univariate analysis is the simplest form of data analysis. Uni means one, so the data have only one variable. When a data set contains several variables, univariate analysis examines each variable separately. Data are gathered for the purpose of answering a question, or more specifically, a research question. Univariate data do not answer research questions about relationships between variables; rather, they are used to describe one characteristic or attribute that varies from observation to observation. A researcher usually has one of two purposes: to answer a research question with a descriptive study, or to learn how an attribute varies with the individual effect of a variable in regression analysis. Patterns found in univariate data can be described with graphical methods, measures of central tendency, and measures of variability.

Graphical methods

The most frequently used graphical illustrations for univariate data are:

Frequency distribution tables

The frequency of an observation is the number of times it occurs in the data. A frequency distribution table lists each distinct value together with its frequency. For example, in the following list of numbers, the frequency of the number 9 is 5.
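
As a minimal sketch of how a frequency distribution table can be built, the Python snippet below counts occurrences with collections.Counter from the standard library. The list observations is hypothetical sample data made up for illustration, and the helper name frequency_table is likewise an assumption.

```python
from collections import Counter

# Hypothetical sample data, made up for illustration only.
observations = [9, 4, 9, 7, 9, 4, 9, 2, 9, 7]


def frequency_table(data):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(data)
    return sorted(counts.items())


# Print a simple two-column table: value, then frequency.
print(f"{'value':>5} {'frequency':>9}")
for value, freq in frequency_table(observations):
    print(f"{value:>5} {freq:>9}")
```

In this made-up list the value 9 occurs five times, so its row shows a frequency of 5.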

Bar charts

A bar chart is a graph consisting of rectangular bars. The bars represent the number or percentage of observations in each category of a variable. The length or height of the bars gives a visual representation of the proportional differences among categories.
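
The idea that bar length encodes the category count can be sketched without a plotting library. The snippet below renders a plain-text bar chart; the departments data and the helper text_bar_chart are hypothetical, chosen only to illustrate the principle.

```python
from collections import Counter

# Hypothetical categorical observations, made up for illustration only.
departments = ["sales", "engineering", "sales", "support", "engineering", "sales"]


def text_bar_chart(data):
    """Return one line per category; the bar length equals the category count."""
    counts = Counter(data)
    width = max(len(label) for label in counts)
    lines = []
    for label, count in sorted(counts.items()):
        # Left-align the label, then draw one '#' per observation.
        lines.append(f"{label:<{width}} | {'#' * count} ({count})")
    return lines


print("\n".join(text_bar_chart(departments)))
```

A plotting library would draw the same counts as rectangles; the text version only makes the proportionality between count and bar length explicit.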

Histograms

Histograms are used to estimate the distribution of the data: the range of values is divided into intervals called bins, and the frequency of the values falling in each bin is shown.
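
The binning step can be sketched as follows. This is an illustrative implementation under stated assumptions, not a standard library routine: bins have equal width, each bin is half-open except the last (which includes the maximum), and the heights_cm values are hypothetical.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins.

    Bins are half-open [low, high), except the last, which includes the maximum.
    Returns (edges, counts), where edges has num_bins + 1 entries.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp the index so that the maximum value lands in the last bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical measurements, made up for illustration only.
heights_cm = [150, 152, 158, 160, 161, 165, 170, 171, 179, 190]
edges, counts = histogram_counts(heights_cm, 4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```

The choice of the number of bins affects how the estimated distribution looks; four bins are used here only to keep the output short.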

Pie charts

A pie chart is a circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
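
The size of each portion follows directly from the relative frequency: a category with share s of the observations occupies s times 360 degrees of the circle. A minimal sketch, with hypothetical transport data and a hypothetical helper pie_slices:

```python
from collections import Counter

# Hypothetical categorical observations, made up for illustration only.
transport = ["car", "bus", "car", "bike", "car", "bus", "car", "walk"]


def pie_slices(data):
    """Map each category to (relative frequency, slice angle in degrees)."""
    counts = Counter(data)
    total = sum(counts.values())
    return {cat: (n / total, 360 * n / total) for cat, n in counts.items()}


for category, (share, angle) in pie_slices(transport).items():
    print(f"{category:<5} {share:6.1%} {angle:6.1f} degrees")
```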

Measures of central tendency

Central tendency is one of the most common numerical descriptive measures. It is used to estimate the central location of univariate data by calculating the mean, median, and mode. Each of these calculations has its own advantages and limitations. The mean has the advantage that its calculation includes every value in the data set, but it is particularly susceptible to the influence of outliers. The median is a better measure when the data set contains outliers. The mode is simple to locate. Importantly, an analysis is not restricted to using only one of these measures of central tendency. If the data being analyzed are categorical, then the only measure of central tendency that can be used is the mode (for ordinal data, the median may also be meaningful). If the data are numerical, however, then the mode, median, and mean can all be used to describe the data. Using more than one of these measures provides a more accurate descriptive summary of central tendency for univariate data.
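
The sensitivity of the mean to outliers, compared with the median, can be illustrated with Python's standard statistics module. The salaries list is hypothetical and contains one deliberately extreme value; the colours list is likewise made up to show the mode on categorical data.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is a deliberate outlier.
salaries = [30, 32, 32, 35, 38, 40, 250]

mean = statistics.mean(salaries)      # uses every value, pulled up by the outlier
median = statistics.median(salaries)  # middle value, barely affected by the outlier
mode = statistics.mode(salaries)      # most frequent value

print(f"with outlier:    mean={mean:.2f} median={median} mode={mode}")

# The same measures with the outlier removed, for comparison.
trimmed = salaries[:-1]
print(f"without outlier: mean={statistics.mean(trimmed):.2f} "
      f"median={statistics.median(trimmed)} mode={statistics.mode(trimmed)}")

# The mode also applies to categorical data, where mean and median do not.
colours = ["red", "blue", "red", "green"]
print("most common colour:", statistics.mode(colours))
```

Removing the single outlier changes the mean substantially while the median moves only slightly, which is why the median is often preferred for data containing outliers.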

Measures of variability

A measure of variability, or dispersion, describes how spread out the values of a univariate data set are and so characterizes the distribution more fully. The measures of variability together with the measures of central tendency give a better picture of the data than the measures of central tendency alone. The three most frequently used measures of variability are the range, the variance, and the standard deviation. The appropriateness of each measure depends on the type of data, the shape of the distribution, and which measure of central tendency is being used. If the data are categorical, then there is no measure of variability to report. For numerical data, all three measures are possible. If the distribution of the data is symmetrical, then the measures of variability are usually the variance and the standard deviation. However, if the data are skewed, then the appropriate measure of variability for that data set is the range.
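
The three measures can be computed with Python's standard statistics module, as in the sketch below. The data list is hypothetical. The text does not say whether the population or the sample form of the variance is meant, so both are shown; treating the data as a full population for the standard deviation is an assumption made here for illustration.

```python
import statistics

# Hypothetical numerical observations, made up for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: difference between the largest and smallest value.
data_range = max(data) - min(data)

# Population variance and standard deviation (divide by n).
pop_variance = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Sample variance (divide by n - 1), used when the data are a sample.
sample_variance = statistics.variance(data)

print(f"range={data_range}")
print(f"population variance={pop_variance} population std={pop_std}")
print(f"sample variance={sample_variance:.3f}")
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data, whereas the variance is in squared units.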

Univariate distributions

A univariate distribution is the probability distribution of a single random variable, described either by a probability mass function for a discrete probability distribution or by a probability density function for a continuous probability distribution. It is not to be confused with a multivariate distribution.
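
To make the distinction concrete, the sketch below implements one probability mass function (binomial) and one probability density function (normal) using only Python's math module. The function names and parameter values are chosen for illustration and are not taken from the text.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# A probability mass function assigns probabilities that sum to 1 over all outcomes.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
print(f"sum of binomial pmf over k = 0..10: {total:.6f}")

# A density is not itself a probability; probabilities come from integrating it.
print(f"standard normal density at 0: {normal_pdf(0.0):.6f}")
```

Note that a density value can exceed 1 for a sufficiently small sigma, which is one way to see that it is not a probability.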

Common discrete distributions

Bernoulli distribution
Binomial distribution
Geometric distribution
Negative binomial distribution
Poisson distribution
Hypergeometric distribution
Zeta distribution

Common continuous distributions

Normal distribution
Gamma distribution
Exponential distribution
Weibull distribution
Cauchy distribution
Beta distribution