Discriminative model

Discriminative models, also referred to as conditional models or backward models, are a class of supervised machine learning models used for classification or regression. They determine decision boundaries by inferring knowledge from observed data. In contrast to generative (forward) models, discriminative models make fewer assumptions about the underlying data distribution and rely more heavily on data quality.
For example, if the task is to separate pictures of cats from pictures of dogs, a discriminative model can only decide whether a given picture shows a cat or a dog. Loosely speaking, the decision is based on the training examples that the picture most resembles. A generative model, on the other hand, can also produce a new picture of either class.
Typical discriminative models include logistic regression, support vector machines, conditional random fields, decision trees, neural networks, and many others. Typical generative approaches include naive Bayes classifiers, Gaussian mixture models, variational autoencoders, and others.

Definition

Unlike generative modelling, which studies the joint probability $P(x, y)$, discriminative modelling studies the conditional probability $P(y \mid x)$, or directly maps the given observed variable $x$ to a class label $y$ that depends on it. For example, in object recognition, $x$ is likely to be a vector of raw pixels. Within a probabilistic framework, this is done by modelling the conditional probability distribution $P(y \mid x)$, which can be used for predicting $y$ from $x$. Note that a conditional model and a discriminative model are not strictly the same thing, although the two are usually grouped together as discriminative models.

Pure discriminative model vs. conditional model

A conditional model models the conditional probability distribution $P(y \mid x)$, while the traditional discriminative model learns a direct mapping from the input to the output, anchored on the most similar training samples.

Typical discriminative modelling approaches

The following approaches assume a training data set $D = \{(x_i, y_i) \mid i = 1, \dots, N\}$, where $y_i$ is the output corresponding to the input $x_i$.

Linear classifier

We intend to use a function $f(x)$ to reproduce the behaviour observed in the training data set, using the linear classifier method. With the joint feature vector $\phi(x, y)$, the decision function is defined as:
$$f(x; w) = \arg\max_{y} \; w^T \phi(x, y)$$
According to Memisevic's interpretation, $w^T \phi(x, y)$, which is also written $c(x, y; w)$, computes a score that measures the compatibility of the input $x$ with the potential output $y$. The $\arg\max$ then determines the class with the highest score.
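The decision rule "score every candidate output with $w^T \phi(x, y)$ and pick the highest" can be sketched in a few lines. This is only an illustration: the block-structured joint feature map (the input copied into the slot of class $y$), the function names `joint_features`, `score` and `predict`, and the weight values are all assumptions made for the example, not part of the source.

```python
def joint_features(x, y, num_classes):
    """Assumed joint feature map phi(x, y): x copied into the block of class y, zeros elsewhere."""
    phi = [0.0] * (len(x) * num_classes)
    phi[y * len(x):(y + 1) * len(x)] = list(x)
    return phi


def score(w, x, y, num_classes):
    """Compatibility score c(x, y; w) = w^T phi(x, y)."""
    phi = joint_features(x, y, num_classes)
    return sum(wi * pi for wi, pi in zip(w, phi))


def predict(w, x, num_classes):
    """Decision function f(x; w) = argmax_y w^T phi(x, y)."""
    return max(range(num_classes), key=lambda y: score(w, x, y, num_classes))


# Hypothetical weights: 2 input features, 3 classes, one block of 2 weights per class.
w = [1.0, 0.0,
     0.0, 1.0,
     -1.0, -1.0]

print(predict(w, [2.0, 0.5], 3))  # scores are 2.0, 0.5 and -2.5, so class 0 wins
```

With this feature map the classifier is equivalent to keeping one weight vector per class, which is a common way to write a multiclass linear model.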

Logistic regression (LR)

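Since the 0-1 loss function is commonly used in decision theory, the conditional probability distribution $P(y \mid x; w)$, where $w$ is a parameter vector fitted to the training data, can be written as follows for the logistic regression model:
$$P(y \mid x; w) = \frac{1}{Z(x; w)} \exp\left(w^T \phi(x, y)\right), \qquad Z(x; w) = \sum_{y} \exp\left(w^T \phi(x, y)\right)$$
The equation above represents logistic regression. Notice that a major distinction between models is the way they introduce the posterior probability. Here the posterior probability is inferred from the parametric model. The parameter vector is then chosen to maximize the log-likelihood:
$$L(w) = \sum_{i} \log P(y_i \mid x_i; w)$$
Equivalently, one can minimize the log-loss below:
$$\ell^{\log}(x_i, y_i; w) = -\log P(y_i \mid x_i; w) = \log Z(x_i; w) - w^T \phi(x_i, y_i)$$
Since the log-loss is differentiable, a gradient-based method can be used to optimize the model. A global optimum is guaranteed because the objective function is convex. The gradient of the log-likelihood is:
$$\frac{\partial L(w)}{\partial w} = \sum_{i} \left( \phi(x_i, y_i) - E_{P(y \mid x_i; w)}\left[\phi(x_i, y)\right] \right)$$
where $E_{P(y \mid x_i; w)}$ denotes the expectation under $P(y \mid x_i; w)$.
This method is computationally efficient when the number of classes is relatively small, because the expectation sums over all classes.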
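The gradient-based fitting described above can be sketched as plain gradient ascent on the log-likelihood, using the rule "observed joint features minus their expectation under the model". This is a sketch under stated assumptions: the block-structured feature map `joint_features`, the toy data set, the step size and the iteration count are all chosen for illustration and do not come from the source.

```python
import math


def joint_features(x, y, num_classes):
    """Assumed joint feature map phi(x, y): x copied into the block of class y."""
    phi = [0.0] * (len(x) * num_classes)
    phi[y * len(x):(y + 1) * len(x)] = list(x)
    return phi


def posterior(w, x, num_classes):
    """P(y | x; w) = exp(w^T phi(x, y)) / Z(x; w), returned for every class y."""
    scores = [sum(wi * pi for wi, pi in zip(w, joint_features(x, y, num_classes)))
              for y in range(num_classes)]
    top = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def log_likelihood(w, data, num_classes):
    """L(w) = sum_i log P(y_i | x_i; w)."""
    return sum(math.log(posterior(w, x, num_classes)[y]) for x, y in data)


def gradient(w, data, num_classes):
    """dL/dw = sum_i ( phi(x_i, y_i) - E_{P(y | x_i; w)}[phi(x_i, y)] )."""
    grad = [0.0] * len(w)
    for x, y_true in data:
        probs = posterior(w, x, num_classes)
        for y in range(num_classes):
            phi = joint_features(x, y, num_classes)
            coeff = (1.0 if y == y_true else 0.0) - probs[y]
            for k, value in enumerate(phi):
                grad[k] += coeff * value
    return grad


def fit(data, num_classes, steps=500, lr=0.05):
    """Plain gradient ascent; step size and step count are illustrative choices."""
    w = [0.0] * (len(data[0][0]) * num_classes)
    for _ in range(steps):
        g = gradient(w, data, num_classes)
        w = [wi + lr * gi for wi, gi in zip(w, g)]
    return w


# Toy data (made up): one real feature plus a constant 1.0 acting as a bias term.
data = [([0.0, 1.0], 0), ([1.0, 1.0], 0), ([3.0, 1.0], 1), ([4.0, 1.0], 1)]
w = fit(data, num_classes=2)
print(posterior(w, [4.0, 1.0], 2))  # probability mass should concentrate on class 1
```

The inner loop over `y` is the expectation term, which is why the cost per sample grows with the number of classes.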

Support vector machine

Another continuous alternative to the 0/1 loss is the hinge loss, which can be defined by the following equation, where $c(x, y; w) = w^T \phi(x, y)$ is the confidence (score) the classifier assigns to class $y$ and $\delta_{y, y_i}$ equals 1 if $y = y_i$ and 0 otherwise:
$$\ell^{\text{hinge}}(x_i, y_i; w) = \max_{y} \left( c(x_i, y; w) + 1 - \delta_{y, y_i} \right) - c(x_i, y_i; w)$$
The hinge loss measures the difference between the maximal confidence that the classifier has over all classes and the confidence it has in the correct class. In computing this maximum, all wrong classes get a "head start" by adding 1 to the confidence. As a result, the hinge loss is 0 if the confidence in the correct class exceeds the confidence in the closest runner-up by at least 1. Even though the hinge loss is not differentiable everywhere, it still gives rise to a tractable variant of the 0/1 loss based learning problem, since it allows the problem to be recast as an equivalent constrained optimization problem.
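The "head start" description translates directly into code: add 1 to the confidence of every wrong class, take the maximum, and subtract the confidence of the correct class. The sketch below assumes the per-class confidences have already been computed and are passed in as a list; the function name `hinge_loss` and the example scores are illustrative.

```python
def hinge_loss(scores, y_true):
    """Multiclass hinge loss: max_y (scores[y] + [y != y_true]) - scores[y_true]."""
    # Every wrong class gets a head start of 1; the correct class gets none.
    boosted = [s + (0.0 if y == y_true else 1.0) for y, s in enumerate(scores)]
    return max(boosted) - scores[y_true]


print(hinge_loss([3.0, 1.0, 0.5], 0))  # 0.0: correct class leads by at least 1
print(hinge_loss([1.5, 1.0], 0))       # 0.5: correct class leads, but by less than 1
print(hinge_loss([0.0, 2.0], 0))       # 3.0: a wrong class is ahead
```

Because the correct class is included in the maximum without a head start, the loss can never be negative.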

Contrast with generative model

Contrast in approaches

Suppose we are given $m$ class labels and $n$ feature variables, $Y = \{y_1, \dots, y_m\}$ and $X = \{x_1, \dots, x_n\}$, as the training samples.
A generative model takes the joint probability $P(x, y)$, where $x$ is the input and $y$ is the label, and uses Bayes' theorem to predict the most probable known label $\tilde{y} \in Y$ for an unseen input $\tilde{x}$.
Discriminative models, as opposed to generative models, do not allow one to generate samples from the joint distribution of observed and target variables. However, for tasks such as classification and regression that do not require the joint distribution, discriminative models can yield superior performance. On the other hand, generative models are typically more flexible than discriminative models in expressing dependencies in complex learning tasks. In addition, most discriminative models are inherently supervised and cannot easily support unsupervised learning. Application-specific details ultimately dictate the suitability of selecting a discriminative versus generative model.
Discriminative models and generative models also differ in how they introduce the posterior probability. To achieve the least expected loss, the misclassification rate should be minimized. In a discriminative model, the posterior probability $P(y \mid x)$ is inferred from a parametric model whose parameters are estimated from the training data. Point estimates of the parameters are obtained by maximizing the likelihood or by computing a distribution over the parameters. Generative models, on the other hand, focus on the joint probability, so the class posterior probability is obtained through Bayes' theorem:
$$P(y \mid x) = \frac{P(x \mid y) P(y)}{\sum_{i} P(x \mid i) P(i)} = \frac{P(x \mid y) P(y)}{P(x)}$$
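As a small worked instance of the generative route, the class posterior can be computed from class-conditional likelihoods $P(x \mid y)$ and priors $P(y)$ by Bayes' theorem, normalizing by the evidence $P(x) = \sum_i P(x \mid i) P(i)$. The function name `bayes_posterior` and the numbers are made up for illustration.

```python
def bayes_posterior(likelihoods, priors):
    """P(y | x) = P(x | y) P(y) / sum_i P(x | i) P(i), for every class y."""
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(x), the normalizing constant
    return [j / evidence for j in joint]


# Hypothetical values: P(x | y) for one observed x, and class priors P(y).
likelihoods = [0.8, 0.1]
priors = [0.25, 0.75]

post = bayes_posterior(likelihoods, priors)
print(post)                                           # posterior for each class, sums to 1
print(max(range(len(post)), key=lambda y: post[y]))   # most probable label
```

Here the less frequent class still wins, because the observation is eight times more likely under it than under the other class.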

Advantages and disadvantages in application

In repeated experiments applying logistic regression and naive Bayes to binary classification tasks, discriminative learning yields lower asymptotic error, while generative learning reaches its (higher) asymptotic error faster. However, in their joint work Comparison of Generative and Discriminative Techniques for Object Detection and Classification, Ulusoy and Bishop state that this holds only when the model is appropriate for the data.

Advantages

Significant advantages of using discriminative modeling are:

Higher accuracy, which mostly leads to better learning results
Allows simplification of the input and provides a direct approach to $P(y \mid x)$
Saves computational resources
Generates lower asymptotic errors

By comparison, generative modeling has the following characteristics:

Takes all data into consideration, which can result in slower processing
Requires fewer training samples
Offers a flexible framework that can easily be adapted to other needs of the application

Disadvantages

Training usually requires multiple numerical optimization techniques
By definition, a discriminative model likewise needs to combine multiple subtasks to solve a complex real-world problem

Optimizations in applications

Since both ways of modeling have advantages and disadvantages, combining the two approaches can work well in practice. For example, in the article A Joint Discriminative Generative Model for Deformable Model Construction and Classification, Marras and his coauthors apply a combination of the two approaches to face classification and obtain higher accuracy than the traditional approach.
Similarly, Kelm proposed combining the two approaches for pixel classification in his article Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning.
When extracting discriminative features prior to clustering, principal component analysis (PCA), though commonly used, is not necessarily a discriminative approach. In contrast, linear discriminant analysis (LDA) is a discriminative one, and it provides an efficient way of mitigating the disadvantage listed above. A discriminative model needs to combine multiple subtasks before classification, and LDA addresses this problem by reducing the dimension.
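The dimension-reducing role of LDA can be sketched for the two-class case with Fisher's criterion, which projects the data onto the direction $w \propto S_W^{-1}(\mu_1 - \mu_0)$, where $S_W$ is the within-class scatter matrix and $\mu_0, \mu_1$ are the class means. This is a minimal illustration restricted to two-dimensional inputs; the helper names and the sample points are hypothetical.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(2)]


def scatter(points, mu):
    """2x2 scatter matrix: sum over points of (p - mu)(p - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """Two-class Fisher direction w = S_W^{-1} (mu1 - mu0) for 2-D data."""
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero here
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, p):
    """Reduce a 2-D point to one dimension along w."""
    return w[0] * p[0] + w[1] * p[1]


# Made-up sample points for two classes.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.5)]
class1 = [(4.0, 1.0), (5.0, 2.5), (6.0, 2.0)]

w = fisher_direction(class0, class1)
print([project(w, p) for p in class0])  # all below the class-1 projections
print([project(w, p) for p in class1])
```

Unlike PCA, which would pick the direction of largest overall variance, this direction is chosen using the class labels, so the one-dimensional projections of the two classes stay apart.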
In Beyerlein's paper Discriminative Model Combination, discriminative model combination is presented as a new approach in automatic speech recognition. It not only helps to optimize the integration of various kinds of models into one log-linear posterior probability distribution, but also aims at minimizing the empirical word error rate on the training samples.
In the article A Unified and Discriminative Model for Query Refinement, Guo and his coauthors use a unified discriminative model with a linear classifier for query refinement and obtain a much higher accuracy rate. Their experiment also includes a generative model as a comparison with the unified model. As expected for a real-world application, the generative model performs the poorest of all the models compared, including the models without their improvement.

Types

Examples of discriminative models used in machine learning include:

Logistic regression, a type of generalized linear regression used for predicting binary or categorical outputs
Support vector machines
Boosting
Conditional random fields
Linear regression
Neural networks
Random forests
Perceptrons
