Ranking SVM

In machine learning, a Ranking SVM is a variant of the support vector machine algorithm that is used to solve certain ranking problems. The Ranking SVM algorithm was published by Thorsten Joachims in 2002.
The original purpose of the algorithm was to improve the performance of an internet search engine. However, Ranking SVM has since been applied to other problems as well, such as Rank SIFT.

Description

The Ranking SVM algorithm learns a retrieval function using a pairwise ranking method, so that results are sorted adaptively by how relevant they are to a specific query. The retrieval function uses a mapping function to describe the match between a search query and the features of each possible result. This mapping function projects each query-result pair onto a feature space. These features are combined with the corresponding click-through data and can then be used as the training data for the Ranking SVM algorithm.
Generally, Ranking SVM includes three steps in the training period:

1. It maps the similarities between queries and the clicked pages onto a certain feature space.
2. It calculates the distances between any two of the vectors obtained in step 1.
3. It forms an optimization problem which is similar to a standard SVM classification and solves this problem with a regular SVM solver.

Background

Ranking Method

Suppose $D$ is a data set containing $N$ elements $d_1, \dots, d_N$, and $r$ is a ranking method applied to $D$. Then $r$ can be represented as an $N \times N$ asymmetric binary matrix. If the rank of $d_i$ is higher than the rank of $d_j$, i.e. $r(d_i) < r(d_j)$, the entry at position $(i, j)$ of this matrix is set to 1. Otherwise the entry in that position is set to 0.
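The matrix representation can be sketched in a few lines of Python. This is only an illustration: the function name `ranking_matrix` and the convention that a smaller rank number means a higher rank are assumptions made here, not part of the original description.

```python
def ranking_matrix(ranks):
    """Return the N x N binary matrix M with M[i][j] = 1 if item i ranks higher than item j.

    ranks[i] is the rank position of item i (assumption: smaller number = higher rank).
    """
    n = len(ranks)
    return [[1 if ranks[i] < ranks[j] else 0 for j in range(n)] for i in range(n)]


# Three items: item 1 is ranked first, item 2 second, item 0 third.
for row in ranking_matrix([3, 1, 2]):
    print(row)
# [0, 0, 0]
# [1, 0, 1]
# [1, 0, 0]
```

The matrix is asymmetric because at most one of the entries (i, j) and (j, i) can be 1.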

Kendall's Tau

Sources: M. Kendall, Rank Correlation Methods, Hafner, 1955; A. Mood, F. Graybill, and D. Boes, Introduction to the Theory of Statistics, 3rd edition, McGraw-Hill, 1974.

Kendall's Tau, also known as the Kendall tau rank correlation coefficient, is commonly used to compare two ranking methods for the same data set.
Suppose $r_1$ and $r_2$ are two ranking methods applied to a data set $D$. The Kendall's Tau between $r_1$ and $r_2$ can be represented as follows:

$$\tau(r_1, r_2) = \frac{P - Q}{P + Q} = 1 - \frac{2Q}{P + Q}$$

where $P$ is the number of concordant pairs and $Q$ is the number of discordant pairs. A pair $d_i$ and $d_j$ is concordant if both $r_1$ and $r_2$ agree in how they order $d_i$ and $d_j$. It is discordant if they disagree.
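As a rough sketch, Kendall's Tau can be computed by counting concordant and discordant pairs directly. The function name `kendall_tau` and the input format (two lists of rank positions for the same items, with no ties) are assumptions made for this example.

```python
from itertools import combinations


def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau between two rankings of the same items (no ties assumed)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(ranks_a)), 2):
        # The product is positive when both rankings order items i and j the same way.
        agreement = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# The two rankings disagree only on the middle pair: 5 concordant, 1 discordant.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

This direct count takes quadratic time in the number of items, which is fine for illustration but slow for large data sets. The function needs at least two items, otherwise the denominator is zero.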

Information Retrieval Quality

Sources: J. Kemeny and L. Snell, Mathematical Models in the Social Sciences, Ginn & Co., 1962; Y. Yao, Measuring retrieval effectiveness based on user preference of documents, Journal of the American Society for Information Science, 46(2): 133-145, 1995; R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley-Longman, Harlow, UK, May 1999.

Information retrieval quality is usually evaluated by the following three measurements:

Precision
Recall
Average Precision

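For a specific query to a database, let $P_{\text{relevant}}$ be the set of relevant information elements in the database and $P_{\text{retrieved}}$ be the set of retrieved information elements. Then the above three measurements can be represented as follows:

$$\text{precision} = \frac{|P_{\text{relevant}} \cap P_{\text{retrieved}}|}{|P_{\text{retrieved}}|}$$

$$\text{recall} = \frac{|P_{\text{relevant}} \cap P_{\text{retrieved}}|}{|P_{\text{relevant}}|}$$

$$\text{average precision} = \int_0^1 \text{Prec}(\text{recall}) \, d\,\text{recall}$$

where $\text{Prec}(\text{recall})$ is the precision expressed as a function of recall.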
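The three measurements can be illustrated with a short Python sketch. The function names are hypothetical, and the average precision here uses the common discrete form (the mean of the precision values at the positions of the relevant results), which is an assumption standing in for the continuous definition over recall.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for a set of relevant and a set of retrieved items."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    return hits / len(retrieved), hits / len(relevant)


def average_precision(relevant, ranked_results):
    """Discrete average precision of a ranked result list."""
    relevant = set(relevant)
    hits = 0
    total = 0.0
    for position, item in enumerate(ranked_results, start=1):
        if item in relevant:
            hits += 1
            total += hits / position  # precision at this cut-off
    return total / len(relevant)


relevant = {"a", "c"}
ranked = ["a", "b", "c", "d"]
print(precision_recall(relevant, ranked))   # (0.5, 1.0)
print(average_precision(relevant, ranked))  # (1/1 + 2/3) / 2
```

Unlike precision and recall, average precision depends on the order of the results, which is why it is the measurement of interest for a ranking method.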
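Let $r^*$ and $r_{f(q)}$ be the expected and proposed ranking methods of a database respectively. The lower bound of the Average Precision of method $r_{f(q)}$ can be represented as follows:

$$\text{AvgPrec}(r_{f(q)}) \geq \frac{1}{R}\left[Q + \binom{R+1}{2}\right]^{-1}\left(\sum_{i=1}^{R}\sqrt{i}\right)^2$$

where $Q$ is the number of different elements in the upper triangular parts of the matrices of $r^*$ and $r_{f(q)}$, and $R$ is the number of relevant elements in the data set.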
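To see how this bound behaves, it can be evaluated numerically. The sketch below assumes the form of the bound given in Joachims' 2002 paper, with `num_discordant` standing for the number of differing (discordant) pairs and `num_relevant` for the number of relevant elements; both parameter names are chosen here for illustration.

```python
import math


def avg_precision_lower_bound(num_discordant, num_relevant):
    """Lower bound on average precision, assuming the bound from Joachims (2002)."""
    root_sum = sum(math.sqrt(i) for i in range(1, num_relevant + 1))
    denominator = num_relevant * (num_discordant + math.comb(num_relevant + 1, 2))
    return root_sum ** 2 / denominator


# With one relevant element, each extra discordant pair pushes it one position down.
print(avg_precision_lower_bound(0, 1))  # 1.0
print(avg_precision_lower_bound(1, 1))  # 0.5
```

The bound shrinks as the number of discordant pairs grows, so reducing discordant pairs raises the guaranteed average precision.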

SVM Classifier

Source: C. Cortes and V. N. Vapnik, Support-vector networks, Machine Learning Journal, 20: 273-297, 1995.

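Suppose $(\vec{x}_i, y_i)$ is an element of a training data set, where $\vec{x}_i$ is the feature vector and $y_i \in \{-1, +1\}$ is the label. A typical SVM classifier for such a data set can be defined as the solution of the following optimization problem:

$$\begin{aligned} \text{minimize: } & V(\vec{w}, \vec{\xi}) = \frac{1}{2}\,\vec{w}\cdot\vec{w} + C\sum_i \xi_i \\ \text{subject to: } & y_i(\vec{w}\cdot\vec{x}_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \quad \text{for all } i \end{aligned}$$

The solution of the above optimization problem can be represented as a linear combination of the feature vectors $\vec{x}_i$:

$$\vec{w}^* = \sum_i \alpha_i y_i \vec{x}_i$$

where the $\alpha_i$ are the coefficients to be determined.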
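For intuition, the slack variables of the soft-margin problem can be eliminated, which gives an equivalent unconstrained form built on the hinge loss. The notation is an assumption chosen for this illustration: $\vec{w}$ is the weight vector, $b$ the bias, $C$ the trade-off constant, and $(\vec{x}_i, y_i)$ are the training examples with labels $y_i \in \{-1, +1\}$.

```latex
\min_{\vec{w},\, b} \;\; \frac{1}{2}\,\vec{w}\cdot\vec{w}
  \;+\; C \sum_{i} \max\bigl(0,\; 1 - y_i(\vec{w}\cdot\vec{x}_i + b)\bigr)
```

At the optimum each slack variable equals $\max(0, 1 - y_i(\vec{w}\cdot\vec{x}_i + b))$, so both forms share the same minimizer. The first term favours a large margin and the second penalizes examples that fall inside the margin or on the wrong side of it.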

Ranking SVM algorithm

Loss Function

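Let $\tau_{P(f)}$ be the Kendall's tau between the expected ranking method $r^*$ and the proposed method $r_{f(q)}$. It can be proved that maximizing $\tau_{P(f)}$ helps to maximize the lower bound of the Average Precision of $r_{f(q)}$: a larger Kendall's tau means fewer discordant pairs, and the bound increases as the number of discordant pairs decreases.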

Expected Loss Function

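The negative of the Kendall's tau $\tau_{P(f)}$ between the expected ranking $r^*$ and the proposed ranking $r_{f(q)}$ can be selected as the loss function. Minimizing this loss maximizes the lower bound of the Average Precision of $r_{f(q)}$:

$$L_{\text{expected}} = -\tau_{P(f)} = -\int \tau(r_{f(q)}, r^*) \, d\Pr(q, r^*)$$

where $\Pr(q, r^*)$ is the statistical distribution of the expected ranking $r^*$ for a certain query $q$.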

Empirical Loss Function

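Since the expected loss function cannot be evaluated directly (the distribution of queries and expected rankings is unknown), the following empirical loss function is used on the training data in practice:

$$L_{\text{empirical}} = -\tau_S(f) = -\frac{1}{n}\sum_{i=1}^{n}\tau(r_{f(q_i)}, r_i^*)$$

where $n$ is the number of training queries $q_i$ and $r_i^*$ is the expected ranking for query $q_i$.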
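A minimal sketch of the empirical loss, taken here to be the negative mean Kendall's tau over the training queries. The function names and the input format (one list of rank positions per query, without ties) are assumptions for this example.

```python
from itertools import combinations


def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau for two tie-free rankings given as lists of rank positions."""
    signs = [
        (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
        for i, j in combinations(range(len(ranks_a)), 2)
    ]
    concordant = sum(1 for s in signs if s > 0)
    discordant = sum(1 for s in signs if s < 0)
    return (concordant - discordant) / (concordant + discordant)


def empirical_loss(predicted_rankings, target_rankings):
    """Negative mean Kendall's tau across queries (lower is better)."""
    taus = [kendall_tau(p, t) for p, t in zip(predicted_rankings, target_rankings)]
    return -sum(taus) / len(taus)


# Two queries: the first is ranked perfectly, the second has one swapped pair.
predicted = [[1, 2, 3], [1, 3, 2]]
targets = [[1, 2, 3], [1, 2, 3]]
print(empirical_loss(predicted, targets))
```

A loss of -1 means every query is ranked exactly as expected, and +1 means every ranking is fully reversed.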

Collecting training data

A set of $n$ queries is applied to a database, and each query corresponds to a ranking method. The training data set therefore has $n$ elements. Each element contains a query $q_i$ and the corresponding ranking method $r_i^*$.

Feature Space

A mapping function $\Phi(q, d)$ is required to map each query $q$ and each element $d$ of the database to a feature space. Each point in the feature space is then labelled with a certain rank by the ranking method.

Optimization problem

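The points generated by the training data lie in the feature space and carry the rank information. These labeled points can be used to find the boundary that specifies their order. In the linear case, such a boundary is a vector.
Suppose $d_i$ and $d_j$ are two elements in the database, and write $(d_i, d_j) \in r$ if the rank of $d_i$ is higher than that of $d_j$ in a certain ranking method $r$. Let the vector $\vec{w}$ be the linear classifier candidate in the feature space. Then the ranking problem can be translated to the following SVM classification problem. Note that one ranking method corresponds to one query.

$$\begin{aligned} \text{minimize: } & V(\vec{w}, \vec{\xi}) = \frac{1}{2}\,\vec{w}\cdot\vec{w} + C\sum \xi_{i,j,k} \\ \text{subject to: } & \forall (d_i, d_j) \in r_k^*: \; \vec{w}\cdot\bigl(\Phi(q_k, d_i) - \Phi(q_k, d_j)\bigr) \geq 1 - \xi_{i,j,k} \\ & \forall i, j, k: \; \xi_{i,j,k} \geq 0, \quad k \in \{1, 2, \dots, n\} \end{aligned}$$

The above optimization problem is equivalent to a classical SVM classification problem on the pairwise difference vectors $\Phi(q_k, d_i) - \Phi(q_k, d_j)$, which is why this algorithm is called Ranking SVM.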
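The reduction to classification on difference vectors can be sketched as follows. This is not the solver used in the original work: plain subgradient descent on the hinge-loss form of the problem is swapped in to keep the example dependency-free, and the function name, hyperparameters, and toy feature vectors are all assumptions made for illustration.

```python
def train_ranking_svm(preference_pairs, dim, C=1.0, learning_rate=0.01, epochs=200):
    """Learn a weight vector w from (preferred_features, other_features) pairs.

    Minimises 0.5 * w.w + C * sum(max(0, 1 - w.(preferred - other)))
    with full-batch subgradient descent (an illustrative stand-in for an SVM solver).
    """
    # Each preference becomes one difference vector that should satisfy w . diff >= 1.
    diffs = [[a - b for a, b in zip(better, worse)] for better, worse in preference_pairs]
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the regulariser 0.5 * w . w
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            if margin < 1.0:  # hinge loss is active for this pair
                for k in range(dim):
                    grad[k] -= C * d[k]
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
    return w


# Toy preferences: in each pair the first feature vector should rank higher.
pairs = [([3, 1], [1, 1]), ([2, 0], [1, 2]), ([4, 2], [2, 3])]
w = train_ranking_svm(pairs, dim=2)
for better, worse in pairs:
    score_gap = sum(wi * (a - b) for wi, a, b in zip(w, better, worse))
    print(score_gap > 0)  # True when the learned w orders the pair correctly
```

No bias term is needed here, because a constant offset would cancel when two feature vectors from the same query are subtracted.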

Retrieval Function

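The optimal vector $\vec{w}^*$ obtained from the training sample is

$$\vec{w}^* = \sum \alpha_{k,\ell}^* \, \Phi(q_k, d_\ell)$$

where the $\alpha_{k,\ell}^*$ are the coefficients determined by training. The retrieval function can then be formed based on this optimal classifier.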

For a new query $q$, the retrieval function first projects all elements of the database onto the feature space. It then orders these feature points by the values of their inner products with the optimal vector. The rank of each feature point is the rank of the corresponding database element for the query $q$.
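The retrieval step amounts to sorting by inner product, as in the sketch below. The function name `rank_documents`, the example weight vector, and the document feature vectors are hypothetical; the feature vectors stand for the mapped query-document pairs of the new query.

```python
def rank_documents(w, doc_features):
    """Return document ids ordered from most to least relevant under weight vector w.

    doc_features maps a document id to its feature vector for the current query.
    """
    def score(doc_id):
        return sum(wi * xi for wi, xi in zip(w, doc_features[doc_id]))

    return sorted(doc_features, key=score, reverse=True)


w = [0.5, -0.25]  # example weight vector, e.g. obtained from training
features = {"d1": [1, 2], "d2": [3, 1], "d3": [2, 0]}
print(rank_documents(w, features))  # ['d2', 'd3', 'd1']
```

Only the order of the scores matters, so rescaling the weight vector by a positive constant leaves the ranking unchanged.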

Application of Ranking SVM

Ranking SVM can be applied to rank pages according to a query. The algorithm can be trained using click-through data, which consists of the following three parts:

1. The query
2. The presented ranking of search results
3. The search results clicked on by the user

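The combination of 2 and 3 cannot provide the full ordering of the training data that is needed to apply the full SVM algorithm. Instead, it provides only part of the ranking information of the training data. So the algorithm can be slightly revised as follows, with a partial ranking $r_k'$ taking the place of the full ranking for each query $q_k$:

$$\begin{aligned} \text{minimize: } & V(\vec{w}, \vec{\xi}) = \frac{1}{2}\,\vec{w}\cdot\vec{w} + C\sum \xi_{i,j,k} \\ \text{subject to: } & \forall (d_i, d_j) \in r_k': \; \vec{w}\cdot\bigl(\Phi(q_k, d_i) - \Phi(q_k, d_j)\bigr) \geq 1 - \xi_{i,j,k} \\ & \forall i, j, k: \; \xi_{i,j,k} \geq 0, \quad k \in \{1, 2, \dots, n\} \end{aligned}$$

The method $r'$ does not provide ranking information for the whole data set; it is a subset of the full ranking method. So the constraints of the optimization problem are more relaxed than those of the original Ranking SVM.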
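One way to turn a click-through record into partial ranking information is sketched below. The rule used here is an assumption not spelled out in the text above: following the heuristic described in Joachims' 2002 paper, a clicked result is taken to be preferred over every unclicked result that was presented above it. The function name is hypothetical.

```python
def preferences_from_clicks(presented_ranking, clicked):
    """Return (preferred, less_preferred) pairs implied by one click-through record."""
    clicked = set(clicked)
    pairs = []
    for position, doc in enumerate(presented_ranking):
        if doc in clicked:
            # Assumption: a clicked result beats every unclicked result shown above it.
            pairs.extend(
                (doc, above)
                for above in presented_ranking[:position]
                if above not in clicked
            )
    return pairs


print(preferences_from_clicks(["a", "b", "c", "d", "e"], {"a", "c", "e"}))
# [('c', 'b'), ('e', 'b'), ('e', 'd')]
```

Each resulting pair contributes one constraint to the revised optimization problem, while pairs about which the clicks say nothing are left unconstrained.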
