Predictive Model Markup Language

The Predictive Model Markup Language is an XML-based predictive model interchange format conceived by Dr. Robert Lee Grossman, then the director of the National Center for Data Mining at the University of Illinois at Chicago. PMML provides a way for analytic applications to describe and exchange predictive models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and other feedforward neural networks. Version 0.9 was published in 1998. Subsequent versions have been developed by the Data Mining Group.
Since PMML is an XML-based standard, the specification comes in the form of an XML schema. PMML itself is a mature standard with over 30 organizations having announced products supporting PMML.

PMML Components

A PMML file can be described by the following components:

Header: contains general information about the PMML document, such as copyright information for the model, its description, and information about the application used to generate the model such as name and version. It also contains an attribute for a timestamp which can be used to specify the date of model creation.
Data Dictionary: contains definitions for all the possible fields used by the model. It is here that a field is defined as continuous, categorical, or ordinal. Depending on this definition, the appropriate value ranges are then defined as well as the data type.
Data Transformations: transformations allow for the mapping of user data into a more desirable form to be used by the mining model. PMML defines several kinds of simple data transformations.
* Normalization: map values to numbers, the input can be continuous or discrete.
* Discretization: map continuous values to discrete values.
* Value mapping: map discrete values to discrete values.
* Functions : derive a value by applying a function to one or more parameters.
* Aggregation: used to summarize or collect groups of values.
Model: contains the definition of the data mining model. E.g., A multi-layered feedforward neural network is represented in PMML by a "NeuralNetwork" element which contains attributes such as:
* Model Name
* Function Name
* Algorithm Name
* Activation Function
* Number of Layers
Mining Schema: a list of all fields used in the model. This can be a subset of the fields as defined in the data dictionary. It contains specific information about each field, such as:
* Name : must refer to a field in the data dictionary
* Usage type : defines the way a field is to be used in the model. Typical values are: active, predicted, and supplementary. Predicted fields are those whose values are predicted by the model.
* Outlier Treatment : defines the outlier treatment to be use. In PMML, outliers can be treated as missing values, as extreme values, or as is.
* Missing Value Replacement Policy : if this attribute is specified then a missing value is automatically replaced by the given values.
* Missing Value Treatment : indicates how the missing value replacement was derived.
Targets: allows for post-processing of the predicted value in the format of scaling if the output of the model is continuous. Targets can also be used for classification tasks. In this case, the attribute priorProbability specifies a default probability for the corresponding target category. It is used if the prediction logic itself did not produce a result. This can happen, e.g., if an input value is missing and there is no other method for treating missing values.
Output: this element can be used to name all the desired output fields expected from the model. These are features of the predicted field and so are typically the predicted value itself, the probability, cluster affinity, standard error, etc. The latest release of PMML, PMML 4.1, extended Output to allow for generic post-processing of model outputs. In PMML 4.1, all the built-in and custom functions that were originally available only for pre-processing became available for post-processing too.
PMML 4.0, 4.1, 4.2 and 4.3

PMML 4.0 was released on June 16, 2009.
Examples of new features included:

Improved Pre-Processing Capabilities: Additions to built-in functions include a range of Boolean operations and an If-Then-Else function.
Time Series Models: New exponential Smoothing models; also place holders for ARIMA, Seasonal Trend Decomposition, and Spectral density estimation, which are to be supported in the near future.
Model Explanation: Saving of evaluation and model performance measures to the PMML file itself.
Multiple Models: Capabilities for model composition, ensembles, and segmentation.
Extensions of Existing Elements: Addition of multi-class classification for Support Vector Machines, improved representation for Association Rules, and the addition of Cox Regression Models.

PMML 4.1 was released on December 31, 2011.
New features included:

New model elements for representing Scorecards, k-Nearest Neighbors and Baseline Models.
Simplification of multiple models. In PMML 4.1, the same element is used to represent model segmentation, ensemble, and chaining.
Overall definition of field scope and field names.
A new attribute that identifies for each model element if the model is ready or not for production deployment.
Enhanced post-processing capabilities.

PMML 4.2 was released on February 28, 2014.
New features include:

Transformations: New elements for implementing text mining
New built-in functions for implementing regular expressions: matches, concat, and replace
Simplified outputs for post-processing
Enhancements to Scorecard and Naive Bayes model elements

PMML 4.3 was released on August 23, 2016.
New features include:

New Model Types:
* Gaussian Process
* Bayesian Network
New built-in functions
Usage clarifications
Documentation improvements
Release history

Data Mining Group

The is a consortium managed by the Center for Computational Science Research, Inc., a nonprofit founded in 2008. The Data Mining Group also developed a standard called Portable Format for Analytics, or PFA, which is complementary to PMML.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...

Predictive Model Markup Language

PMML Components

PMML 4.0, 4.1, 4.2 and 4.3

Release history

Data Mining Group