Robinson–Foulds metric

The Robinson–Foulds or symmetric difference metric is a crude and biased measure of the distance between unrooted phylogenetic trees. It is defined as (A + B), where A is the number of partitions of data implied by the first tree but not the second tree and B is the number of partitions of data implied by the second tree but not the first tree. The partitions are calculated for each tree by removing each branch. Thus, the number of eligible partitions for each tree is equal to the number of branches in that tree. Generalized Robinson–Foulds metrics have superseded the original metric: they demonstrate better theoretical and practical performance, and avoid the biases and misleading attributes of the original metric.
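The definition can be illustrated with a minimal sketch. It assumes that each tree is written as nested tuples of leaf names (a representation chosen here for illustration only) and that both trees carry the same leaves. Partitions that cut off a single leaf occur in every tree on those leaves, so they never contribute to A or B and are skipped. The function names are hypothetical.

```python
def splits(tree):
    """Return the non-trivial splits of a tree written as nested tuples of leaf names."""
    clades = []

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])            # a leaf
        clade = frozenset().union(*map(walk, node))
        clades.append(clade)
        return clade

    leaves = walk(tree)
    reference = min(leaves)                     # fixes which side of each split is stored
    found = set()
    for clade in clades:
        if 1 < len(clade) < len(leaves) - 1:    # both sides hold at least two leaves
            found.add(leaves - clade if reference in clade else clade)
    return found


def robinson_foulds(tree_1, tree_2):
    splits_1, splits_2 = splits(tree_1), splits(tree_2)
    a = len(splits_1 - splits_2)   # A: partitions only in the first tree
    b = len(splits_2 - splits_1)   # B: partitions only in the second tree
    return a + b


tree_1 = (("a", "b"), "c", ("d", "e"))
tree_2 = (("a", "c"), "b", ("d", "e"))
print(robinson_foulds(tree_1, tree_2))  # 2
```

Here the partition separating d and e from the rest is shared, while each tree holds one partition that the other lacks, giving A = 1 and B = 1.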

Explanation

Given two unrooted trees of nodes and a set of labels for each node, the Robinson–Foulds metric finds the number of α and α⁻¹ operations needed to convert one into the other. The number of operations defines their distance. Rooted trees can be examined by assigning a label to the root node.
The authors define two trees to be the same if they are isomorphic and the isomorphism preserves the labeling. The construction of the proof is based on a function called α, which contracts an edge. Conversely, α⁻¹ expands an edge, where the set can be split in any fashion.
The α function removes all edges from T₁ that are not in T₂, creating T₁ ∧ T₂, and then α⁻¹ is used to add the edges only discovered in T₂ to the tree T₁ ∧ T₂ to build T₂. The number of operations in each of these procedures is equivalent to the number of edges in T₁ that are not in T₂ plus the number of edges in T₂ that are not in T₁. The sum of the operations is equivalent to a transformation from T₁ to T₂, or vice versa.
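The counting argument can be sketched as follows, writing T1 and T2 for the two trees (notation assumed here). Each internal edge is represented by the split it induces, stored as the set of leaves on the side that does not contain leaf "a"; this encoding and the function name are illustrative assumptions, not part of the original construction.

```python
def edit_path(splits_1, splits_2):
    """Split the transformation T1 -> T2 into contractions and expansions."""
    shared = splits_1 & splits_2        # edges kept throughout: the tree T1 ∧ T2
    contractions = splits_1 - shared    # edges of T1 that must be contracted
    expansions = splits_2 - shared      # edges of T2 that must be expanded
    return shared, contractions, expansions


# Two trees on the leaves a..f; each split lists the side without leaf "a".
t1 = {frozenset("cdef"), frozenset("def"), frozenset("ef")}
t2 = {frozenset("cdef"), frozenset("cef"), frozenset("ce")}

shared, contractions, expansions = edit_path(t1, t2)
print(len(contractions), len(expansions))       # 2 2
print(len(contractions) + len(expansions))      # 4
```

Two contractions reduce T1 to the common tree, and two expansions then build T2, so the distance is 4.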

Properties

The RF distance corresponds to an equivalent similarity metric that reflects the resolution of the strict consensus of two trees, first used to compare trees in 1980.
In their 1981 paper Robinson and Foulds proved that the distance is in fact a metric.

Algorithms for computing the metric

In 1985 Day gave an algorithm based on perfect hashing that computes this distance in time linear in the number of nodes in the trees. A randomized algorithm that uses hash tables that are not necessarily perfect has been shown to approximate the Robinson–Foulds distance with a bounded error in sublinear time.
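The following is a simplified sketch in the spirit of Day's approach, not a reproduction of the published algorithm. It assumes trees written as nested tuples, both rooted on the same leaf (here "a") so that clusters correspond to splits of the unrooted tree. Leaves are numbered in the order the first tree visits them, which makes every cluster of that tree a contiguous interval; an ordinary Python set stands in for the perfect hash table, so the running time is linear only in expectation. The function name is hypothetical.

```python
def day_style_rf(tree_1, tree_2):
    number = {}      # leaf name -> position in tree_1's postorder
    table = set()    # (lowest, highest) leaf number of each cluster of tree_1

    def index(node):
        if not isinstance(node, tuple):
            number[node] = len(number)
            return number[node], number[node]
        spans = [index(child) for child in node]
        interval = (spans[0][0], spans[-1][1])
        table.add(interval)
        return interval

    index(tree_1)
    n = len(number)
    table.discard((0, n - 1))    # the root cluster holds every leaf

    shared = 0
    clusters_2 = 0

    def scan(node):
        nonlocal shared, clusters_2
        if not isinstance(node, tuple):
            return number[node], number[node], 1
        lo, hi, size = n, -1, 0
        for child in node:
            c_lo, c_hi, c_size = scan(child)
            lo, hi, size = min(lo, c_lo), max(hi, c_hi), size + c_size
        if size < n:
            clusters_2 += 1
            # A cluster matches one of tree_1 only if it fills its whole interval.
            if hi - lo + 1 == size and (lo, hi) in table:
                shared += 1
        return lo, hi, size

    scan(tree_2)
    return (len(table) - shared) + (clusters_2 - shared)


tree_1 = ("a", (("b", "c"), ("d", ("e", "f"))))
tree_2 = ("a", (("b", "d"), ("c", ("e", "f"))))
print(day_style_rf(tree_1, tree_2))  # 4
```

Each tree is traversed once and every cluster is checked with a constant number of operations; the recursive traversal is adequate for small examples but would need an explicit stack for very deep trees.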

Specific applications

In phylogenetics, the metric is often used to compute a distance between two trees. The treedist program in the PHYLIP suite offers this function, as do several other packages, including Python libraries and R packages. For comparing groups of trees, the fastest implementations include HashRF and MrsRF.
The Robinson–Foulds metric has also been used in quantitative comparative linguistics to compute distances between trees that represent how languages are related to each other.

Shortcomings

The RF metric suffers a number of theoretical and practical shortcomings:

Relative to other metrics, it lacks sensitivity, and is thus imprecise; it can take two fewer distinct values than there are taxa in a tree.
It is rapidly saturated; very similar trees can be allocated the maximum distance value.
Its value can be counterintuitive. One example is that moving a tip and its neighbour to a particular point on a tree generates a _lower_ difference value than if just one of the two tips were moved to the same place.
Its range of values can depend on tree shape: trees that contain many uneven partitions will command relatively lower distances, on average, than trees with many even partitions.
It lacks a meaningful unit: a difference of one clade may be trivial, or may be fundamental.
It performs more poorly than many alternative measures in practical settings, based on simulated trees.
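The first two points can be illustrated with a small sketch. It assumes fully resolved "caterpillar" trees, whose splits can be read directly from the order of the leaves along the backbone; the helper name is hypothetical. Moving a single leaf from one end of an eight-leaf caterpillar to the other leaves no split in common, so the distance already reaches its maximum of 2(n − 3), and only n − 2 distinct values are available.

```python
def caterpillar_splits(order):
    """Non-trivial splits of a caterpillar tree whose leaves appear in `order`."""
    leaves = frozenset(order)
    reference = min(leaves)                  # fixes which side of each split is stored
    found = set()
    for k in range(2, len(order) - 1):       # cut between positions k - 1 and k
        side = frozenset(order[:k])
        found.add(leaves - side if reference in side else side)
    return found


n = 8
before = caterpillar_splits("abcdefgh")
after = caterpillar_splits("bcdefgha")       # leaf "a" moved to the other end

distance = len(before ^ after)               # A + B as a symmetric difference
maximum = 2 * (n - 3)
possible_values = list(range(0, maximum + 1, 2))

print(distance, maximum)       # 10 10
print(len(possible_values))    # 6, i.e. n - 2
```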

These issues can be addressed by using less conservative metrics. "Generalized RF distances" recognize similarity between similar, but non-identical, splits; the original Robinson–Foulds distance takes no account of how similar two groupings are: if they are not identical, they are discarded.
The best-performing generalized Robinson-Foulds distances have a basis in information theory, and measure the distance between trees in terms of the quantity of information that the trees' splits hold in common. The Clustering Information Distance is recommended as the most suitable alternative to the Robinson-Foulds distance.
An alternative approach to tree distance calculation is to use quartets, rather than splits, as the basis for tree comparison.
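As a rough sketch of the quartet idea, one common formulation counts the four-leaf subsets that the two trees resolve differently. The code below assumes each tree is given by its non-trivial splits, each stored as the set of leaves on one side; the function names are hypothetical, and the brute-force enumeration is meant for small trees only.

```python
from itertools import combinations


def quartet_state(quartet, sides):
    """Return how a tree resolves four leaves, or None if it leaves them unresolved."""
    quartet = frozenset(quartet)
    for side in sides:
        inside = quartet & side
        if len(inside) == 2:                 # the split separates the leaves two against two
            return frozenset([inside, quartet - inside])
    return None


def quartet_distance(leaves, sides_1, sides_2):
    """Count the quartets whose resolution differs between the two trees."""
    return sum(
        quartet_state(q, sides_1) != quartet_state(q, sides_2)
        for q in combinations(sorted(leaves), 4)
    )


leaves = "abcde"
tree_1 = {frozenset("cde"), frozenset("de")}   # splits ab|cde and abc|de
tree_2 = {frozenset("bde"), frozenset("de")}   # splits ac|bde and abc|de
print(quartet_distance(leaves, tree_1, tree_2))  # 2
```

Of the five quartets on these leaves, only {a, b, c, d} and {a, b, c, e} are resolved differently.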

Software implementations

Language/Program	Function	Notes
R	`dist.dendlist` from dendextend	
R	`RobinsonFoulds` from TreeDist	Faster than phangorn implementation
Python	`tree_1.robinson_foulds` from ete3	
