Graphlets

Graphlets are small connected non-isomorphic induced subgraphs of a large network. Graphlets differ from network motifs, since they must be induced subgraphs, whereas motifs are partial subgraphs. An induced subgraph must contain all edges between its nodes that are present in the large network, while a partial subgraph may contain only some of these edges. Moreover, graphlets do not need to be over-represented in the data when compared with randomized networks, while motifs do.
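The difference between an induced and a partial subgraph can be shown in a few lines. The following is a minimal Python sketch. It assumes an undirected network stored as a list of edge tuples, and the helper names `induced_subgraph` and `is_induced` are hypothetical; they do not belong to any graphlet library.

```python
def induced_subgraph(edges, nodes):
    """Return the edge set of the subgraph induced by `nodes` (hypothetical helper).

    `edges` is an iterable of (a, b) tuples describing an undirected network.
    Every edge whose two endpoints both lie in `nodes` is kept; a partial
    subgraph, by contrast, may keep only some of these edges.
    """
    node_set = set(nodes)
    return {frozenset(e) for e in edges if set(e) <= node_set}


def is_induced(edges, sub_edges):
    """True if `sub_edges` is exactly the subgraph induced on its own nodes."""
    sub = {frozenset(e) for e in sub_edges}
    sub_nodes = set().union(*sub)  # nodes covered by the candidate's edges
    return sub == induced_subgraph(edges, sub_nodes)


# Hypothetical example: a triangle a-b-c with a pendant edge c-d.
network = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(is_induced(network, [("a", "b"), ("b", "c")]))              # False: a-c is missing
print(is_induced(network, [("a", "b"), ("b", "c"), ("a", "c")]))  # True
```

The path a-b-c is therefore a partial subgraph of this network, but not an induced one, while the triangle a-b-c is induced.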
Graphlets were first introduced by Nataša Pržulj, who used them as the basis for two new, highly sensitive measures of local structural similarity between networks: the relative graphlet frequency distance and the graphlet degree distribution agreement. Additionally, Pržulj's group developed a measure of network topological similarity that generalizes the degree of a node in the network to its graphlet degree vector, or graphlet degree signature.

Graphlet-based network properties

Relative graphlet frequency distance

RGF-distance compares the frequencies of appearance of all 29 graphlets on 3 to 5 nodes in two networks. Let $N_i(G)$ be the number of graphlets of type $i \in \{1, \dots, 29\}$ in network G, and let $T(G) = \sum_{i=1}^{29} N_i(G)$ be the total number of graphlets of G. The "similarity" between two graphs should be independent of the total number of nodes or edges and should depend only on the differences between the relative frequencies of graphlets. Thus, the relative graphlet frequency distance D between two graphs G and H is defined as:

$$D(G,H) = \sum_{i=1}^{29} \left| F_i(G) - F_i(H) \right|,$$

where $F_i(G) = -\log\left( N_i(G) / T(G) \right)$. The logarithm of the graphlet frequency is used because frequencies of different graphlets can differ by several orders of magnitude, and the distance measure should not be entirely dominated by the most frequent graphlets.
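A minimal Python sketch of RGF-distance follows. It assumes the graphlet counts have already been computed elsewhere (counting itself is not shown), and the function name `rgf_distance` is hypothetical. The natural logarithm is an assumption, since changing the base only rescales D. Because $F_i$ is undefined when a count is zero, the sketch simply rejects zero counts rather than guessing how any particular tool handles them.

```python
import math


def rgf_distance(counts_g, counts_h):
    """Relative graphlet frequency distance D(G, H) (hypothetical helper).

    counts_g[i], counts_h[i] hold N_i(G), N_i(H) for the same graphlet type,
    in the same order (29 entries for 3-5-node graphlets).
    Assumption of this sketch: all counts are positive, because
    F_i = -log(N_i / T) is undefined when N_i = 0.
    """
    if len(counts_g) != len(counts_h):
        raise ValueError("both networks need counts for the same graphlet types")
    if min(counts_g) <= 0 or min(counts_h) <= 0:
        raise ValueError("this sketch requires every graphlet count to be positive")
    total_g, total_h = sum(counts_g), sum(counts_h)  # T(G), T(H)
    distance = 0.0
    for n_g, n_h in zip(counts_g, counts_h):
        f_g = -math.log(n_g / total_g)  # F_i(G)
        f_h = -math.log(n_h / total_h)  # F_i(H)
        distance += abs(f_g - f_h)
    return distance


# Toy counts for three graphlet types (not real data): H has exactly twice
# as many graphlets of each type as G, so the relative frequencies match.
print(rgf_distance([10, 100, 1000], [20, 200, 2000]))  # 0.0
```

The example shows the intended size independence: doubling every count leaves the relative frequencies, and hence the distance, unchanged.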

Graphlet degree distribution agreement

GDD-agreement generalizes the notion of the degree distribution to the spectrum of graphlet degree distributions (GDDs) in the following way. The degree distribution measures the number of nodes of degree k in graph G, i.e., the number of nodes "touching" k edges, for each value of k. Note that an edge is the only graphlet with two nodes. GDDs generalize the degree distribution to other graphlets: for each 2-5-node graphlet $G_i$, $i \in \{0, 1, \dots, 29\}$, such as a triangle or a square, they measure the number of nodes "touching" k graphlets $G_i$ at a particular node. The node at which a graphlet is "touched" is topologically relevant, because it distinguishes, for example, nodes touching a three-node path at an end node from nodes touching it at the middle node. This distinction is captured by automorphism orbits: taking into account the "symmetries" between the nodes of a graphlet, there are 73 different orbits across all 2-5-node graphlets.
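To make the end-node versus middle-node distinction concrete, the hypothetical helper below computes a node's graphlet degrees for the four orbits of the 2- and 3-node graphlets only. It is a sketch that assumes an adjacency-set representation and the conventional numbering (orbit 0: edge; orbits 1 and 2: end and middle of a three-node path; orbit 3: triangle). The remaining 69 orbits of the 4- and 5-node graphlets need a dedicated counting algorithm that is not shown here.

```python
from itertools import combinations


def small_orbit_degrees(adj, u):
    """Graphlet degrees of node u for orbits 0-3 only (hypothetical helper).

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    Assumed numbering: 0 = edge, 1 = end of a 3-node path,
    2 = middle of a 3-node path, 3 = triangle.
    """
    nbrs = adj[u]
    # Orbit 3: each adjacent pair of neighbours closes a triangle through u.
    triangles = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    # Orbit 2: each non-adjacent pair of neighbours makes u the middle of an
    # induced path v-u-w.
    middle = len(nbrs) * (len(nbrs) - 1) // 2 - triangles
    # Orbit 1: u-v-w is an induced path with u at the end when w is a
    # neighbour of v, differs from u, and is not itself adjacent to u.
    end = sum(1 for v in nbrs for w in adj[v] if w != u and w not in nbrs)
    return [len(nbrs), end, middle, triangles]


# Triangle a-b-c with a pendant edge c-d: the induced 3-node paths are
# a-c-d and b-c-d, so c is their middle node and d is an end node of both.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
for node in "abcd":
    print(node, small_orbit_degrees(adj, node))
```

This prints [2, 1, 0, 1] for a and b, [3, 0, 2, 1] for c, and [1, 2, 0, 0] for d. Nodes c and d both touch three-node paths, but at different orbits.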
For each orbit j, one needs to measure the j^th GDD, $d_G^j$, i.e., $d_G^j(k)$ is the number of nodes in G that touch the corresponding graphlet at orbit j exactly k times. Clearly, the degree distribution is the 0th GDD. $d_G^j$ is scaled as

$$S_G^j(k) = \frac{d_G^j(k)}{k}$$

to decrease the contribution of larger degrees in a GDD, and then normalized with respect to its total area

$$T_G^j = \sum_{k=1}^{\infty} S_G^j(k),$$

giving the "normalized distribution"

$$N_G^j(k) = \frac{S_G^j(k)}{T_G^j}.$$
The j^th GDD-agreement compares the j^th GDDs of two networks.
For two networks G and H and a particular orbit j, the "distance" D^j between their normalized j^th GDDs is:
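
$$D^j(G,H) = \frac{1}{\sqrt{2}} \left( \sum_{k=1}^{\infty} \left[ N_G^j(k) - N_H^j(k) \right]^2 \right)^{1/2}.$$
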
The distance is between 0 and 1, where 0 means that G and H have identical j^th GDDs and 1 means that their j^th GDDs are maximally different. Next, $D^j$ is reversed to obtain the j^th GDD-agreement:

$$A^j(G,H) = 1 - D^j(G,H), \quad j \in \{0, 1, \dots, 72\}.$$
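The scaling, normalization, distance, and reversal steps for a single orbit j can be sketched in Python as follows. This illustration rests on assumptions: a GDD is passed as a dictionary mapping k to $d_G^j(k)$, entries with k = 0 are ignored (the scaling divides by k and the sums start at k = 1), and the function names are hypothetical.

```python
import math


def normalized_gdd(d):
    """N_G^j(k) for one orbit j, given d[k] = d_G^j(k) (hypothetical helper).

    Scales each entry as S(k) = d(k) / k, then divides by the total area T.
    Entries with k = 0 are skipped, since the sums start at k = 1.
    """
    scaled = {k: count / k for k, count in d.items() if k >= 1}  # S_G^j(k)
    area = sum(scaled.values())  # T_G^j
    if area == 0:
        raise ValueError("the GDD has no nodes touching this orbit")
    return {k: s / area for k, s in scaled.items()}


def orbit_agreement(d_g, d_h):
    """A^j(G, H) = 1 - D^j(G, H) for a single orbit j (hypothetical helper)."""
    n_g, n_h = normalized_gdd(d_g), normalized_gdd(d_h)
    ks = n_g.keys() | n_h.keys()  # a k missing from one GDD counts as 0 there
    squared = sum((n_g.get(k, 0.0) - n_h.get(k, 0.0)) ** 2 for k in ks)
    distance = math.sqrt(squared) / math.sqrt(2)  # D^j(G, H), in [0, 1]
    return 1.0 - distance


# Toy GDDs (not real data): same shape at different sizes, then disjoint support.
print(orbit_agreement({1: 4, 2: 4}, {1: 2, 2: 2}))  # 1.0, identical shape
print(orbit_agreement({1: 5}, {2: 7}))              # 0.0, disjoint point masses
```

The second call reaches the lower bound: two normalized distributions concentrated on different k give a summed squared difference of 2, which the factor $1/\sqrt{2}$ maps to a distance of exactly 1.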
The total GDD-agreement between two networks G and H is the arithmetic or the geometric average of the j^th GDD-agreements over all j, i.e.,

$$A_{\text{arith}}(G,H) = \frac{1}{73} \sum_{j=0}^{72} A^j(G,H)$$

and

$$A_{\text{geo}}(G,H) = \left( \prod_{j=0}^{72} A^j(G,H) \right)^{1/73},$$

respectively. GDD-agreement is scaled to always be between 0 and 1, where 1 means that two networks are identical with respect to this property.
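Given the 73 per-orbit agreements, each total takes only a line or two to compute. In the hypothetical helper below, the geometric mean is computed in log space. This design choice is made here to avoid floating-point underflow when many agreements are multiplied together; it is not taken from any particular implementation.

```python
import math


def total_gdd_agreement(agreements):
    """Arithmetic and geometric means of per-orbit agreements A^j (hypothetical helper).

    agreements: one value in [0, 1] per orbit (73 values for 2-5-node graphlets).
    Returns (A_arith, A_geo).
    """
    n = len(agreements)
    if n == 0 or not all(0.0 <= a <= 1.0 for a in agreements):
        raise ValueError("expected a non-empty list of agreements in [0, 1]")
    arithmetic = sum(agreements) / n
    if min(agreements) == 0.0:
        geometric = 0.0  # any zero factor makes the product zero
    else:
        # exp(mean(log A^j)) equals the n-th root of the product but avoids
        # underflow when many agreements below 1 are multiplied together.
        geometric = math.exp(sum(math.log(a) for a in agreements) / n)
    return arithmetic, geometric


# Two toy agreement values: arithmetic mean 0.625, geometric mean about 0.5.
print(total_gdd_agreement([1.0, 0.25]))
```

The geometric mean is never larger than the arithmetic mean, so it penalizes a pair of networks more heavily when they disagree strongly on even a few orbits.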

Graphlet degree vectors (signatures) and signature similarities

This method generalizes the degree of a node, which counts the number of edges that the node touches, into the vector of graphlet degrees, or graphlet degree signature, which counts the number of graphlets that the node touches at each orbit, for all graphlets on 2 to 5 nodes. The resulting 73-coordinate vector is the signature of the node; it describes the topology of the node's neighborhood and captures its interconnectivities out to a distance of 4. The graphlet degree signature therefore places strong constraints on the local topology around a node, and comparing the signatures of two nodes gives a highly constraining measure of the local topological similarity between them.
The signature similarity is computed as follows. For a node u in graph G, $u_i$ denotes the i^th coordinate of its signature vector, i.e., $u_i$ is the number of times node u touches orbit i in G. The distance $D_i$ between the i^th orbits of nodes u and v is defined as:

$$D_i(u,v) = w_i \times \frac{\left| \log(u_i + 1) - \log(v_i + 1) \right|}{\log\left( \max\{u_i, v_i\} + 2 \right)},$$

where w_i is the weight of orbit i that accounts for dependencies between orbits. The total distance D between nodes u and v is defined as:

$$D(u,v) = \frac{\sum_{i=0}^{72} D_i(u,v)}{\sum_{i=0}^{72} w_i}.$$

The distance D(u,v) lies in [0, 1), where distance 0 means that the signatures of nodes u and v are identical. Finally, the signature similarity S between nodes u and v is:

$$S(u,v) = 1 - D(u,v).$$
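A minimal Python sketch of the signature similarity follows. It assumes that the two signatures and the orbit weights $w_i$ are given as equal-length lists. The weights are left as an input because the text does not spell out how they are derived from orbit dependencies. The uniform weights in the usage lines are purely illustrative, and the function name is hypothetical.

```python
import math


def signature_similarity(u_sig, v_sig, weights):
    """S(u, v) = 1 - D(u, v) for two graphlet degree signatures (hypothetical helper).

    u_sig[i], v_sig[i]: non-negative orbit counts u_i, v_i (73 entries for
    2-5-node graphlets). weights[i]: orbit weight w_i, supplied by the caller,
    since this sketch does not derive the weights from orbit dependencies.
    """
    if not len(u_sig) == len(v_sig) == len(weights):
        raise ValueError("signatures and weights must have the same length")
    if sum(weights) <= 0:
        raise ValueError("weights must have a positive sum")
    total = 0.0
    for u_i, v_i, w_i in zip(u_sig, v_sig, weights):
        # D_i(u, v): weighted log-scaled difference, normalized by log(max + 2)
        total += w_i * abs(math.log(u_i + 1) - math.log(v_i + 1)) / math.log(max(u_i, v_i) + 2)
    distance = total / sum(weights)  # D(u, v), in [0, 1) for positive weights
    return 1.0 - distance


# Toy signatures over four orbits with uniform weights (illustrative only).
print(signature_similarity([2, 1, 0, 1], [2, 1, 0, 1], [1, 1, 1, 1]))  # 1.0, identical
print(signature_similarity([2, 1, 0, 1], [3, 0, 2, 1], [1, 1, 1, 1]))
```

The `+ 1` inside each logarithm keeps zero counts defined. The `+ 2` in the denominator keeps each per-orbit term strictly below its weight, which is why D(u,v) never reaches 1.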

Clearly, a higher signature similarity between two nodes corresponds to a higher topological similarity between their extended neighborhoods.

Application of graphlet-based network properties

RGF-distance and GDD-agreement were used to evaluate the fit of various network models to real-world networks and to discover a new, well-fitting, geometric random graph model for protein-protein interaction networks, as well as other types of biological networks, such as protein structure networks, also called residue interaction graphs. These graphlet-based network properties are implemented in , a software tool for large network analyses and modeling. Alternatively, a parallel implementation is provided in , a software library for computing graphlet-based network properties in large and massive networks.
Graphlet degree vectors and signature similarities were applied to biological networks to identify groups of topologically similar nodes in a network and predict biological properties of yet uncharacterized nodes based on known biological properties of characterized nodes. Specifically, they were applied to protein function prediction, cancer gene identification, and discovery of pathways underlying certain biological processes, such as melanogenesis or protein degradation. Additionally, GRAph ALigner, a global network alignment method, used graphlet degree vectors and signature similarities to produce topological alignments of biological networks, without using any information external to network topology.