PICRUSt

PICRUSt is a bioinformatics software package. The name is an abbreviation for Phylogenetic Investigation of Communities by Reconstruction of Unobserved States. The tool is used in metagenomic analysis, where it infers the functional profile of a microbial community from a marker gene survey of one or more samples. In essence, PICRUSt takes a user-supplied operational taxonomic unit (OTU) table, which lists the marker gene sequences together with their relative abundance in each sample. The output of PICRUSt is a sample-by-functional-gene count matrix giving the count of each functional gene in each of the samples surveyed. The ability of PICRUSt to estimate the functional gene profile for a given sample relies on a set of known sequenced genomes. PICRUSt can also be thought of as an automated alternative to manually researching the gene families likely to be present in organisms whose sequences are found in a 16S ribosomal RNA (rRNA) amplicon library. The description below corresponds to the original version of PICRUSt, but a major update to this tool is currently being developed.

Genome prediction algorithm

In an initial preprocessing phase, PICRUSt constructs confidence intervals and point predictions for the number of copies of each gene family in each bacterial and archaeal strain in a reference tree, using organisms with sequenced genomes as a reference. More specifically, for each gene family, PICRUSt maps known gene copy numbers onto a reference tree of life. These gene family copy numbers are treated as continuous traits, and an evolutionary model is constructed under the assumption of Brownian motion. These evolutionary models can be constructed with Maximum Likelihood, Relaxed Maximum Likelihood, or Wagner Parsimony. The evolutionary model is then used to predict both a point estimate and a confidence interval for the gene copy numbers of microorganisms without sequenced genomes. This 'genome prediction' step produces a large table of bacterial types vs. gene family copy numbers, which is distributed to end users. This prediction method is not the same as a nearest neighbor approach, and was shown to give a small but significant improvement in accuracy over that strategy. However, nearest neighbor prediction is available as an option in PICRUSt.
Notably, while this functionality is typically used for prediction of gene copy numbers in bacteria, it could, in principle, be used for prediction of any other continuous trait given trait data for diverse organisms and a reference phylogeny.
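The idea of treating a copy number as a continuous trait evolving by Brownian motion can be illustrated with a minimal sketch. It is not PICRUSt's implementation: the tuple-based tree encoding and the function name `bm_estimate` are hypothetical, and the code only computes a point estimate at the root of a small tree by inverse-branch-length weighting (the pruning-style average used for Brownian motion), with no confidence intervals. All branch lengths below the root are assumed to be positive.

```python
def bm_estimate(node):
    """Return (estimate, effective_branch_length) for a node.

    Hypothetical tree encoding used only for this sketch:
      ("tip", trait_value, branch_length)
      ("node", [child, child, ...], branch_length)
    """
    if node[0] == "tip":
        _, value, length = node
        return float(value), float(length)

    _, children, length = node
    child_results = [bm_estimate(child) for child in children]
    # Under Brownian motion, variance grows with branch length, so each
    # child is weighted by the inverse of its (effective) branch length.
    weights = [1.0 / eff_len for _, eff_len in child_results]
    total = sum(weights)
    estimate = sum(w * est for w, (est, _) in zip(weights, child_results)) / total
    # The uncertainty of this estimate is passed upward as extra branch length.
    return estimate, float(length) + 1.0 / total


# Three genomes with known copy numbers 2, 4 and 1 for one gene family.
example_tree = (
    "node",
    [
        ("node", [("tip", 2, 1.0), ("tip", 4, 1.0)], 1.0),
        ("tip", 1, 2.0),
    ],
    0.0,
)

root_estimate, _ = bm_estimate(example_tree)
print(root_estimate)  # about 2.14 (15/7)
```

With no drift in the model, a lineage branching directly from the root would have this root estimate as its expected trait value; predicting a tip elsewhere in the tree requires additional steps that the sketch leaves out.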
Langille et al., 2013 tested the accuracy of this genome prediction step using leave-one-out cross validation on the input set of sequenced genomes. Additional tests examined sensitivity to errors in phylogenetic inference, lack of genomic data, and the accuracy of the confidence intervals on gene content.
A similar step predicts the copy number of 16S rRNA genes.
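Leave-one-out cross validation of a trait predictor can be sketched as follows. For brevity the sketch uses the nearest neighbor predictor mentioned above rather than the evolutionary model, and the data structures, function names, and the mean-absolute-error metric are assumptions made for illustration, not details taken from the published evaluation.

```python
def nearest_neighbor_predict(target, distances, known_traits):
    """Predict the trait of `target` as that of its closest genome with a known trait."""
    candidates = [genome for genome in known_traits if genome != target]
    nearest = min(candidates, key=lambda genome: distances[target][genome])
    return known_traits[nearest]


def leave_one_out_error(distances, traits):
    """Hold out each genome in turn, predict it from the rest, and average the absolute error."""
    errors = []
    for held_out, true_value in traits.items():
        training = {g: v for g, v in traits.items() if g != held_out}
        predicted = nearest_neighbor_predict(held_out, distances, training)
        errors.append(abs(predicted - true_value))
    return sum(errors) / len(errors)


# Hypothetical pairwise phylogenetic distances and copy numbers for one gene family.
distances = {
    "A": {"B": 0.1, "C": 0.5},
    "B": {"A": 0.1, "C": 0.4},
    "C": {"A": 0.5, "B": 0.4},
}
traits = {"A": 2, "B": 2, "C": 5}

print(leave_one_out_error(distances, traits))  # 1.0
```

Genomes A and B predict each other exactly, while C is predicted from B with an error of 3, giving a mean absolute error of 1.0.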

Metagenome prediction algorithm

When applied to a 16S rRNA gene library, PICRUSt matches reference operational taxonomic units (OTUs) against the precomputed tables and retrieves, for each OTU, a predicted 16S rRNA copy number and a predicted copy number for each gene family. The abundance of each OTU is divided by its predicted 16S rRNA copy number, and then multiplied by the copy number of the gene family. This gives a prediction for the contribution of each OTU to the overall gene content of the sample. Finally, these individual contributions are summed together to produce an estimate of the genes present in the metagenome.
Langille et al., 2013 tested the accuracy of this metagenome prediction step by using previously reported datasets in which the same biological sample was subjected to 16S rRNA gene amplification and shotgun metagenomics. In these cases, the shotgun metagenomic results were taken as a representation of the 'true' community, and the 16S rRNA gene amplicon libraries were fed into PICRUSt to attempt to predict those data. Test datasets included human microbiome samples from the Human Microbiome Project, soil samples, diverse mammalian samples, and samples from the Guerrero Negro microbial mats.
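The arithmetic of the metagenome prediction step (normalize each OTU by its 16S rRNA copy number, multiply by gene family copy numbers, and sum over OTUs) can be sketched in a few lines. The function name, the dictionary layout, and the identifiers in the example are hypothetical; skipping OTUs that are missing from the reference tables is also an assumption of this sketch.

```python
def predict_metagenome(otu_table, copy_16s, gene_copies):
    """Estimate gene family counts per sample.

    otu_table:   {sample: {otu: abundance}}
    copy_16s:    {otu: predicted 16S rRNA copy number}
    gene_copies: {otu: {gene_family: predicted copy number}}
    """
    metagenome = {}
    for sample, otus in otu_table.items():
        totals = {}
        for otu, abundance in otus.items():
            if otu not in copy_16s or otu not in gene_copies:
                continue  # assumption: OTUs without reference predictions are ignored
            # Step 1: correct the OTU abundance for 16S rRNA copy number.
            normalized = abundance / copy_16s[otu]
            # Step 2: scale by each gene family's copy number and accumulate.
            for family, copies in gene_copies[otu].items():
                totals[family] = totals.get(family, 0.0) + normalized * copies
        metagenome[sample] = totals
    return metagenome


# Hypothetical inputs: two OTUs in one sample, two gene families.
otu_table = {"sample1": {"OTU_A": 10, "OTU_B": 6}}
copy_16s = {"OTU_A": 5, "OTU_B": 2}
gene_copies = {
    "OTU_A": {"K00001": 1, "K00002": 2},
    "OTU_B": {"K00001": 3},
}

print(predict_metagenome(otu_table, copy_16s, gene_copies))
# {'sample1': {'K00001': 11.0, 'K00002': 4.0}}
```

Here OTU_A contributes 10 / 5 = 2 organisms and OTU_B contributes 6 / 2 = 3, so gene family K00001 is estimated at 2 × 1 + 3 × 3 = 11 copies.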

The Nearest Sequenced Taxon Index

Because PICRUSt, and evolutionary comparative genomics in general, depends on sequenced genomes, biological samples from well-studied environments will be better predicted than those from poorly studied environments. To assess how well the available genomes represent a sample, PICRUSt optionally allows users to calculate a Nearest Sequenced Taxon Index (NSTI) for their samples. This index reflects the average phylogenetic distance between each 16S rRNA gene sequence in the sample and a 16S rRNA gene sequence from a fully sequenced genome. In general, the lower the NSTI score, the more accurate PICRUSt's predictions are expected to be. For example, Langille et al., 2013 showed that PICRUSt was much more accurate on diverse soil samples and samples from the Human Microbiome Project than on microbial mat samples from Guerrero Negro, which contained many bacteria without any sequenced relatives.
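As a rough illustration, the index can be computed as an average of nearest-genome distances. The sketch below assumes that the average is weighted by OTU abundance (so that each sequence in the sample counts once) and that the distance from each OTU to its nearest sequenced genome has already been obtained from a reference tree; the function name and inputs are hypothetical.

```python
def nsti(otu_abundance, nearest_genome_distance):
    """Abundance-weighted mean distance from each OTU to its nearest sequenced genome.

    otu_abundance:           {otu: count in the sample}
    nearest_genome_distance: {otu: phylogenetic distance to the closest sequenced genome}
    """
    total = sum(otu_abundance.values())
    if total == 0:
        raise ValueError("sample contains no sequences")
    weighted = sum(
        count * nearest_genome_distance[otu]
        for otu, count in otu_abundance.items()
    )
    return weighted / total


# Hypothetical sample: OTU_A is abundant and close to a sequenced genome.
sample = {"OTU_A": 3, "OTU_B": 1}
distance_to_genome = {"OTU_A": 0.02, "OTU_B": 0.10}

print(nsti(sample, distance_to_genome))  # about 0.04
```

A sample dominated by OTUs with close sequenced relatives yields a low score, which matches the expectation that its predictions are more reliable.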

Related tools

Okuda et al., 2012 published a similar method that used a bounded k-Nearest Neighbor approach to predict virtual metagenomes. They validated their approach using 16S rRNA gene sequences extracted from shotgun metagenomes, and compared the predictions of their method against the full metagenome.
CopyRighter, like PICRUSt, uses evolutionary modeling and phylogenetic trait prediction to estimate 16S rRNA gene sequence copy numbers for each bacterial and archaeal type in a sample, and then uses these estimates to correct estimates of community composition.
PanFP presented a similar method, but based on genome predictions for each taxonomic group. Benchmarking showed highly similar performance to PICRUSt when compared on the same datasets. One advantage is that all OTUs, not just those in a reference phylogeny table, can be used. One disadvantage is that confidence intervals and evolutionary models are not constructed.
PAPRICA is a metagenome prediction tool based on placing input 16S rRNA gene sequences into a known phylogenetic tree corresponding to reference genomes. The main prediction output corresponds to Enzyme Commission numbers.
Piphillin is a tool produced by the company Second Genome that produces metagenome predictions based on nearest-neighbour clustering of input 16S rRNA gene sequences with 16S rRNA gene sequences from reference genomes. There is a web portal for running this tool on the Second Genome website. This tool is under continual development and undergoing validation as summarized in a 2020 publication.
Tax4Fun is a similar tool based on linking the 16S ribosomal RNA genes from all KEGG organisms with 16S rRNA gene sequences found in the SILVA ribosomal RNA database. Originally this tool was restricted to 16S rRNA gene sequences found within the SILVA database. However, the latest version of this tool, Tax4Fun2, can be used with OTUs or amplicon sequence variants from any clustering pipeline.
