Rat Genome Database
The Rat Genome Database (RGD) is the premier location for rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTLs), and for consolidating rat strain data and making it available to the research community. RGD is working with groups such as the Programs for Genomic Applications at MCW and the National BioResource Project for the Rat in Japan to collect and make available comprehensive physiologic data for a variety of rat strains. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse and human.
RGD began as a collaborative effort between leading research institutions involved in rat genetic and genomic research. Its goal, as stated in RFA: HL-99-013, was the establishment of a Rat Genome Database to collect, consolidate, and integrate data generated from ongoing rat genetic and genomic research efforts and make these data widely available to the scientific community. A secondary but critical goal was to provide curation of mapped positions for quantitative trait loci, known mutations and other phenotypic data.
The rat continues to be extensively used by researchers as a model organism for investigating pharmacology, toxicology, general physiology and the biology and pathophysiology of disease. In recent years, there has been a rapid increase in rat genetic and genomic data. Over the same period, the Rat Genome Database has become a central point for research information on the rat and now covers not just genetics and genomics, but physiology and molecular biology as well. Tools and data pages curated by RGD staff are available for all of these fields.
Data
RGD's Data page lists eight types of data stored in the database: Genes, QTLs, Markers, Maps, Strains, Ontologies, Sequences and References. Of these, six are actively used and regularly updated. The RGD "Maps" datatype refers to legacy genetic and radiation hybrid (RH) maps. This data has been largely supplanted by the rat whole genome sequence. The "Sequences" data type is not a full list of genomic, transcript or protein sequences, but rather mostly contains PCR primer sequences which define simple sequence length polymorphism (SSLP) and expressed sequence tag (EST) Markers. Such sequences are useful primarily for researchers still using these markers for genotyping their animals and for distinguishing between markers of the same name. The six major data types in RGD are as follows:
- Genes: Initial gene records are imported/updated from the National Center for Biotechnology Information's (NCBI) Gene database on a weekly basis. Data imported during this process includes the Gene ID, GenBank/RefSeq nucleotide and protein sequence identifiers, HomoloGene group IDs and Ensembl Gene, Transcript and Protein IDs. Additional protein-related data is imported from the UniProtKB database. RGD curators review the literature and manually curate Gene Ontology (GO), disease, phenotype and pathway annotations for rat genes, disease and pathway annotations for mouse genes, and disease, phenotype and pathway annotations for human genes. In addition, the site imports GO annotations for mouse and human genes from the GO Consortium, rat electronic annotations from UniProt and mouse phenotype annotations from the Mouse Genome Database/Mouse Genome Informatics (MGI).
- QTLs: RGD staff manually curate data for rat and human QTLs from the literature where such publications exist or from records directly submitted by researchers. Mouse QTL records, including Mammalian Phenotype (MP) ontology assignments, are imported directly from MGI. For rat and human QTLs, curation includes assigning MP and disease ontology annotations. QTL positions are automatically assigned based on the genomic positions of peak and/or flanking markers or single nucleotide polymorphisms (SNPs). QTL records link to information about related strains, candidate genes, associated markers and related QTLs.
- Strains: As for QTL records, RGD strain records are either manually curated from the literature or submitted by researchers. Strain records include information about the origin and availability of the strain, associated phenotypes, whether the strain is a model for a human disease, and any information that is available about breeding, behavior, husbandry, etc. Strain records link to information about related genes and QTLs, associated strains and, where available, strain-specific nucleotide variants. For congenic and mutant strains, genomic positions are assigned for the introgressed region or the location of the mutated sequence. RGD does not import data for mouse strains.
- Markers: Because genetic markers such as SSLPs and ESTs have been, and continue to be, used for QTLs and strains, RGD stores marker data for rat, human and mouse. Marker data includes the sequences of the associated forward and reverse PCR primers, genomic positions and links to NCBI's Probe database. Marker records link to associated QTL, strain and gene records.
- Ontologies: To make its data both human readable and available for computational analysis and retrieval, RGD relies on multiple ontologies. As of July 2015, RGD used 16 different ontologies to express the various types of data applicable to its diverse datatypes. Ontology annotations are assigned manually by curators or are imported from external sources through automated pipelines. Six of the ontologies in use at RGD were created or co-created at RGD, and seven are under development by RGD staff members and/or collaborators: the ontologies for Pathway, Rat Strains, Vertebrate Traits, Disease, Clinical Measurements, Measurement Methods and Experimental Conditions. Ontologies which are imported from outside sources are updated weekly.
- References: RGD references are scientific publications that have been used for curation or are sources for data objects such as QTLs and strains. For references accessed via NCBI's PubMed, imported data includes the title, authors, citation and PubMed ID. In some cases, a reference is an internal record for processes such as automated pipelines or a personal communication, giving users of the database an indication of the source of a particular piece of data. PubMed records are not available for these. Each reference record links to all of the data curated from that article, including genes, QTLs, strains and ontology annotations.
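The automatic QTL position assignment described in the QTLs entry above can be illustrated with a minimal Python sketch. It is not RGD's actual pipeline: the `Marker` class, the `assign_qtl_position` function, the `peak_padding` parameter and the rule that flanking markers take precedence over the peak marker are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker or SNP record with a genomic position."""
    symbol: str
    chromosome: str
    start: int
    stop: int


def assign_qtl_position(
    peak: Optional[Marker] = None,
    flank1: Optional[Marker] = None,
    flank2: Optional[Marker] = None,
    peak_padding: int = 0,
) -> Optional[Tuple[str, int, int]]:
    """Return (chromosome, start, stop) for a QTL, or None if no position can be derived.

    Assumed rule: if both flanking markers are positioned, the QTL spans them;
    otherwise the peak marker position is used, widened by `peak_padding` bases.
    """
    if flank1 is not None and flank2 is not None:
        if flank1.chromosome != flank2.chromosome:
            raise ValueError("flanking markers lie on different chromosomes")
        start = min(flank1.start, flank2.start)
        stop = max(flank1.stop, flank2.stop)
        return (flank1.chromosome, start, stop)
    if peak is not None:
        # Clamp at 1 so padding never produces a position before the chromosome start.
        return (peak.chromosome, max(1, peak.start - peak_padding), peak.stop + peak_padding)
    return None
```

With two positioned flanking markers the function returns the span between them; with only a peak marker it returns the peak position plus whatever padding the caller chooses.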
Genome tools
Genome tools developed at RGD
RGD develops web-based tools designed to use the data stored in the RGD database for analyses in rat and across species. These include:
- Gene Annotator: The Gene Annotator or GA tool takes as input a list of gene symbols, RGD IDs, GenBank accession numbers, Ensembl identifiers, and/or a chromosomal region and retrieves gene orthologs, external database identifiers and ontology annotations for the corresponding genes in RGD. The data can be downloaded into an Excel spreadsheet or analyzed in the tool. The "Annotation Distribution" function displays a list of terms in each of seven categories with the percentage of genes from the input list with annotations to each term. The "Comparison Heat Map" function allows comparisons of annotations for genes in the input list across two ontologies or across two branches of the same ontology.
- Variant Visualizer: Variant Visualizer (VV) is a viewing and analysis tool for rat strain-specific sequence polymorphisms. VV takes as input a list of gene symbols or a genomic region defined either by chromosome, start and stop positions or by two gene or marker symbols. The user must also select their strains of interest from a list of strains for which whole genome sequences exist and can set parameters for the variants in the result set. Output is a heatmap-type display of variants. Additional information for individual variants can be viewed in a "detail pane" display.
- OLGA - Object List Generator & Analyzer: OLGA is a search engine designed to allow users to run multiple queries, generate a list of objects from each query and flexibly combine the results. OLGA takes as input either a list of object symbols or search parameters based on ontology annotations or position. The final list of genes, QTLs or strains can be downloaded or submitted to the GA Tool, the Variant Visualizer or the Genome Viewer from within the tool.
- Genome Viewer: The Genome Viewer tool provides users with complete genome views of genes, QTLs and mapped strains annotated to a function, biological process, cellular component, phenotype, disease, pathway, or chemical interaction. GViewer allows Boolean searches across multiple ontologies. Output is displayed against a karyotype of the rat genome.
- Overgo Probe Designer: Overgo probes are pairs of partially overlapping 22mer oligonucleotides derived from repeat-masked genomic sequence and used as high specific activity probes for genome mapping. The Overgo Probe Designer tool takes as input a nucleotide sequence and outputs a list of optimized probe sequences containing the requisite 8 nucleotide overlap on their 3' ends.
- ACP Haplotyper: The ACP Haplotyper creates a "visual haplotype" that can be used to identify conserved and non-conserved chromosomal regions between any of the 48 rat strains characterized as part of the ACP project. For the selected chromosome and between the selected strains, the tool compares the allele size data for microsatellite markers on the selected genetic or RH map.
- SNPlotyper: SNPlotyper is a visualization and analysis tool for rat SNP data imported from dbSNP and Ensembl. It enables users to view haplotype blocks shared between strains and identify informative markers between two or more strains. Data in SNPlotyper is legacy genotyping data and does not include the strain-specific variants derived from whole genome sequencing (WGS) of rat strains.
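The probe geometry described for the Overgo Probe Designer (two 22mers sharing an 8 nucleotide overlap at their 3' ends, and so covering a 36 nucleotide target) can be sketched in Python as follows. This is an illustration only, not the tool's algorithm: the function names are hypothetical, and the GC-content window stands in for the tool's optimization criteria, which are not described here. Windows containing `N` or other non-ACGT characters are skipped on the assumption that repeats are hard-masked.

```python
from typing import List, Tuple

OLIGO_LEN = 22                            # length of each oligonucleotide (from the text)
OVERLAP = 8                               # overlap at the 3' ends (from the text)
TARGET_LEN = 2 * OLIGO_LEN - OVERLAP      # 36 nt covered by one probe pair

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_overgo_pairs(
    sequence: str, gc_min: float = 0.4, gc_max: float = 0.6
) -> List[Tuple[int, str, str]]:
    """Return (offset, forward_oligo, reverse_oligo) for every acceptable 36 nt window.

    The GC window is an assumed filter, not the real tool's scoring.
    """
    seq = sequence.upper()
    pairs = []
    for i in range(len(seq) - TARGET_LEN + 1):
        target = seq[i:i + TARGET_LEN]
        if set(target) - set("ACGT"):
            continue  # masked or ambiguous bases in this window
        if not gc_min <= gc_fraction(target) <= gc_max:
            continue
        forward = target[:OLIGO_LEN]
        # The reverse oligo is read from the opposite strand, so its 3' end
        # is complementary to the 3' end of the forward oligo.
        reverse = reverse_complement(target[TARGET_LEN - OLIGO_LEN:])
        pairs.append((i, forward, reverse))
    return pairs
```

Each returned pair anneals over the last 8 bases of the forward oligo, leaving 14 single-stranded bases on either side to be filled in during labeling.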
Third party genome tools adapted for use with RGD data
- Genome Browsers: As of July 2015, RGD supported two types of genome browsers for viewing data for rat, mouse and human. Both tools, GBrowse and JBrowse, have been or are being developed by the Generic Model Organism Database (GMOD) project. These tools allow the user to view the location of a genetic landmark on the genome of the applicable species. They also allow comparisons between species via the use of "synteny tracks" and links between instances of the browsers for the different species.
- RatMine: RatMine is a rat-centric version of the InterMine software. It enables users to mine and analyze rat data from diverse databases including RGD, NCBI, UniProtKB and Ensembl in a single location using a consistent format. The InterMine platform has been adapted for multiple species in other databases and is designed to be interoperable between instances so that users can query across species from the RatMine interface.
- Virtual Comparative Map: The Virtual Comparative Map (VCMap) was originally developed to explore the syntenic relationships between rat, mouse and human genomes. A newer version of VCMap also incorporates cow, pig and chicken. Users select a "primary" or "backbone" species and can then view the syntenic regions in one or more of the other species.
Additional data and tools
Phenotypes and Models portal
RGD's Phenotypes and Models portal focuses on strains, phenotypes and the rat as a model organism for physiology and disease. The Phenotypes and Models portal has five sections: "Phenotypes", "Strains & Models", "Meet Joe Rat", "PhenoMiner" and "Strain Medical Records".
- Phenotypes: The Phenotypes section contains a large body of data from the PhysGen Program for Genomic Applications project, an NHLBI-funded project to "develop consomic and knockout rat strains, phenotypically characterize these strains, and provide these resources to the scientific community." Data categories include measurements of cardiovascular, renal and respiratory function, blood chemistry, body morphology and behavior. Links are also provided to protocols for phenotyping rats and to similar high-throughput phenotyping data at the National BioResource Project for the Rat in Japan.
- Strains & Models: The Strains and Models section contains general information on rat strains, including information about strain availability and animal husbandry, and links to the RGD strain search and to review articles about rat strains. The section also includes a subsection about disease models that gives detailed information about which rat strains have been used as models for human cardiovascular disease, neurological disease, mammary cancer, diabetes, respiratory diseases, and immune and inflammatory diseases.
- Meet Joe Rat: "Meet Joe Rat" is designed as a general information resource for rat researchers. The Photos and Images pages link to images of PGA/PhysGen parental and consomic strains which in turn link to data for those strains. "Ratday" links to the yearly RGD rat calendar. "Community Submissions" gives information and forms for submitting photos, for registering strains and for submitting quantitative phenotype data for PhenoMiner. The final subsection contains information about strain availability.
- PhenoMiner: PhenoMiner is a database and web application for finding and analyzing quantitative rat phenotype data. Data is annotated to ontologies for rat strain, clinical measurement, measurement method, and experimental condition. Experiments are categorized by the trait or disease assessed by the measurement. The use of standardized vocabularies and data formats allows comparison of values across experiments for the same measurement. The PhenoMiner results page includes a graph of the measurement values and a downloadable table of the values with their accompanying metadata. A link is provided to give users the opportunity to submit their own data to the database.
- Strain Medical Records: RGD's Strain Medical Records (SMRs) are designed to consolidate what is known about a particular strain. They present information such as coat coloring, average body weights at various time points for both males and females, and information about reproduction. Average values for quantitative phenotype measurements such as blood pressure, heart rate and blood chemistry for rats of that strain under standard/control conditions are given along with the corresponding range of values for other commonly used strains. Each SMR links to sources where the strain can be obtained, to PhenoMiner for the quantitative phenotype data and to variant, QTL and microarray expression data.
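As a rough illustration of why the standardized annotations described for PhenoMiner make values comparable across experiments, the Python sketch below groups records that share the same clinical measurement, measurement method and experimental condition terms and averages them per strain. The `PhenotypeRecord` class, the `mean_by_strain` function and all term labels and values used with it are hypothetical; they do not reflect PhenoMiner's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable


@dataclass(frozen=True)
class PhenotypeRecord:
    """Hypothetical quantitative phenotype record annotated with ontology terms."""
    strain: str
    clinical_measurement: str
    measurement_method: str
    experimental_condition: str
    value: float


def mean_by_strain(
    records: Iterable[PhenotypeRecord],
    clinical_measurement: str,
    measurement_method: str,
    experimental_condition: str,
) -> Dict[str, float]:
    """Average values per strain, using only records annotated to the same three terms."""
    wanted = (clinical_measurement, measurement_method, experimental_condition)
    values = defaultdict(list)
    for record in records:
        annotated = (
            record.clinical_measurement,
            record.measurement_method,
            record.experimental_condition,
        )
        if annotated == wanted:
            values[record.strain].append(record.value)
    return {strain: mean(vals) for strain, vals in values.items()}
```

Because records annotated to a different method or condition are excluded before averaging, the per-strain means compare like with like, which is the point of annotating every value to shared vocabularies.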
Diseases
- Cancer
- Cardiovascular Disease
- Diabetes
- Immune and Inflammatory Disease
- Neurological Disease
- Obesity and Metabolic Syndrome
- Renal Disease
- Respiratory Disease
- Sensory Organ Disease