Genome survey sequence

In the fields of bioinformatics and computational biology, genome survey sequences (GSSs) are nucleotide sequences similar to expressed sequence tags (ESTs), except that most of them are genomic in origin rather than derived from mRNA.
GSSs are typically generated and submitted to NCBI by laboratories performing genome sequencing. Among other things, they are used as a framework for mapping and sequencing the genome-sized pieces included in the standard GenBank divisions.

Contributions

Genome survey sequencing is a newer way to map genome sequences because it does not depend on mRNA. Current genome sequencing approaches are mostly high-throughput shotgun methods, and GSS is often used in the first step of sequencing. Unlike ESTs, GSSs can provide an initial global view of a genome that includes both coding and non-coding DNA, including its repetitive sections. Because repetitive sequences can affect estimates of sequence coverage, library quality and the construction process, GSS data play an important role in the early assessment of a sequencing project. For example, survey sequencing of the dog genome was used to estimate global parameters such as the neutral mutation rate and repeat content.
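The article does not say how repeat content is estimated from survey reads. One simple heuristic, swapped in here for illustration and not the method used in the dog survey, is to count k-mers across all reads: k-mers seen far more often than the survey coverage would predict probably come from repeats. The function name and the parameters `k` and `min_count` are hypothetical choices; in practice the threshold would be set relative to the expected coverage.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def estimate_repeat_fraction(reads, k=8, min_count=3):
    """Fraction of k-mer positions in the reads whose k-mer occurs at least
    min_count times across the whole read set (a crude repeat-content proxy)."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each high-count k-mer contributes one position per occurrence.
    repetitive = sum(c for c in counts.values() if c >= min_count)
    return repetitive / total
```

With real data, `k` must be large enough that unique genomic k-mers rarely collide by chance (often 17 or more for a mammalian genome); the small default here only suits toy inputs.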
GSS is also an effective way to characterize, rapidly and on a large scale, the genomes of related species for which few gene sequences or maps are available. Even at low coverage, GSS can yield abundant information about the gene content and putative regulatory elements of the species being compared. Comparing these genes across related species can reveal gene families that are relatively expanded or contracted. Combined with physical clone coverage, GSS also lets researchers navigate the genome easily and characterize specific genomic regions by more extensive sequencing.
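To make "low coverage" concrete, one can use the standard Lander-Waterman model (an assumption here; the article names no model). N random reads of length L from a genome of size G give a mean coverage c = NL/G, and the expected fraction of bases sampled at least once is 1 - e^(-c). A minimal sketch with hypothetical function names:

```python
import math

def survey_coverage(num_reads, read_len, genome_size):
    """Mean redundancy c = N * L / G under the Lander-Waterman model."""
    return num_reads * read_len / genome_size

def fraction_covered(c):
    """Expected fraction of the genome hit by at least one read: 1 - e^(-c)."""
    return 1.0 - math.exp(-c)
```

At c = 1 roughly 63% of bases are expected to be sampled, and at c = 2 roughly 86%, which is why even a shallow survey touches most genes at least partially.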

Limitation

The main limitation of genome survey sequences is that, because of their fragmentary nature, they lack long-range continuity, which makes it harder to infer the order of genes and markers. For example, when detecting repetitive sequences in GSS data, it may not be possible to find all the repeats, since a repetitive element can be longer than the reads and is then difficult to recognize.
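The problem with repeats longer than the reads can be shown with exact matching on a toy genome. The genome, the repeat and the helper name below are invented for illustration: a read lying entirely inside one repeat copy matches both copies equally well, so its true origin cannot be determined, while a read that reaches into unique flanking sequence places uniquely.

```python
def exact_hits(genome, read):
    """All start positions where read matches genome exactly (overlaps allowed)."""
    hits = []
    pos = genome.find(read)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(read, pos + 1)
    return hits

# Toy genome (invented): one 14-bp repeat present in two copies.
REPEAT = "TTGACCATGCAGGT"
GENOME = "AAAAC" + REPEAT + "CCGTA" + REPEAT + "GGGTT"
```

Here `exact_hits(GENOME, "GACCATGC")` returns `[7, 26]` (two equally good placements), whereas `exact_hits(GENOME, "AAACTTGACC")` returns `[1]` because that read spans unique sequence.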

Types of data

The GSS division contains the following types of data:

Random "single pass read" genome survey sequences

Random "single pass read" genome survey sequences are GSSs generated by single-pass reads of randomly selected clones. Single-pass sequencing has lower fidelity, so it accumulates genomic data rapidly but with lower accuracy. This category includes RAPD, RFLP, AFLP and similar data.
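The article does not say how read fidelity is measured. A common convention, assumed here, is the Phred quality score Q, where a base call with quality Q has error probability P = 10^(-Q/10). Summing P over a read gives the expected number of miscalled bases, which makes the cost of lower single-pass accuracy easy to quantify. The function names are hypothetical.

```python
def phred_to_error(q):
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def expected_errors(quals):
    """Expected number of miscalled bases in a read (sum of per-base P)."""
    return sum(phred_to_error(q) for q in quals)
```

For example, a 500-base read at a uniform Q20 is expected to contain about 5 errors, while the same read at Q30 is expected to contain about 0.5.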

Cosmid/BAC/YAC end sequences

Cosmid/BAC/YAC end sequences are obtained by sequencing the ends of genomic inserts cloned in cosmids, bacterial artificial chromosomes (BACs) or yeast artificial chromosomes (YACs). These vectors behave like very low-copy plasmids, sometimes with only one copy per cell. To obtain enough DNA, large volumes of E. coli culture are needed; 2.5 to 5 litres may be a reasonable amount.
Cosmids, BACs and YACs can also carry larger DNA inserts than vectors such as plasmids and phagemids. A larger insert is often helpful for organizing clones in a sequencing project.
YACs can be used to express eukaryotic proteins with post-translational modifications.
BACs cannot do this, but they represent human DNA much more reliably than YACs or cosmids.
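End sequences help organize clones because both ends of one insert come from the same stretch of genome. A minimal sketch, assuming each clone's two end reads have already been placed on assembly contigs (the clone names, contig names and function name below are hypothetical), counts how many clones link each pair of contigs. Pairs supported by several clones are likely to lie within one insert length of each other.

```python
from collections import Counter

def contig_links(end_placements):
    """end_placements maps clone name -> (contig of one end, contig of other end).
    Return a Counter of unordered contig pairs joined by clone ends."""
    links = Counter()
    for contig_a, contig_b in end_placements.values():
        if contig_a != contig_b:
            # Sort so that (A, B) and (B, A) count as the same link.
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return links

placements = {
    "bac1": ("ctg1", "ctg2"),
    "bac2": ("ctg2", "ctg1"),
    "bac3": ("ctg3", "ctg3"),  # both ends on one contig: no link
    "bac4": ("ctg2", "ctg4"),
}
```

Calling `contig_links(placements)` yields two supporting clones for the pair `("ctg1", "ctg2")` and one for `("ctg2", "ctg4")`. Larger inserts, as in BACs, let such links bridge longer gaps.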

Exon-trapped genomic sequences

Exon trapping is used to identify genes in cloned DNA by recognizing and trapping vector clones that carry exon sequences. It has two main features. First, it does not depend on the availability of RNA expressed from the target DNA. Second, the isolated sequences can be derived directly from a clone without knowing which tissues express the gene to be identified. During splicing, exons are retained in the mRNA, and the information they carry can be contained in the protein. A DNA fragment can be inserted into an intron of the trapping vector; if the fragment contains an exon, that exon is spliced into the transcript, which becomes longer than usual and can then be trapped by analysis.
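The logic of exon trapping can be sketched with a toy splicing model. Everything here is an assumption for illustration: the vector sequences are invented, segments are labelled "exon" or "intron" by hand, and real splice-site recognition (GT...AG signals, branch points) is not modelled. A genomic insert placed in the vector intron lengthens the spliced product only if it contains an exon.

```python
# Hypothetical trapping vector: two vector exons separated by an intron.
# All sequences are invented for illustration.
VECTOR_EXON_5 = "ATGGCCAAG"
VECTOR_EXON_3 = "CTGTAAGGC"

def splice(segments):
    """Toy splicing: keep exon segments in order and discard introns."""
    return "".join(seq for kind, seq in segments if kind == "exon")

def trapped_transcript(insert_segments):
    """Clone the insert into the vector intron, then splice."""
    pre_mrna = (
        [("exon", VECTOR_EXON_5), ("intron", "GTAAGT")]
        + list(insert_segments)
        + [("intron", "TTTCAG"), ("exon", VECTOR_EXON_3)]
    )
    return splice(pre_mrna)

def contains_exon(insert_segments):
    """A product longer than the empty-vector product signals a trapped exon."""
    empty = trapped_transcript([])
    return len(trapped_transcript(insert_segments)) > len(empty)
```

An intron-only insert gives the same product as the empty vector, whereas an insert carrying the exon `GACCTG` yields `ATGGCCAAGGACCTGCTGTAAGGC`, six bases longer.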

Alu PCR sequences

Alu repetitive elements are members of the short interspersed elements (SINEs) of the primate genome. There are about 300,000 to 500,000 copies of the Alu element in the human genome, which means one Alu element occurs every 4 to 6 kb on average. Alu elements are widely distributed through the genome, and their repetitiveness is why they are called Alu repetitive elements. By using a specific Alu sequence as the target locus, specific human DNA can be obtained from TAC, BAC or PAC clones or from human-mouse cell hybrids.
PCR is a method for amplifying (cloning) a small fragment of DNA, which may be a whole gene or only part of one. PCR can amplify only fairly small fragments, generally no longer than 10 kbp.
Alu PCR is a rapid and easy-to-use "DNA fingerprinting" technique. It analyzes many genomic loci flanked by Alu repetitive elements, which are non-autonomous retrotransposons present in high copy number in primate genomes. Because Alu elements are so numerous, they can serve as primer targets for PCR-based genome fingerprinting, which is called Alu PCR.
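The fingerprint idea can be sketched as single-primer in-silico PCR. This is an illustration under stated assumptions: the primer and template are invented (not a real Alu primer), binding is exact matching only, and every compatible site pair is assumed to amplify. A product forms between a site where the primer matches the top strand and a downstream site where its reverse complement appears, provided the product is no longer than `max_len` (defaulting here to the roughly 10 kbp PCR limit mentioned above). The set of product lengths is the fingerprint.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_all(seq, sub):
    """All start positions of sub in seq (overlaps allowed)."""
    hits, i = [], seq.find(sub)
    while i != -1:
        hits.append(i)
        i = seq.find(sub, i + 1)
    return hits

def in_silico_pcr(template, primer, max_len=10_000):
    """Sorted product lengths for one primer binding both strands."""
    fwd = find_all(template, primer)
    rev = find_all(template, revcomp(primer))
    products = []
    for f in fwd:
        for r in rev:
            length = r + len(primer) - f
            # Reverse site must lie downstream, without overlapping the forward site.
            if r >= f + len(primer) and length <= max_len:
                products.append(length)
    return sorted(products)

PRIMER = "GCATGCTTAG"  # hypothetical primer
TEMPLATE = "TTTT" + PRIMER + "A" * 20 + revcomp(PRIMER) + "CCCC"
```

On this toy template, `in_silico_pcr(TEMPLATE, PRIMER)` gives a single 40-bp product; with `max_len=30` it gives none.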

Transposon-tagged sequences

There are several ways to analyze the function of a particular gene sequence; the most direct is to replace or mutate it and then analyze the results and effects. Three methods have been developed for this purpose: gene replacement, sense and antisense suppression, and insertional mutagenesis. Among these, insertional mutagenesis has proved to be a particularly successful approach.
T-DNA was first applied for insertional mutagenesis, but transposable elements offer further advantages. Transposable elements were first discovered by Barbara McClintock in maize. She identified the first transposable genetic element, which she called the Dissociation locus. Transposable elements range in size from about 750 to 40,000 bp and fall into two main classes: simple insertion sequences (IS) and more complex transposons. An IS carries the gene for transposase, whereas a transposon carries one or more characterized genes that can be easily identified.
Because a transposon has a known sequence, it can be used to tag the DNA into which it inserts. Transposons can move to other loci through transcription or reverse transcription, mediated by nucleases. This mobility showed that the genome is not static but continually changes its own structure.
Transposon tagging has two advantages. First, a transposon inserts into a gene sequence as a single, intact copy, which makes the tagged sequence easy to analyze at the molecular level. Second, many transposons can be excised from the tagged gene sequence by transposase, and such excision confirms that the gene was really tagged by the transposon.
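Because the transposon sequence is known, the tagged gene can be found by reading outward from the transposon end into the flanking genomic DNA. A minimal sketch, assuming a read that starts in the transposon and continues into the genome (the transposon end, genome and function names are hypothetical), extracts the flanking sequence and looks it up in a reference to locate the insertion site.

```python
def flanking_sequence(read, transposon_end):
    """Genomic sequence following the known transposon end in read, or None."""
    i = read.find(transposon_end)
    if i == -1:
        return None
    flank = read[i + len(transposon_end):]
    return flank or None

def locate_insertion(genome, read, transposon_end):
    """Start position of the flanking sequence in genome, or None if not found."""
    flank = flanking_sequence(read, transposon_end)
    if flank is None:
        return None
    pos = genome.find(flank)
    return pos if pos != -1 else None

TE_END = "GGTACCTTAG"  # hypothetical transposon end sequence
GENOME = "CCCCAAAAATGCGTACGTGGGG"  # invented reference
READ = TE_END + "ATGCGTACGT"
```

Here `flanking_sequence(READ, TE_END)` returns `"ATGCGTACGT"`, and `locate_insertion(GENOME, READ, TE_END)` returns `8`. A real pipeline would tolerate mismatches and check both strands.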

Example of GSS file

The following is an example of a GSS file that can be submitted to GenBank:


TYPE: GSS
STATUS: New
CONT_NAME: Sikela JM
GSS#: Ayh00001
CLONE: HHC189
SOURCE: ATCC
SOURCE_INHOST: 65128
OTHER_GSS: GSS00093, GSS000101
CITATION: 
Genomic sequences from Human 
brain tissue
SEQ_PRIMER: M13 Forward
P_END: 5'
HIQUAL_START: 1
HIQUAL_STOP: 285
DNA_TYPE: Genomic
CLASS: shotgun
LIBRARY: Hippocampus, Stratagene 
PUBLIC: 
PUT_ID: Actin, gamma, skeletal
COMMENT:
SEQUENCE:
AATCAGCCTGCAAGCAAAAGATAGGAATATTCACCTACAGTGGGCACCTCCTTAAGAAGCTG
ATAGCTTGTTACACAGTAATTAGATTGAAGATAATGGACACGAAACATATTCCGGGATTAAA
CATTCTTGTCAAGAAAGGGGGAGAGAAGTCTGTTGTGCAAGTTTCAAAGAAAAAGGGTACCA
GCAAAAGTGATAATGATTTGAGGATTTCTGTCTCTAATTGGAGGATGATTCTCATGTAAGGT
TGTTAGGAAATGGCAAAGTATTGATGATTGTGTGCTATGTGATTGGTGCTAGATACTTTAAC
TGAGTATACGAGTGAAATACTTGAGACTCGTGTCACTT
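A record laid out like the example above can be read with a small parser. This is a sketch whose rules are inferred from the example only, not from an NCBI specification: each field is an upper-case key followed by a colon, a line without a key continues the previous field (as the CITATION title does), everything after `SEQUENCE:` is sequence, and `HIQUAL_START`/`HIQUAL_STOP` are assumed to be 1-based inclusive coordinates. The function names are hypothetical.

```python
import re

# Assumed field syntax: upper-case key (letters, digits, "_" or "#"), a colon, a value.
FIELD = re.compile(r"^([A-Z0-9_#]+):\s*(.*)$")

def parse_gss(text):
    """Parse one GSS record into a dict of field name -> value."""
    fields, seq_lines = {}, []
    current, in_sequence = None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_sequence:
            seq_lines.append(line)  # everything after SEQUENCE: is sequence
            continue
        match = FIELD.match(line)
        if match:
            current, value = match.group(1), match.group(2).strip()
            if current == "SEQUENCE":
                in_sequence = True
                if value:
                    seq_lines.append(value)
            else:
                fields[current] = value
        elif current is not None:
            # Continuation of the previous field, e.g. the CITATION title.
            fields[current] = (fields[current] + " " + line).strip()
    fields["SEQUENCE"] = "".join(seq_lines).upper()
    return fields

def high_quality_region(record):
    """Slice HIQUAL_START..HIQUAL_STOP (assumed 1-based, inclusive)."""
    start = int(record["HIQUAL_START"])
    stop = int(record["HIQUAL_STOP"])
    seq = record["SEQUENCE"]
    if not 1 <= start <= stop <= len(seq):
        raise ValueError("HIQUAL range lies outside the sequence")
    return seq[start - 1:stop]
```

Applied to the example record, `parse_gss` would join the two CITATION lines into "Genomic sequences from Human brain tissue" and concatenate the sequence lines into one string.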
