sangerGene WormBase Genes WormBase Gene Annotations Genes and Gene Predictions Description The WormBase Genes track shows Sanger Gene predictions from the Wormbase v. WS120 files downloaded from the Sanger Institute FTP site. This track shows the subset of data annotated with "curated", "DNA", or "RNA" followed by one of these strings: "intron", "exon", "cds", "sequence", or "transcri". Credits The Sanger gene predictions are produced by WormBase. Thanks to the WormBase Consortium for providing these data. sangerGenefinder WormBase Genefinder WormBase Genefinder Gene Predictions Genes and Gene Predictions Description The WormBase Genefinder track shows Sanger Gene predictions from the Wormbase v. WS120 files downloaded from the Sanger Institute FTP site. This track shows the subset of data annotated with the string "Genefinder" and found by the gene prediction program, Genefinder. Credits The Sanger gene predictions are produced by WormBase. Thanks to the WormBase Consortium for providing these data. refGene RefSeq Genes RefSeq Genes Genes and Gene Predictions Description The RefSeq Genes track shows known C. elegans protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. This page is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods RefSeq RNAs were aligned against the C. elegans genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 intronEst Spliced ESTs C. elegans ESTs That Have Been Spliced mRNA and EST Description This track shows alignments between C. elegans expressed sequence tags (ESTs) in GenBank and the genome that show signs of splicing when aligned against the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. To be considered spliced, an EST must show evidence of at least one canonical intron, i.e. one that is at least 32 bases in length and has GT/AG ends. By requiring splicing, the level of contamination in the EST databases is drastically reduced at the expense of eliminating many genuine 3' ESTs. For a display of all ESTs (including unspliced), see the C. elegans EST track. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, darker shading indicates a larger number of aligned ESTs. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, C. elegans ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence are displayed in this track. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 picTar PicTar miRNA MicroRNA target sites in 3' UTRs as predicted by PicTar Expression and Regulation Description This track shows microRNA target sites in 3' UTRs as predicted by PicTar, based on the 3' UTRs annotated by WormBase and on UCSC's 3' UTRs predictions (see below). Methods The original PicTar algorithm was published in Krek et al., 2005. The annotations displayed in this track are updated predictions as published in Lall et al., 2006. PicTar is a hidden Markov model that assigns probabilities to 3' UTR subsequences as a binding site for a microRNA, considers all possible ways the 3' UTR could be bound by microRNAs, and then uses a maximum likelihood method to compute the optimal likelihood under which the 3' UTR could be explained by microRNAs and background. The score is this likelihood divided by background, i.e., the local base composition of each 3' UTR is taken into account. To fit the track conventions of the UCSC browser (integers), all scores were scaled by the maximum score of all microRNA 3'-UTR scores observed. Note that the PicTar algorithm scores any 3' UTR that has at least one aligned conserved predicted binding site for a microRNA, but then incorporates all possible binding sites into the score, even if they appear to be non-conserved. Because the score for a 3' UTR is a "phylo" average over all orthologous 3' UTRs used, "scattered" sites that appear in many species may boost the score, and individual sites shown in the display may not be aligned and conserved in all species under consideration. For the nematode target predictions, three species were used: C. elegans, C. briggsae and C. remanei. Because 3' UTRs are rarely annotated in WormBase or RefSeq, UCSC "guessed" the 3' UTR sequences for all transcripts without 3' UTRs annotations by extending the CDS 500 bases downstream of the stop codon, and stopping when another CDS was hit. For a detailed analysis of signal-to-noise ratios and sensitivity, please refer to Lall et al., 2006. Credits Thanks to the Dominic Grün, Yi-Lu Wang, and Nikolaus Rajewsky for providing this annotation. More detailed information about individual predictions, including links to other databases, can be found on the PicTar website, a project of the Rajewsky lab while at the New York University Center for Comparative Functional Genomics. References Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M et al. Combinatorial microRNA target predictions. Nat Genet. 2005 May;37(5):495-500. PMID: 15806104 Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW et al. A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol. 2006 Mar 7;16(5):460-71. PMID: 16458514 gold Assembly Assembly from Fragments Mapping and Sequencing Description This track shows the draft assembly of the C. elegans genome. This assembly merges contigs from overlapping drafts and finished clones into longer sequence contigs. The sequence contigs are ordered and oriented when possible by mRNA, EST, paired plasmid reads (from the SNP Consortium) and BAC end sequence pairs. In the WS120 assembly, all the clones are finished and are of clone type "F". In dense mode, this track depicts the path through the finished clones (aka the golden path) used to create the assembled sequence. Clone boundaries are distinguished by the use of alternating gold and brown coloration. Where gaps exist in the path, spaces are shown between the gold and brown blocks. If the relative order and orientation of the contigs between the two blocks is known, a line is drawn to bridge the blocks. There are only single base gaps in this assembly. augustusGene AUGUSTUS AUGUSTUS ab initio gene predictions v3.1 Genes and Gene Predictions Description This track shows ab initio predictions from the program AUGUSTUS (version 3.1). The predictions are based on the genome sequence alone. For more information on the different gene tracks, see our Genes FAQ. Methods Statistical signal models were built for splice sites, branch-point patterns, translation start sites, and the poly-A signal. Furthermore, models were built for the sequence content of protein-coding and non-coding regions as well as for the length distributions of different exon and intron types. Detailed descriptions of most of these different models can be found in Mario Stanke's dissertation. This track shows the most likely gene structure according to a Semi-Markov Conditional Random Field model. Alternative splicing transcripts were obtained with a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2). The different models used by Augustus were trained on a number of different species-specific gene sets, which included 1000-2000 training gene structures. The --species option allows one to choose the species used for training the models. Different training species were used for the --species option when generating these predictions for different groups of assemblies. Assembly Group Training Species Fish zebrafish Birds chicken Human and all other vertebrates human Nematodes caenorhabditis Drosophila fly A. mellifera honeybee1 A. gambiae culex S. cerevisiae saccharomyces This table describes which training species was used for a particular group of assemblies. When available, the closest related training species was used. Credits Thanks to the Stanke lab for providing the AUGUSTUS program. The training for the chicken version was done by Stefanie König and the training for the human and zebrafish versions was done by Mario Stanke. References Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656 Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25. PMID: 14534192 est C. elegans ESTs C. elegans ESTs Including Unspliced mRNA and EST Description This track shows alignments between C. elegans expressed sequence tags (ESTs) in GenBank and the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, C. elegans ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 mrna C. elegans mRNAs C. elegans mRNAs from GenBank mRNA and EST Description The mRNA track shows alignments between C. elegans mRNAs in GenBank and the genome. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Methods GenBank C. elegans mRNAs were aligned against the genome using the blat program. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 c_briggsae_pwMaf Conservation Blastz Alignments with C. Briggsae & Conservation Comparative Genomics Description This track shows a measure of evolutionary conservation in C. elegans and C. briggsae based on a phylogenetic hidden Markov model (phastCons). Blastz alignments of the March 2004 assembly of C. elegans (ce2) and the July 2002 assembly of C. briggsae (cb1) were used to generate this annotation. In full display mode, this track shows the phylogenetic conservation score as well as a summary of the pairwise alignments (shown in dense display mode using a grayscale density gradient). The checkboxes in the track configuration section allow the exclusion of species from the pairwise display; however, this does not remove them from the conservation score display. When zoomed-in to the base-display level, the track shows the base composition of each alignment. The numbers and symbols on the "Gaps" line indicate the lengths of gaps in the C. elegans sequence at those alignment positions relative to C. briggsae. If there is sufficient space in the display, the size of the gap is shown; if not, and if the gap size is a multiple of 3, a "*" is displayed, otherwise "+" is shown. To view detailed information about the alignments at a specific position, zoom in the display to 30,000 or fewer bases, then click on the alignment. This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. Methods Best-in-genome pairwise alignments were generated using blastz, followed by chaining and netting. These alignments were then assigned conservation scores by phastCons. The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Note that, unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size, so short highly-conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al. (2005). PhastCons currently treats alignment gaps as missing data, which sometimes has the effect of producing undesirably high conservation scores in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps. Credits This track was created at UCSC using the following programs: Blastz by Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at UCSC. "Wiggle track" plotting software by Hiram Clawson at UCSC. References Phylo-HMM Felsenstein J, Churchill GA. A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol. 1996 Jan;13(1):93-104. PMID: 8583911 Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216 Siepel A, Haussler D. Phylogenetic Hidden Markov Models. In: Nielsen R, editor. Statistical Methods in Molecular Evolution. New York: Springer; 2005. pp. 325-351. Yang Z. A space-time process model for the evolution of DNA sequences. Genetics. 1995 Feb;139(2):993-1005. PMID: 7713447; PMC: PMC1206396 Chain/Net Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Blastz Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 gap Gap Gap Locations Mapping and Sequencing Description There are no gaps in the WS120 C. elegans assembly. Instead, this track shows the few remaining unknown bases. The bases are distibuted across the genome as follows: Chromosome II: 3 unknown bases Chromosome III: 5 unknown bases gc5Base GC Percent GC Percent in 5-Base Windows Mapping and Sequencing Description The GC percent track shows the percentage of G (guanine) and C (cytosine) bases in 5-base windows. High GC content is typically associated with gene-rich areas. This track may be configured in a variety of ways to highlight different apsects of the displayed information. Click the "Graph configuration help" link for an explanation of the configuration options. Credits The data and presentation of this graph were prepared by Hiram Clawson. rnaCluster Gene Bounds Gene Boundaries as Defined by RNA and Spliced EST Clusters mRNA and EST Description This track shows the boundaries of genes and the direction of transcription as deduced from clustering spliced ESTs and mRNAs against the genome. When many spliced variants of the same gene exist, this track shows the variant that spans the greatest distance in the genome. Method ESTs and mRNAs from GenBank were aligned against the genome using BLAT. Alignments with less than 97.5% base identity within the aligning blocks were filtered out. When multiple alignments occurred, only those alignments with a percentage identity within 0.2% of the best alignment were kept. The following alignments were also discarded: ESTs that aligned without any introns, blocks smaller than 10 bases, and blocks smaller than 130 bases that were not located next to an intron. The orientations of the ESTs and mRNAs were deduced from the GT/AG splice sites at the introns; ESTs and mRNAs with overlapping blocks on the same strand were merged into clusters. Only the extent and orientation of the clusters are shown in this track. Scores for individual gene boundaries were assigned based on the number of cDNA alignments used: 300 — based on a single cDNA alignment 600 — based on two alignments 900 — based on three alignments 1000 — based on four or more alignments Credits This track, which was originally developed by Jim Kent, was generated at UCSC and uses data submitted to GenBank by scientists worldwide. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32:D23-6. Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. geneid Geneid Genes Geneid Gene Predictions Genes and Gene Predictions Description This track shows gene predictions from the geneid program developed by Roderic Guigó's Computational Biology of RNA Processing group which is part of the Centre de Regulació Genòmica (CRG) in Barcelona, Catalunya, Spain. Methods Geneid is a program to predict genes in anonymous genomic sequences designed with a hierarchical structure. In the first step, splice sites, start and stop codons are predicted and scored along the sequence using Position Weight Arrays (PWAs). Next, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the the log-likelihood ratio of a Markov Model for coding DNA. Finally, from the set of predicted exons, the gene structure is assembled, maximizing the sum of the scores of the assembled exons. Credits Thanks to Computational Biology of RNA Processing for providing these data. References Blanco E, Parra G, Guigó R. Using geneid to identify genes. Curr Protoc Bioinformatics. 2007 Jun;Chapter 4:Unit 4.3. PMID: 18428791 Parra G, Blanco E, Guigó R. GeneID in Drosophila. Genome Res. 2000 Apr;10(4):511-5. PMID: 10779490; PMC: PMC310871 blastHg16KG Human Proteins Human Proteins (hg16) Mapped by Chained tBLASTn Genes and Gene Predictions Description This track contains tBLASTn alignments of the peptides from the predicted and known genes identified in the hg16 Known Genes track. Methods First, the predicted proteins from the human Known Genes track were aligned with the human genome using the Blat program to discover exon boundaries. Next, the amino acid sequences that make up each exon were aligned with the C. elegans sequence using the tBLASTn program. Finally, the putative C. elegans exons were chained together using an organism-specific maximum gap size but no gap penalty. The single best exon chains extending over more than 60% of the query protein were included. Exon chains that extended over 60% of the query and matched at least 60% of the protein's amino acids were also included. Credits tBLASTn is part of the NCBI BLAST tool set. For more information on BLAST, see Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-410. Blat was written by Jim Kent. The remaining utilities used to produce this track were written by Jim Kent or Brian Raney. microsat Microsatellite Microsatellites - Di-nucleotide and Tri-nucleotide Repeats Variation and Repeats Description This track displays regions that are likely to be useful as microsatellite markers. These are sequences of at least 15 perfect di-nucleotide and tri-nucleotide repeats and tend to be highly polymorphic in the population. Methods The data shown in this track are a subset of the Simple Repeats track, selecting only those repeats of period 2 and 3, with 100% identity and no indels and with at least 15 copies of the repeat. The Simple Repeats track is created using the Tandem Repeats Finder. For more information about this program, see Benson (1999). Credits Tandem Repeats Finder was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 miRNA miRNA MicroRNAs from miRBase Genes and Gene Predictions Description The miRNA track shows microRNAs from miRBase. Display Conventions and Configuration Mature miRNAs (miRs) are represented by thick blocks. The predicted stem-loop portions of the primary transcripts are indicated by thinner blocks. miRNAs in the sense orientation are shown in black; those in the reverse orientation are colored grey. When a single precursor produces two mature miRs from its 5' and 3' parts, it is displayed twice with the two different positions of the mature miR. To display only those items that exceed a specific unnormalized score, enter a minimum score between 0 and 1000 in the text box at the top of the track description page. Methods Mature and precursor miRNAs from the miRNA Registry were aligned against the genome using blat. The extents of the precursor sequences were not generally known, and were predicted based on base-paired hairpin structure. miRBase is described in Griffiths-Jones, S. et al. (2006). The miRNA Registry is described in Griffiths-Jones, S. (2004) and Weber, M.J. (2005) in the References section below. Credits This track was created by Michel Weber of Laboratoire de Biologie Moléculaire Eucaryote, CNRS Université Paul Sabatier (Toulouse, France), Yves Quentin of Laboratoire de Microbiologie et Génétique Moléculaires (Toulouse, France) and Sam Griffiths-Jones of The Wellcome Trust Sanger Institute (Cambridge, UK). References When making use of these data, please cite: Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D140-4. PMID: 16381832; PMC: PMC1347474 Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D109-11. PMID: 14681370; PMC: PMC308757 Weber MJ. New human and mouse microRNA genes found by homology search. FEBS J. 2005 Jan;272(1):59-73. PMID: 15634332 The following publication provides guidelines on miRNA annotation: Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X, Dreyfuss G, Eddy SR, Griffiths-Jones S, Marshall M et al. A uniform system for microRNA annotation. RNA. 2003 Mar;9(3):277-9. PMID: 12592000; PMC: PMC1370393 For more information on blat, see Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 phastConsElements Most Conserved PhastCons Conserved Elements (C. briggsae) Comparative Genomics Description This track shows predictions of conserved elements produced by the phastCons program. PhastCons is part of the PHAST (PHylogenetic Analysis with Space/Time models) package. The predictions are based on a phylogenetic hidden Markov model (phylo-HMM), a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next. Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. A multiple alignment was then constructed from these pairwise alignments using multiz. Predictions of conserved elements were then obtained by running phastCons on the multiple alignments with the --most-conserved option. PhastCons constructs a two-state phylo-HMM with a state for conserved regions and a state for non-conserved regions. The two states share a single phylogenetic model, except that the branch lengths of the tree associated with the conserved state are multiplied by a constant scaling factor rho (0 <= rho <= 1). The free parameters of the phylo-HMM, including the scaling factor rho, are estimated from the data by maximum likelihood using an EM algorithm. This procedure is subject to certain constraints on the "coverage" of the genome by conserved elements and the "smoothness" of the conservation scores. Details can be found in Siepel et al. (2005). The predicted conserved elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. References PhastCons Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216 Chain/Net Kent WJ, Baertsch R, Hinrichs A, Miller W, and Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Multiz Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. PMID: 15060014; PMC: PMC383317 Blastz Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 xenoRefGene Other RefSeq Non-C. elegans RefSeq Genes Genes and Gene Predictions Description This track shows known protein-coding and non-protein-coding genes for organisms other than C. elegans, taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods The RNAs were aligned against the C. elegans genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 25% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 simpleRepeat Simple Repeats Simple Tandem Repeats by TRF Variation and Repeats Description This track displays simple tandem repeats (possibly imperfect repeats) located by Tandem Repeats Finder (TRF) which is specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. Repeat expansions are sometimes associated with specific diseases. Methods For more information about the TRF program, see Benson (1999). Credits TRF was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 twinscan Twinscan Twinscan Gene Predictions Using briggsae/elegans Homology Genes and Gene Predictions Description The Twinscan program predicts genes in a manner similar to Genscan, except that Twinscan takes advantage of genome comparisons to improve gene prediction accuracy. More information and a web server can be found at http://mblab.wustl.edu/. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The track description page offers the following filter and configuration options: Color track by codons: Select the genomic codons option to color and label each codon in a zoomed-in display to facilitate validation and comparison of gene predictions. Click the Help on codon coloring link on the track description page for more information about this feature. Methods The Twinscan algorithm is described in Korf, I. et al. (2001) in the References section below. Credits Thanks to Michael Brent's Computational Genomics Group at Washington University St. Louis for providing these data. References Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics. 2001;17 Suppl 1:S140-8. PMID: 11473003 blatFr1 Fugu Blat Takifugu rubripes (Aug. 2002/fr1) Translated Blat Alignments Comparative Genomics Description This track shows blat translated protein alignments of the Fugu (Aug. 2002 (JGI 3.0/fr1)) genome assembly to the C. elegans genome. The v3.0 Fugu whole genome shotgun assembly was provided by the US DOE Joint Genome Institute (JGI). The strand information (+/-) for this track is in two parts. The first + or - indicates the orientation of the query sequence whose translated protein produced the match. The second + or - indicates the orientation of the matching translated genomic sequence. Because the two orientations of a DNA sequence give different predicted protein sequences, there are four combinations. ++ is not the same as --; nor is +- the same as -+. Methods The alignments were made with blat in translated protein mode requiring two nearby 4-mer matches to trigger a detailed alignment. The C. elegans genome was masked with RepeatMasker and Tandem Repeat Finder before running blat. Credits The 3.0 draft from JGI was used in the UCSC Fugu blat alignments. These data were provided freely by the JGI for use in this publication only. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 wabaCbr Briggsae Waba C. briggsae (July 2002 (WormBase cb25.agp8/cb1)) Waba Alignments Comparative Genomics Description This track displays alignments of C. briggsae to C. elegans using the Wobble Aware Bulk Aligner (WABA), an alignment tool developed by Jim Kent for doing large-scale alignments between genomic DNA of different species. The symbols shown below the base alignments on the details page indicate the hidden Markov model states for those bases: H - highFi L - lowFi Q - qSlip T - tSlip 1 - frame1 2 - frame2 3 - frame3 Methods The WABA alignment algorithm is described in Kent, W.J. and Zahler, A.M. Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res. 2000 Aug;10(8):1115-25. PMID: 10958630 Credits WABA was developed by Jim Kent and Alan M. Zahler. C. briggsae version cb25.agp8 (11 July 2002) and C. elegans version WS120 (01 March 2004) were used in creating this annotation. Thanks to WormBase and Lincoln Stein. The data for the C. elegans browser display were collected by Hiram Clawson. Please contact him with questions or comments about this annotation. blastzCb1 Briggsae Blastz C. briggsae (July 2002 (WormBase cb25.agp8/cb1)) Blastz Comparative Genomics Description This track displays blastz alignments of the C. briggsae assembly (cb1, July 2002 (WormBase cb25.agp8/cb1)) to the C. elegans genome. The track has an optional feature that color codes alignments to indicate the chromosomes from which they are derived in the aligning assembly. To activate the color feature, click the on button next to "Color track based on chromosome" on the track description page. Using the Filter This track has a filter that can be used to change the display mode, turn on the chromosome color track, or filter the display output by chromosome. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu. Color track: To display the chromosome color track, click the on button next to "Color track based on chromosome". When the color track is activated, each of the items within the annotation track will be colored to show the chromosome in the aligning genome assembly from which the alignment originated. Chromosome filter: To display only alignments from a specific chromomsome in the aligning assembly, type the chromosome number (in the form chrN) in the text box to the right of "Filter by chromosome". For example, to display alignments from chromosome 6, type "chr6". When you have finished configuring the filter, click the Submit button. Credits These alignments were contributed by Scott Schwartz of the Penn State Bioinformatics Group. The best-in-genome filtering was done using UCSC's axtBest program. References Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainCb1 Briggsae Chain C. briggsae (July 2002 (WormBase cb25.agp8/cb1)) Chained Alignments Comparative Genomics Description This track shows alignments of C. briggsae (cb1, July 2002 (WormBase cb25.agp8/cb1)) to the C. elegans genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both C. briggsae and C. elegans simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the C. briggsae assembly or an insertion in the C. elegans assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the C. elegans genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Display Conventions and Configuration By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Methods Transposons that have been inserted since the C. briggsae/C. elegans split were removed from the assemblies. The abbreviated genomes were aligned with blastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single C. briggsae chromosome and a single C. elegans chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. Chains scoring below a threshold were discarded; the remaining chains are displayed in this track. Credits Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains were generated by Robert Baertsch and Jim Kent. References Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, and Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 netCb1 Briggsae Net C. briggsae (July 2002 (WormBase cb25.agp8/cb1)) Alignment Net Comparative Genomics Description This track shows the best C. briggsae/C. elegans chain for every part of the C. elegans genome. It is useful for finding orthologous regions and for studying genome rearrangement. The C. briggsae sequence used in this annotation is from the July 2002 (WormBase cb25.agp8/cb1) (cb1) assembly. Display Conventions and Configuration In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chains were derived from blastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. No lineage-specific repeats were used in the creation of this track. Credits The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. The browser display and database storage of the nets were made by Robert Baertsch and Jim Kent. References Kent WJ, Baertsch R, Hinrichs A, Miller W, and Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 axtNetCb1 Briggsae Aligns C. briggsae (July 2002 (WormBase cb25.agp8/cb1)) Blastz Alignments from Net Comparative Genomics Description This track shows best-in-genome C. elegans/C. briggsae alignments based on the C. elegans/C. briggsae alignment net, i.e. the best chain for every part of the C. elegans genome. The C. briggsae sequence is from WormBase Version cb25.agp8 (July 2002) assembly. Display Conventions and Configuration The track displays alignment scores and, when zoomed in to the base level, the alignment itself. In the graphical display, the shade (in dense display mode) or height (in full mode) of a vertical line indicates the score of the best chained and netted alignment at that location. Clicking on the track brings up detailed information about the blastz alignments (taken from the axt-formatted output) in the currently displayed range of positions. Methods Chains were derived from blastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. The program netToAxt translates the net back to axt. netToAxt is run with -maxGap=300. Credits The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his program RepeatMasker. The browser display and database storage of the nets were made by Robert Baertsch and Jim Kent. References Kent, WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 rmsk RepeatMasker Repeating Elements by RepeatMasker Variation and Repeats Description This track was created by using Arian Smit's RepeatMasker program which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track) as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka, J. (2000) in the References section below. Display Conventions and Configuration In full display mode, this track displays up to ten different classes of repeats: Short interspersed nuclear elements (SINE), which include ALUs Long interspersed nuclear elements (LINE) Long terminal repeat elements (LTR) which include retroposons DNA repeat elements (DNA) Simple repeats (micro-satellites) Low complexity repeats Satellite repeats RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA) Other repeats which includes class RC (Rolling Circle) Unknown The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading. Methods UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet. Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Repbase Update is described in Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. chainSelf Self Chain C. elegans Chained Self Alignments Variation and Repeats Description This track shows alignments of the C. elegans genome with itself, using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. The system can also tolerate gaps in both sets of sequence simultaneously. After filtering out the "trivial" alignments produced when identical locations of the genome map to one another (e.g. chrN mapping to chrN), the remaining alignments point out areas of duplication within the C. elegans genome. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the query assembly or an insertion in the target assembly. Double lines represent more complex gaps that involve substantial sequence in both the query and target assemblies. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one of the assemblies. In cases where multiple chains align over a particular region of the C. elegans genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Display Conventions and Configuration By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Methods The genome was aligned to itself using blastz. Trivial alignments were filtered out, and the remaining alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single target chromosome and a single query chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. Chains scoring below a threshold were discarded; the remaining chains are displayed in this track. Credits Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains were generated by Robert Baertsch and Jim Kent. References Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961