cons17way Conservation Multiz Alignment & Conservation (17 Species) Comparative Genomics Description This track shows a measure of evolutionary conservation in 17 vertebrates, including mammalian, amphibian, bird, and fish species, based on a phylogenetic hidden Markov model (phastCons). Multiz alignments of the following assemblies were used to generate this annotation: mouse (Aug 2005, NCBI35/mm7); rat (Jun 2003, rn3); rabbit (May 2005, oryCun1); human (May 2004, hg17); chimp (Nov 2003, panTro1); macaque (rheMac1); dog (May 2005, canFam2); cow (Mar 2005, bosTau2); armadillo (May 2005, dasNov1); elephant (May 2005, loxAfr1); tenrec (Jul 2005, echTel1); opossum (Jun 2005, monDom2); chicken (Feb 2004, galGal2); frog (Oct 2004, xenTro1); zebrafish (May 2005, danRer3); Tetraodon (Feb 2004, tetNig1); Fugu (Aug 2002, fr1). Display Conventions and Configuration In full display mode, this track shows the overall conservation score across all species as well as pairwise alignments of each species to the mouse genome. The pairwise alignments are shown in dense display mode using a grayscale density gradient. Checkboxes in the track configuration section allow the exclusion of species from the pairwise display; however, this does not remove them from the conservation score display. When zoomed in to the base-display level, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the mouse sequence at those alignment positions relative to the longest non-mouse sequence. If there is sufficient space in the display, the size of the gap is shown; if not, and if the gap size is a multiple of 3, a "*" is displayed, otherwise "+" is shown. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment. This track may be configured in a variety of ways to highlight different aspects of the displayed information. 
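The Gaps-line rule described above can be sketched as a small function. This is an illustration only, not the browser's drawing code: the name gap_label and the idea of measuring "sufficient space" as a number of available character cells are assumptions made for the example.

```python
def gap_label(gap_size: int, chars_available: int) -> str:
    """Return the label drawn on the Gaps line for one gap.

    gap_size        -- length of the gap in the mouse sequence (bases)
    chars_available -- hypothetical measure of display space, in characters
    """
    if gap_size <= 0:
        return ""  # assumption: nothing is drawn where there is no gap
    text = str(gap_size)
    if len(text) <= chars_available:
        return text  # enough room: show the gap size itself
    # Not enough room: "*" marks a multiple of 3, "+" anything else.
    return "*" if gap_size % 3 == 0 else "+"
```

A "*" therefore flags gaps whose length is a multiple of 3, the case in which a gap inside a coding region would not shift the reading frame.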
Click the Graph configuration help link for an explanation of the configuration options. Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. The pairwise alignments were then multiply aligned using multiz, beginning with mouse-rat and subsequently adding in the other species as diagrammed above. The resulting multiple alignments were then assigned conservation scores by phastCons. The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Note that, unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size, so short highly-conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al. (2005). PhastCons currently treats alignment gaps as missing data, which sometimes has the effect of producing undesirably high conservation scores in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps. 
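To make the "posterior probability of the conserved state" concrete, here is a minimal sketch of the forward-backward calculation for a two-state HMM. It is not phastCons: the per-column likelihoods under each state are taken as given inputs (in phastCons they come from the phylogenetic models), and the function name and the transition and start probabilities are hypothetical values chosen for illustration.

```python
from typing import List, Sequence, Tuple


def conserved_posteriors(
    emissions: Sequence[Tuple[float, float]],
    p_stay_cons: float = 0.9,    # hypothetical P(conserved -> conserved)
    p_stay_non: float = 0.95,    # hypothetical P(non-conserved -> non-conserved)
    p_start_cons: float = 0.3,   # hypothetical P(first column is conserved)
) -> List[float]:
    """Posterior P(conserved | all columns) for each alignment column.

    emissions[i] = (likelihood of column i under the conserved state,
                    likelihood of column i under the non-conserved state);
    both values are assumed to be positive.
    """
    n = len(emissions)
    if n == 0:
        return []
    # trans[a][b] = P(next state is b | current state is a)
    # state 0 = conserved, state 1 = non-conserved
    trans = [[p_stay_cons, 1.0 - p_stay_cons],
             [1.0 - p_stay_non, p_stay_non]]
    start = [p_start_cons, 1.0 - p_start_cons]

    # Forward pass, rescaled at every column to avoid numeric underflow.
    fwd: List[List[float]] = []
    for i, emit in enumerate(emissions):
        if i == 0:
            cur = [start[s] * emit[s] for s in (0, 1)]
        else:
            prev = fwd[-1]
            cur = [emit[s] * (prev[0] * trans[0][s] + prev[1] * trans[1][s])
                   for s in (0, 1)]
        total = cur[0] + cur[1]
        fwd.append([c / total for c in cur])

    # Backward pass, rescaled the same way.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        nxt = emissions[i + 1]
        cur = [sum(trans[s][t] * nxt[t] * bwd[i + 1][t] for t in (0, 1))
               for s in (0, 1)]
        total = cur[0] + cur[1]
        bwd[i] = [c / total for c in cur]

    # Combine: posterior is proportional to forward * backward per state.
    posteriors = []
    for f, b in zip(fwd, bwd):
        w_cons, w_non = f[0] * b[0], f[1] * b[1]
        posteriors.append(w_cons / (w_cons + w_non))
    return posteriors
```

With the defaults, conserved_posteriors([(0.9, 0.1), (0.4, 0.6), (0.9, 0.1)]) scores the middle column above 0.5 even though its own likelihoods slightly favor the non-conserved state. That is the autocorrelation effect mentioned above: neighboring columns pull a site's score toward their own state, with no fixed window size involved.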
Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. "Wiggle track" plotting software by Hiram Clawson at UCSC. The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community. References Phylo-HMMs and phastCons: Felsenstein J, Churchill GA. A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol. 1996 Jan;13(1):93-104. PMID: 8583911 Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216 Siepel A, Haussler D. Phylogenetic Hidden Markov Models. In: Nielsen R, editor. Statistical Methods in Molecular Evolution. New York: Springer; 2005. pp. 325-351. Yang Z. A space-time process model for the evolution of DNA sequences. Genetics. 1995 Feb;139(2):993-1005. PMID: 7713447; PMC: PMC1206396 Chain/Net: Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Multiz: Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. PMID: 15060014; PMC: PMC383317 Blastz: Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. 
Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 Phylogenetic Tree: Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001 Dec 14;294(5550):2348-51. PMID: 11743200 cons17wayViewalign Multiz Alignments Multiz Alignment & Conservation (17 Species) Comparative Genomics multiz17way Multiz Align Multiz Alignments of 17 Species Comparative Genomics 
cons17wayViewphastcons Element Conservation (phastCons) Multiz Alignment & Conservation (17 Species) Comparative Genomics phastCons17 17 Species Cons 17 Species Conservation by PhastCons Comparative Genomics cons17wayViewelements Conserved Elements Multiz Alignment & Conservation (17 Species) Comparative Genomics phastConsElements 17 Species El 17 Species Conserved Elements Comparative Genomics Description This track shows predictions of conserved elements produced by the phastCons program. PhastCons is part of the PHAST (PHylogenetic Analysis with Space/Time models) package. The predictions are based on a phylogenetic hidden Markov model (phylo-HMM), a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next. Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. A multiple alignment was then constructed from these pairwise alignments using multiz. 
Predictions of conserved elements were then obtained by running phastCons on the multiple alignments with the --most-conserved option. PhastCons constructs a two-state phylo-HMM with a state for conserved regions and a state for non-conserved regions. The two states share a single phylogenetic model, except that the branch lengths of the tree associated with the conserved state are multiplied by a constant scaling factor rho (0 <= rho <= 1). The free parameters of the phylo-HMM, including the scaling factor rho, are estimated from the data by maximum likelihood using an EM algorithm. This procedure is subject to certain constraints on the "coverage" of the genome by conserved elements and the "smoothness" of the conservation scores. Details can be found in Siepel et al. (2005). The predicted conserved elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. References PhastCons: Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 
2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216 Chain/Net: Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Multiz: Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. PMID: 15060014; PMC: PMC383317 Blastz: Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 cpgIslandExt CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Expression and Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. 
The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater; length greater than 200 bp; ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment. The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG. The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length, expressed as a percentage. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden and Frommer (1987)):
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
where N = length of sequence. The calculation of the track data is performed by the following command sequence:
twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
  | cpg_lh /dev/stdin 2> cpg_lh.err \
  | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
  | sort -k1,1 -k2,2n > cpgIsland.bed
For the unmasked track, the data are constructed from the output of twoBitToFa -noMask in place of the twoBitToFa command. Data access CpG islands and their associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. 
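Once island records and their sequence have been retrieved by any of these routes, the statistics defined under Methods can be recomputed directly from the sequence as a sanity check. The following is a minimal sketch, not the cpg_lh implementation: it does not reproduce the +17/-1 segment search, and the function names cpg_stats and passes_island_criteria are hypothetical.

```python
from typing import Dict


def cpg_stats(seq: str) -> Dict[str, float]:
    """Compute the per-island statistics described in the Methods section."""
    s = seq.upper()
    n = len(s)
    num_c = s.count("C")
    num_g = s.count("G")
    num_cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return {
        "length": n,
        "cpg_count": num_cpg,
        # fraction of bases that are G or C
        "gc_fraction": (num_c + num_g) / n if n else 0.0,
        # CpG bases (twice the CpG count) relative to the length, in percent
        "per_cpg": 100.0 * 2 * num_cpg / n if n else 0.0,
        # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
        "obs_exp": num_cpg * n / (num_c * num_g) if num_c and num_g else 0.0,
    }


def passes_island_criteria(stats: Dict[str, float]) -> bool:
    """Apply the three thresholds listed in the text to one segment."""
    return (
        stats["gc_fraction"] >= 0.5
        and stats["length"] > 200
        and stats["obs_exp"] > 0.6
    )
```

For the short sequence "CGCGAT", cpg_stats counts two CpGs, two Cs and two Gs in six bases, so Obs/Exp is 2 * 6 / (2 * 2) = 3.0. The segment still fails the island test because it is far shorter than 200 bp.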
All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 cpgIslandSuper CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Expression and Regulation 
knownGene Known Genes UCSC Known Genes (November, 05) Based on UniProt, RefSeq, and GenBank mRNA Genes and Gene Predictions Description The UCSC Known Genes track shows known protein-coding genes based on protein data from UniProt (SWISS-PROT and TrEMBL) and mRNA data from the NCBI reference sequences collection (RefSeq) and GenBank. Each Known Gene is represented by an mRNA and a protein. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks with the following color scheme: Black: indicates the gene has a corresponding entry in the Protein Databank (PDB). Dark Blue: indicates the gene has either a corresponding RefSeq mRNA that is "Reviewed" or "Validated" or a corresponding SWISS-PROT protein. Medium Blue: indicates the gene has a corresponding RefSeq mRNA that is not "Reviewed" nor "Validated". Light Blue: everything else. That is, the gene does not have a corresponding Protein Databank entry, RefSeq mRNA, or SWISS-PROT protein, but it has supporting evidence of a GenBank mRNA with a UniProt (TrEMBL) protein. This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature. Methods This release of UCSC Known Genes was built by a new process, KG II, as described below. 
UniProt protein sequences (including alternative splicing isoforms) and mRNA sequences from RefSeq and GenBank were aligned against the base genome using BLAT. RefSeq alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. GenBank mRNA alignments having a base identity level within 0.2% of the best and at least 97% base identity with the genomic sequence were kept. Protein alignments having a base identity level within 0.2% of the best and at least 80% base identity with the genomic sequence were kept. Then the genomic mRNA and protein alignments were compared, and protein-mRNA pairings were determined from their overlaps. mRNA CDS data were obtained from RefSeq and GenBank data and supplemented by CDS structures derived from UCSC protein-mRNA alignments using tblastn. The initial set of UCSC Known Genes candidates consists of all protein-mRNA pairs with valid mRNA CDS structures. A gene-check program (similar to the one used for the Consensus CDS (CCDS) project) is used to remove questionable candidates, such as those with in-frame stop codons, bad frame, missing start or stop codons, etc. From each group of gene candidates that share the same CDS structure, the protein-mRNA pair having the best ranking and protein-mRNA alignment score is selected as a UCSC Known Gene. The ranking of a gene candidate depends on its gene-check quality measures. When all else is equal, a preference is given to RefSeq mRNAs and next to MGC mRNAs. Similarly a preference is given to gene candidates represented by SWISS-PROT proteins. The protein-mRNA alignment score is calculated based on protein to mRNA alignment using TBLASTN, plus weighted sub-scores according to the date and length of the mRNA. Credits The UCSC Known Genes track was produced using protein data from UniProt and mRNA data from NCBI RefSeq and GenBank. 
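The identity thresholds in the Methods section above amount to a per-sequence "near-best" filter. The sketch below shows one reading of that rule and is not the KG II pipeline itself: it assumes that "within 0.1% of the best" means within 0.1 percentage points of the highest identity seen for the same query sequence, and the function name, tuple layout, and accessions are hypothetical.

```python
from typing import Dict, List, Tuple

# (query accession, genomic location, percent base identity)
Alignment = Tuple[str, str, float]


def near_best(
    alignments: List[Alignment],
    min_identity: float,
    margin: float,
) -> List[Alignment]:
    """Keep alignments that meet the floor and are close to the query's best."""
    # First pass: highest identity observed for each query sequence.
    best: Dict[str, float] = {}
    for query, _location, identity in alignments:
        best[query] = max(identity, best.get(query, 0.0))
    # Second pass: apply both the absolute floor and the near-best margin.
    return [
        aln for aln in alignments
        if aln[2] >= min_identity and aln[2] >= best[aln[0]] - margin
    ]


# Thresholds quoted in the text, as (min_identity, margin) in percent.
REFSEQ = (96.0, 0.1)
GENBANK_MRNA = (97.0, 0.2)
PROTEIN = (80.0, 0.2)
```

For example, near_best(hits, *REFSEQ) would keep a 99.75% alignment alongside a 99.8% best hit for the same RefSeq, and drop a 98.0% alignment of that sequence even though it clears the 96% floor.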
Data Use Restrictions The UniProt data have the following terms of use, UniProt copyright(c) 2002 - 2004 UniProt consortium: For non-commercial use, all databases and documents in the UniProt FTP directory may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy. For commercial use, all databases and documents in the UniProt FTP directory except the files ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.xml.gz may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy. More information for commercial users can be found at the UniProt License & disclaimer page. From January 1, 2005, all databases and documents in the UniProt FTP directory may be copied and redistributed freely by all entities, without advance permission, provided that this copyright statement is reproduced with each copy. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D. The UCSC Known Genes. Bioinformatics. 2006 May 1;22(9):1036-46. PMID: 16500937 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 mrna Mouse mRNAs Mouse mRNAs from GenBank mRNA and EST Description The mRNA track shows alignments between mouse mRNAs in GenBank and the genome. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. 
The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods GenBank mouse mRNAs were aligned against the genome using the blat program. 
When a single mRNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 rmsk RepeatMasker Repeating Elements by RepeatMasker Variation and Repeats Description This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below. For some newer assemblies, Dfam was used instead of Repbase. Details of how we build our database data can be found in our "makeDb/doc/" directory. 
Display Conventions and Configuration In full display mode, this track displays up to ten different classes of repeats: Short interspersed nuclear elements (SINE), which include ALUs Long interspersed nuclear elements (LINE) Long terminal repeat elements (LTR), which include retroposons DNA repeat elements (DNA) Simple repeats (micro-satellites) Low complexity repeats Satellite repeats RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA) Other repeats, which includes class RC (Rolling Circle) Unknown The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading. A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification. At some point in the future, either the "?" will be removed or the classification will be changed. Methods Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010. Repbase Update is described in: Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. PMID: 10973072 For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616 Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. 
PMID: 8994846 stsMapMouseNew STS Markers STS Markers on Genetic and Radiation Hybrid Maps Mapping and Sequencing Description This track shows locations of Sequence Tagged Sites (STS) along the mouse draft assembly. These markers have been mapped using either genetic mapping (WICGR Mouse Genetic Map, MGD Genetic Map) or radiation hybrid mapping (Whitehead/MRC RH Map) techniques. Additional data on the individual maps can be found at the following links: NCBI UniSTS; Mouse Genome Informatics (MGI). By default, all genetic map markers are shown in blue; markers found only on radiation hybrid maps, and markers that are neither genetic nor radiation hybrid, are shown in black; markers that map to more than one position are shown in lighter colors. Users can choose a color to highlight a subset of markers of interest from the Filter options on the STS Markers Track Settings page. Methods Positions of STS markers are determined using both full sequences and primer information. Full sequences are aligned using blat, while ePCR is used to find locations using primer information. Using the Filter The track filter can be used to change the color or include/exclude a set of map data within the track. This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: In the pulldown menu, select the map whose data you would like to highlight or exclude in the display. By default, the "All Genetic" option is selected. Choose the color or display characteristic that will be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display data from the map selected in the pulldown list. If "include" is selected, the browser will display only data from the selected map. When you have finished configuring the filter, click the Submit button. Credits This track was designed and implemented by Terry Furey and Yontao Lu.
Many thanks to Whitehead Institute (Broad Institute) and Jackson Lab for contributing the data. ensGene Ensembl Genes Ensembl Genes Genes and Gene Predictions Description These gene predictions were generated by Ensembl. For more information on the different gene tracks, see our Genes FAQ. Methods For a description of the methods used in Ensembl gene predictions, please refer to Hubbard et al. (2002), also listed in the References section below. Data access Ensembl Gene data can be explored interactively using the Table Browser or the Data Integrator. For local downloads, the genePred format files for mm7 are available in our downloads directory as ensGene.txt.gz or in our genes download directory in GTF format. For programmatic access, the data can be queried from the REST API or directly from our public MySQL servers. Instructions on this method are available on our MySQL help page and on our blog. Previous versions of this track can be found on our archive download server. Credits We would like to thank Ensembl for providing these gene annotations. For more information, please see Ensembl's genome annotation page. References Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41. PMID: 11752248; PMC: PMC99161 refSeqComposite NCBI RefSeq RefSeq genes from NCBI Genes and Gene Predictions Description The NCBI RefSeq Genes composite track shows mouse protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). All subtracks use coordinates provided by RefSeq, except for the UCSC RefSeq track, which UCSC produces by realigning the RefSeq RNAs to the genome. This realignment may result in occasional differences between the annotation coordinates provided by UCSC and NCBI. For RNA-seq analysis, we advise using NCBI aligned tables like RefSeq All or RefSeq Curated. 
See the Methods section for more details about how the different tracks were created. Please visit NCBI's Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration This track is a composite track that contains differing data sets. To show only a selected set of subtracks, uncheck the boxes next to the tracks that you wish to hide. Note: Not all subtracks are available on all assemblies. The possible subtracks include: RefSeq aligned annotations and UCSC alignment of RefSeq annotations RefSeq All – all curated and predicted annotations provided by RefSeq. RefSeq Curated – subset of RefSeq All that includes only those annotations whose accessions begin with NM, NR, NP or YP. (NP and YP are used only for protein-coding genes on the mitochondrion; YP is used for human only.) They were manually curated, based on publications describing transcripts and manual reviews of evidence which includes EST and full-length cDNA alignments, protein sequences, splice sites and any other evidence available in databases or the scientific literature. The resulting sequences can differ from the genome; they exist independently of a particular human genome build and so must be aligned to the genome to create a track. The "RefSeq Curated" track is NCBI's mapping of these transcripts to the genome. Another alignment track exists for these, the "UCSC RefSeq" track (see below). RefSeq Predicted – subset of RefSeq All that includes those annotations whose accessions begin with XM or XR. They were predicted based on protein, cDNA, EST and RNA-seq alignments to the genome assembly by the NCBI Gnomon prediction software. RefSeq Other – all other annotations produced by the RefSeq group that do not fit the requirements for inclusion in the RefSeq Curated or the RefSeq Predicted tracks.
Examples are untranscribed pseudogenes or gene clusters, such as HOX or protocadherin alpha. They were manually curated from publications or databases but are not typical transcribed genes. RefSeq Alignments – alignments of RefSeq RNAs to the mouse genome provided by the RefSeq group, following the display conventions for PSL tracks. RefSeq Diffs – alignment differences between the mouse reference genome(s) and RefSeq curated transcripts. (Track not currently available for every assembly.) UCSC RefSeq – annotations generated from UCSC's realignment of RNAs with NM and NR accessions to the mouse genome. This track was previously known as the "RefSeq Genes" track. RefSeq Select (subset, only on hg38) – Subset of RefSeq Curated, transcripts marked as part of the RefSeq Select dataset. A single Select transcript is chosen as representative for each protein-coding gene. See NCBI RefSeq Select. RefSeq HGMD (subset) – Subset of RefSeq Curated, transcripts annotated by the Human Gene Mutation Database. This track is only available on the human genomes hg19 and hg38. It is the most restricted RefSeq subset, targeting clinical diagnostics. The RefSeq All, RefSeq Curated, RefSeq Predicted, and UCSC RefSeq tracks follow the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), or reviewed (dark), as defined by RefSeq. Color Level of review Reviewed: the RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information. Provisional: the RefSeq record has not yet been subject to individual review. The initial sequence-to-gene association has been established by outside collaborators or NCBI staff. 
Predicted: the RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted. The item labels and codon display properties for features within this track can be configured through the check-box controls at the top of the track description page. To adjust the settings for an individual subtrack, click the wrench icon next to the track name in the subtrack list. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name or OMIM identifier instead of the gene name, show all or a subset of these labels including the gene name, OMIM identifier and accession names, or turn off the label completely. Codon coloring: This track has an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. The RefSeq Diffs track contains five different types of inconsistency between the reference genome sequence and the RefSeq transcript sequences. The five types of differences are as follows: mismatch – aligned but mismatching bases, plus HGVS g. to show the genomic change required to match the transcript and HGVS c./n. to show the transcript change required to match the genome. short gap – genomic gaps that are too small to be introns (arbitrary cutoff of < 45 bp), most likely insertion/deletion variants or errors, with HGVS g. and c./n. showing differences. shift gap – shortGap items whose placement could be shifted left and/or right on the genome due to repetitive sequence, with HGVS c./n. position range of ambiguous region in transcript. Here, thin and thick lines are used: the thin line shows the span of the repetitive sequence, and the thick line shows the rightmost shifted gap.
double gap – genomic gaps that are long enough to be introns but that skip over transcript sequence (invisible in default setting), with HGVS c./n. deletion. skipped – sequence at the beginning or end of a transcript that is not aligned to the genome (invisible in default setting), with HGVS c./n. deletion HGVS Terminology (Human Genome Variation Society): g. = genomic sequence ; c. = coding DNA sequence ; n. = non-coding RNA reference sequence. When reporting HGVS with RefSeq sequences, to make sure that results from research articles can be mapped to the genome unambiguously, please specify the RefSeq annotation release displayed on the transcript's Genome Browser details page and also the RefSeq transcript ID with version (e.g. NM_012309.4 not NM_012309). Methods Tracks contained in the RefSeq annotation and RefSeq RNA alignment tracks were created at UCSC using data from the NCBI RefSeq project. Data files were downloaded from RefSeq in GFF file format and converted to the genePred and PSL table formats for display in the Genome Browser. Information about the NCBI annotation pipeline can be found here. The RefSeq Diffs track is generated by UCSC using NCBI's RefSeq RNA alignments. The UCSC RefSeq Genes track is constructed using the same methods as previous RefSeq Genes tracks. RefSeq RNAs were aligned against the mouse genome using BLAT. Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. Data Access The raw data for these tracks can be accessed in multiple ways. It can be explored interactively using the REST API, Table Browser or Data Integrator. The tables can also be accessed programmatically through our public MySQL server or downloaded from our downloads server for local processing. 
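For local processing of a downloaded genePred-format table, a row can be split into its fields with a few lines of Python. The sketch below is illustrative only: it assumes the usual tab-separated genePred column order with a leading "bin" column (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, followed by extended fields that are ignored here), and the sample row, the Transcript type and the function names are hypothetical rather than part of any UCSC tool.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical container for the core genePred fields."""
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # (start, end) pairs, 0-based half-open


def parse_int_list(field: str) -> List[int]:
    """Parse a comma-separated list that may carry a trailing comma."""
    return [int(x) for x in field.split(",") if x]


def parse_genepred_row(line: str, has_bin: bool = True) -> Transcript:
    """Split one tab-separated genePred row; assumes the column order above."""
    cols = line.rstrip("\n").split("\t")
    if has_bin:
        cols = cols[1:]  # drop the display-index "bin" column
    name, chrom, strand = cols[0:3]
    tx_start, tx_end, cds_start, cds_end, exon_count = map(int, cols[3:8])
    starts = parse_int_list(cols[8])
    ends = parse_int_list(cols[9])
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError(f"exon count mismatch for {name}")
    return Transcript(name, chrom, strand, tx_start, tx_end,
                      cds_start, cds_end, list(zip(starts, ends)))


# Made-up example row (not real annotation data).
row = ("585\tNM_000000.1\tchr16\t+\t1000\t5000\t1200\t4800\t3\t"
       "1000,2000,4000,\t1500,2500,5000,\t0\tGeneX\tcmpl\tcmpl\t0,1,2,")
tx = parse_genepred_row(row)
exonic_bases = sum(end - start for start, end in tx.exons)
print(tx.name, tx.chrom, len(tx.exons), exonic_bases)
```

Passing has_bin=False handles files that lack the leading column, such as genePred files produced by other utilities.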
The previous track versions are available in the archives of our downloads server. You can also access any RefSeq table entries in JSON format through our JSON API. The data in the RefSeq Other and RefSeq Diffs tracks are organized in bigBed file format; more information about accessing the information in this bigBed file can be found below. The other subtracks are associated with database tables as follows: genePred format: RefSeq All - ncbiRefSeq RefSeq Curated - ncbiRefSeqCurated RefSeq Predicted - ncbiRefSeqPredicted UCSC RefSeq - refGene PSL format: RefSeq Alignments - ncbiRefSeqPsl The first column of each of these tables is "bin". This column is designed to speed up access for display in the Genome Browser, but can be safely ignored in downstream analysis. You can read more about the bin indexing system here. The annotations in the RefSeqOther and RefSeqDiffs tracks are stored in bigBed files, which can be obtained from our downloads server here, ncbiRefSeqOther.bb and ncbiRefSeqDiffs.bb. Individual regions or the whole set of genome-wide annotations can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system from the utilities directory linked below. For example, to extract only annotations in a given region, you could use the following command: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/mm7/ncbiRefSeq/ncbiRefSeqOther.bb -chrom=chr16 -start=34990190 -end=36727467 stdout You can download a GTF format version of the RefSeq All table from the GTF downloads directory. The genePred format tracks can also be converted to GTF format using the genePredToGtf utility, available from the utilities directory on the UCSC downloads server. 
The utility can be run from the command line like so: genePredToGtf mm7 ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf Note that using genePredToGtf in this manner accesses our public MySQL server, and you therefore must set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access section. A file containing the RNA sequences in FASTA format for all items in the RefSeq All, RefSeq Curated, and RefSeq Predicted tracks can be found on our downloads server here. Please refer to our mailing list archives for questions. Previous versions of the ncbiRefSeq set of tracks can be found on our archive download server. Credits This track was produced at UCSC from data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 refGene UCSC RefSeq UCSC annotations of RefSeq RNAs (NM_* and NR_*) Genes and Gene Predictions Description The RefSeq Genes track shows known mouse protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. For more information on the different gene tracks, see our Genes FAQ. 
Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods RefSeq RNAs were aligned against the mouse genome using BLAT. Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. 
PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 intronEst Spliced ESTs Mouse ESTs That Have Been Spliced mRNA and EST Description This track shows alignments between mouse expressed sequence tags (ESTs) in GenBank and the genome that show signs of splicing when aligned against the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. To be considered spliced, an EST must show evidence of at least one canonical intron (i.e., the genomic sequence between EST alignment blocks must be at least 32 bases in length and have GT/AG ends). By requiring splicing, the level of contamination in the EST databases is drastically reduced at the expense of eliminating many genuine 3' ESTs. For a display of all ESTs (including unspliced), see the mouse EST track. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, darker shading indicates a larger number of aligned ESTs. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. 
To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. 
Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, mouse ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence are displayed in this track. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 cpgIslandExtUnmasked Unmasked CpG CpG Islands on All Sequence (Islands < 300 Bases are Light Green) Expression and Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. 
Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater length greater than 200 bp ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG. The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. 
(1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence. The calculation of the track data is performed by the following command sequence:
    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
      | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
      | sort -k1,1 -k2,2n > cpgIsland.bed
For the unmasked track, the twoBitToFa command is run with the -noMask option. Data access CpG islands and their associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 xenoRefGene Other RefSeq Non-Mouse RefSeq Genes Genes and Gene Predictions Description This track shows known protein-coding and non-protein-coding genes for organisms other than mouse, taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark).
The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. Click here for more information about this feature. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods The RNAs were aligned against the mouse genome using blat; those with an alignment of less than 15% were discarded. At least 40 bases must be aligned to DNA that is not repeat masked. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 1.0% of the best and at least 35% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 affyGnf1m Affy GNF1M Alignments of Probes from Affymetrix GNF1M Chip Expression and Regulation Description This track shows the location of the sequences used for the selection of probes on the Affymetrix GNF1M chips. The annotation contains 31,000 non-overlapping mouse genes and gene predictions. Methods The sequences were mapped to the genome with blat followed by pslReps using the parameters -minCover=0.3, -minAli=0.95 and -nearTop=0.005. 
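The effect of the three thresholds quoted above can be sketched in Python. This is a simplified illustration, not the pslReps implementation: it assumes that minCover is compared with the fraction of the query that is aligned, that minAli is compared with a plain match ratio, and that nearTop is applied to whole alignments relative to the best identity seen for the same query. The Alignment record and filter_alignments function are hypothetical names, and the probe data are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical, reduced PSL-like record for one probe alignment."""
    query: str
    target: str
    query_size: int
    matches: int
    mismatches: int

    @property
    def aligned(self) -> int:
        return self.matches + self.mismatches

    @property
    def coverage(self) -> float:
        # Fraction of the query sequence included in the alignment.
        return self.aligned / self.query_size

    @property
    def identity(self) -> float:
        # Fraction of aligned bases that match.
        return self.matches / self.aligned if self.aligned else 0.0


def filter_alignments(alignments, min_cover=0.3, min_ali=0.95, near_top=0.005):
    """Keep alignments passing all three (simplified) thresholds."""
    # Best identity observed for each query sequence.
    best = defaultdict(float)
    for aln in alignments:
        best[aln.query] = max(best[aln.query], aln.identity)

    kept = []
    for aln in alignments:
        if aln.coverage < min_cover:
            continue  # too little of the query is aligned
        if aln.identity < min_ali:
            continue  # alignment is too divergent
        if aln.identity < best[aln.query] - near_top:
            continue  # not close enough to the best hit for this query
        kept.append(aln)
    return kept


# Made-up example alignments.
alignments = [
    Alignment("probeA", "chr1", 100, 99, 1),   # best hit for probeA
    Alignment("probeA", "chr2", 100, 97, 3),   # more than 0.5% below the best
    Alignment("probeA", "chr3", 100, 98, 1),   # within 0.5% of the best
    Alignment("probeB", "chr4", 200, 50, 0),   # covers only 25% of the query
    Alignment("probeC", "chr5", 100, 90, 10),  # 90% identity
]
kept = filter_alignments(alignments)
print([(a.query, a.target) for a in kept])
```

With the made-up data, only the chr1 and chr3 alignments of probeA survive; the others fail the near-top, coverage and identity checks respectively.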
Credits Thanks to the Genomics Institute of the Novartis Research Foundation (GNF) for the data underlying this track. affyMOE430 Affy MOE430 Alignments of Affymetrix Consensus Sequences from Mouse MOE430 (A and B) Expression and Regulation Description This track shows the location of the consensus sequences used for the selection of probes on the Affymetrix Mouse MOE430 set (A and B) of chips. Methods Consensus sequences were downloaded from the Affymetrix Product Support and mapped to the genome with blat followed by pslReps using the parameters -minCover=0.3, -minAli=0.95 and -nearTop=0.005. Credits Thanks to Affymetrix for the data underlying this track. affyU74 Affy U74 Alignments of Affymetrix Consensus Sequences from MG-U74 v2 (A,B, and C) Expression and Regulation Description This track shows the location of the consensus sequences used for the selection of probes on the Affymetrix MG-U74v2 set (A,B and C) of chips. Methods Consensus sequences were downloaded from the Affymetrix Product Support and mapped to the genome with blat followed by pslReps using the parameters -minCover=0.3, -minAli=0.95 and -nearTop=0.005. Credits Thanks to Affymetrix for the data underlying this track. allenBrainAli Allen Brain Allen Brain Atlas Probes Expression and Regulation Description This track provides a link into the Allen Brain Atlas (ABA) images for this probe. The ABA is an extensive database of high resolution in-situ hybridization images of adult male mouse brains covering the majority of genes. Methods The ABA created a platform for high-throughput in situ hybridization (ISH) that allows a highly systematic approach to analyzing gene expression in the brain. ISH is a technique that allows the cellular localization of mRNA transcripts for specific genes. 
Labeled antisense probes, specific to a particular gene, are hybridized to cellular (sense) transcripts and subsequent detection of the bound probe produces specific labeling in those cells expressing the particular gene. This method involves tagged nucleotides detected by colorimetric methods. The platform used for the ABA utilizes this non-isotopic approach, with digoxigenin-labeled nucleotides incorporated into a riboprobe produced by in vitro transcription. This method produces a label that fills the cell body, in contrast to autoradiography that produces scattered silver grains surrounding each labeled cell. To enhance the ability to detect low level expression, the ABA has incorporated a tyramide signal amplification step into the protocol that greatly increases sensitivity. The specific methodology is described in detail within the ABA Data Production Processes document. Credits Thanks to the Allen Institute for Brain Science in general, and Susan Sunkin in particular, for coordinating with UCSC on this annotation. gold Assembly Assembly from Fragments Mapping and Sequencing Description This track shows the draft assembly of the mouse genome. Whole-genome shotgun reads were assembled into contigs and when possible, contigs were grouped into scaffolds (also known as "supercontigs"). The order, orientation and gap sizes between contigs within a scaffold are based on paired-end read evidence. In dense mode, this track depicts the contigs that make up the currently-viewed scaffold. Contig boundaries are distinguished by the use of alternating gold and brown coloration. Where gaps exist between contigs, spaces are shown between the gold and brown blocks. The relative order and orientation of the contigs within a scaffold is always known; therefore, a line is drawn in the graphical display to bridge the blocks. All components within this track are of fragment type "W" (Whole Genome Shotgun contig). 
augustusGene AUGUSTUS AUGUSTUS ab initio gene predictions v3.1 Genes and Gene Predictions Description This track shows ab initio predictions from the program AUGUSTUS (version 3.1). The predictions are based on the genome sequence alone. For more information on the different gene tracks, see our Genes FAQ. Methods Statistical signal models were built for splice sites, branch-point patterns, translation start sites, and the poly-A signal. Furthermore, models were built for the sequence content of protein-coding and non-coding regions as well as for the length distributions of different exon and intron types. Detailed descriptions of most of these different models can be found in Mario Stanke's dissertation. This track shows the most likely gene structure according to a Semi-Markov Conditional Random Field model. Alternative splicing transcripts were obtained with a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2). The different models used by Augustus were trained on a number of different species-specific gene sets, which included 1000-2000 training gene structures. The --species option allows one to choose the species used for training the models. Different training species were used for the --species option when generating these predictions for different groups of assemblies.
Assembly Group                    Training Species
Fish                              zebrafish
Birds                             chicken
Human and all other vertebrates   human
Nematodes                         caenorhabditis
Drosophila                        fly
A. mellifera                      honeybee1
A. gambiae                        culex
S. cerevisiae                     saccharomyces
This table describes which training species was used for a particular group of assemblies. When available, the closest related training species was used. Credits Thanks to the Stanke lab for providing the AUGUSTUS program. The training for the chicken version was done by Stefanie König and the training for the human and zebrafish versions was done by Mario Stanke.
References Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656 Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25. PMID: 14534192 bacEndPairs BAC End Pairs BAC End Pairs Mapping and Sequencing Description Bacterial artificial chromosomes (BACs) are a key part of many large scale sequencing projects. A BAC typically consists of 25 - 350 kb of DNA. During the early phase of a sequencing project, it is common to sequence a single read (approximately 500 bases) off each end of a large number of BACs. Later on in the project, these BAC end reads can be mapped to the genome sequence. This track shows these mappings in cases where both ends could be mapped. These BAC end pairs can be useful for validating the assembly over relatively long ranges. In some cases, the BACs are useful biological reagents. This track can also be used for determining which BAC contains a given gene, useful information for certain wet lab experiments. A valid pair of BAC end sequences must be at least 25 kb but no more than 350 kb away from each other. The orientation of the first BAC end sequence must be "+" and the orientation of the second BAC end sequence must be "-". The scoring scheme used for this annotation assigns 1000 to an alignment when the BAC end pair aligns to only one location in the genome (after filtering). When a BAC end pair or clone aligns to multiple locations, the score is calculated as 1500/(number of alignments). Methods BAC end sequences are placed on the assembled sequence using Jim Kent's blat program. Credits Additional information about the clone, including how it can be obtained, may be found at the NCBI Clone Registry. To view the registry entry for a specific clone, open the details page for the clone and click on its name at the top of the page. 
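The pair-validity rule and the scoring scheme described above can be written out as a minimal Python sketch. The function names and the coordinate convention are hypothetical. In particular, the text does not say exactly how the distance between the two ends is measured, so the sketch assumes it runs from the start of the first end to the end of the second.

```python
MIN_SPAN = 25_000    # minimum separation in bases (25 kb, from the text)
MAX_SPAN = 350_000   # maximum separation in bases (350 kb, from the text)


def is_valid_pair(first_start, first_strand, second_end, second_strand):
    """Return True if two BAC end alignments form a valid pair.

    Assumption: separation is measured from the start of the first end
    to the end of the second end on the same chromosome.
    """
    span = second_end - first_start
    return (first_strand == "+"
            and second_strand == "-"
            and MIN_SPAN <= span <= MAX_SPAN)


def pair_score(num_alignments):
    """Score a BAC end pair from the number of places it aligns."""
    if num_alignments < 1:
        raise ValueError("a scored pair must align at least once")
    if num_alignments == 1:
        return 1000.0
    return 1500.0 / num_alignments
```

Under this scheme a uniquely placed pair scores 1000, a pair with two placements scores 750, and a pair with three placements scores 500.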
cytoBand Chromosome Band Chromosome Bands Based On Microscopy Mapping and Sequencing Description The chromosome band track represents the approximate location of bands seen on Giemsa-stained chromosomes. Methods Data are derived from the ideogram.gz file downloaded from the NCBI ftp site ftp://ftp.ncbi.nlm.nih.gov/pub/gdp/ (NCBI current version only). Band lengths are typically estimated based on FISH or other molecular markers interpreted via microscopy. Credits We would like to thank NCBI for providing this information. Please direct any inquiries into the exact method used for each organism to NCBI. cytoBandIdeo Chromosome Band (Ideogram) Chromosome Bands Based on Microscopy (for Ideogram) Mapping and Sequencing gap Gap Gap Locations Mapping and Sequencing Description This track depicts gaps in the assembly. These gaps - with the exception of intractable centromere gaps - will be closed during the finishing process. Gaps are represented as black boxes in this track. If the relative order and orientation of the contigs on either side of the gap are known, it is a bridged gap and a white line is drawn through the black box representing the gap. This assembly contains the following principal types of gaps: Fragment - gaps between the contigs of a draft clone. (In this context, a contig is a set of overlapping sequence reads.) Clone - gaps between clones in the same map contig. Contig - gaps between map contigs. Centromere - gaps from centromeres (3,000,000 Ns) or other large blocks of heterochromatin (size varies). gc5Base GC Percent GC Percent in 5-Base Windows Mapping and Sequencing Description The GC percent track shows the percentage of G (guanine) and C (cytosine) bases in 5-base windows. High GC content is typically associated with gene-rich areas. This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the "Graph configuration help" link for an explanation of the configuration options.
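As an illustration of what the GC percent track reports, the following Python sketch computes the percentage of G and C bases in consecutive 5-base windows. The function name is hypothetical, and the handling of edge cases is an assumption rather than a description of how the track was built: windows are taken as non-overlapping, a trailing partial window is dropped, and any base other than G or C (including N) counts as non-GC.

```python
def gc_percent_windows(seq, window=5):
    """Return the GC percentage of each consecutive, non-overlapping window.

    Assumptions: a trailing partial window is ignored, and bases other
    than G or C (including N) are counted as non-GC.
    """
    seq = seq.upper()
    percents = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        percents.append(100.0 * gc / window)
    return percents


# Three 5-base windows: GGCCA, ATATA, GCGCG
print(gc_percent_windows("GGCCAATATAGCGCG"))  # [80.0, 0.0, 100.0]
```

Because each window holds only five bases, the score can take just six values (0, 20, 40, 60, 80 or 100), which is why the track is usually read in a smoothed or zoomed-out view.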
Credits The data and presentation of this graph were prepared by Hiram Clawson. igtc Gene Trap International Gene Trap Consortium Sequence Tag Alignments Genes and Gene Predictions Description This track shows alignments of International Gene Trap Consortium sequence tags to the mouse genome. Items are labeled by cell line and colored by source: BG: BayGenomics (USA) CMHD: Centre for Modeling Human Disease (Toronto, Canada) EGTC: Exchangeable Gene Trap Clones (Kumamoto University, Japan) ESDB: Embryonic Stem Cell Database (University of Manitoba, Canada) FHCRC: Soriano Lab Gene Trap Database (originally at Fred Hutchinson Cancer Research Center, Seattle, USA; now at Mount Sinai School of Medicine, Manhattan, NY) GGTC: German Gene Trap Consortium (Germany) SIGTR: Sanger Institute Gene Trap Resource (Cambridge, UK) TIGEM: TIGEM-IRBM Gene Trap (Naples, Italy) TIGM: Texas Institute for Genomic Medicine (Houston, Texas) Methods The IGTC pipeline uses BLAT to align sequence tags from dbGSS to the mouse genome and BLAST to match sequence tags to genes. The pipeline filters and reconciles the two sets of alignments to associate cell lines with trapped genes. Credits Thanks to the International Gene Trap Consortium for providing this track. geneid Geneid Genes Geneid Gene Predictions Genes and Gene Predictions Description This track shows gene predictions from the geneid program developed by Roderic Guigó's Computational Biology of RNA Processing group, which is part of the Centre de Regulació Genòmica (CRG) in Barcelona, Catalunya, Spain. Methods Geneid is a program to predict genes in anonymous genomic sequences designed with a hierarchical structure. In the first step, splice sites, start and stop codons are predicted and scored along the sequence using Position Weight Arrays (PWAs). Next, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the log-likelihood ratio of a Markov Model for coding DNA.
Finally, from the set of predicted exons, the gene structure is assembled, maximizing the sum of the scores of the assembled exons. Credits Thanks to Computational Biology of RNA Processing for providing these data. References Blanco E, Parra G, Guigó R. Using geneid to identify genes. Curr Protoc Bioinformatics. 2007 Jun;Chapter 4:Unit 4.3. PMID: 18428791 Parra G, Blanco E, Guigó R. GeneID in Drosophila. Genome Res. 2000 Apr;10(4):511-5. PMID: 10779490; PMC: PMC310871 genscan Genscan Genes Genscan Gene Predictions Genes and Gene Predictions Description This track shows predictions from the Genscan program written by Chris Burge. The predictions are based on transcriptional, translational and donor/acceptor splicing signals as well as the length and compositional distributions of exons, introns and intergenic regions. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The track description page offers the following filter and configuration options: Color track by codons: Select the genomic codons option to color and label each codon in a zoomed-in display to facilitate validation and comparison of gene predictions. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature. Methods For a description of the Genscan program and the model that underlies it, refer to Burge and Karlin (1997) in the References section below. The splice site models used are described in more detail in Burge (1998) below. Credits Thanks to Chris Burge for providing the Genscan program. References Burge C. Modeling Dependencies in Pre-mRNA Splicing Signals. In: Salzberg S, Searls D, Kasif S, editors. Computational Methods in Molecular Biology. Amsterdam: Elsevier Science; 1998. p. 127-163. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 1997 Apr 25;268(1):78-94. 
PMID: 9149143 gnfAtlas2 GNF Atlas 2 GNF Expression Atlas 2 Expression and Regulation Description This track shows expression data from the GNF Gene Expression Atlas 2. This contains two replicates each of 61 mouse tissues run over Affymetrix microarrays. By default, averages of related tissues are shown. Display all tissues by selecting "All Arrays" from the "Combine arrays" menu on the track settings page. As is standard with microarray data, red indicates overexpression in the tissue, and green indicates underexpression. You may want to view gene expression with the Gene Sorter as well as the Genome Browser. Credits Thanks to the Genomics Institute of the Novartis Research Foundation (GNF) for the data underlying this track. References Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7. PMID: 15075390; PMC: PMC395923 affyGnfU74A GNF U74A GNF Expression Atlas on Mouse Affymetrix U74A Chip Expression and Regulation Description This track shows expression data from GNF (The Genomics Institute of the Novartis Research Foundation) using the Affymetrix U74A chip. Methods For detailed information about the experiments, see Su et al. (2002) in the References section below. Alignments displayed on the track correspond to the consensus sequences used by Affymetrix to choose probes. In dense mode, the track color denotes the average signal over all experiments on a log base 2 scale. Lighter colors correspond to lower signals; darker colors correspond to higher signals. In full mode, the color of each item represents the log base 2 ratio of the signal of that particular experiment to the median signal of all experiments for that probe. More information about individual probes and probe sets is available at Affymetrix's NetAffx website. Credits Thanks to GNF for providing these data.
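The full-mode coloring described in the Methods above is based on a log base 2 ratio of each experiment's signal to the probe's median signal. A minimal Python sketch of that calculation follows. The function name and the example values are hypothetical, and the sketch assumes that all signals are strictly positive.

```python
import math
from statistics import median


def log2_ratios(signals):
    """Return log2(signal / median signal) for each experiment of one probe.

    Assumption: every signal is strictly positive, so the ratio and its
    logarithm are defined.
    """
    med = median(signals)
    return [math.log2(s / med) for s in signals]


# Hypothetical signals for one probe across three experiments
print(log2_ratios([100, 200, 400]))  # [-1.0, 0.0, 1.0]
```

A value of 0 means the experiment matches the median for that probe; each unit above or below 0 corresponds to a two-fold higher or lower signal.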
References Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A et al. Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci U S A. 2002 Apr 2;99(7):4465-70. PMID: 11904358; PMC: PMC123671 affyGnfU74B GNF U74B GNF Expression Atlas on Mouse Affymetrix U74B Chip Expression and Regulation Description This track shows expression data from GNF (The Genomics Institute of the Novartis Research Foundation) using the Affymetrix U74B chip. Methods For detailed information about the experiments, see Su et al. (2002) in the References section below. Alignments displayed on the track correspond to the consensus sequences used by Affymetrix to choose probes. In dense mode, the track color denotes the average signal over all experiments on a log base 2 scale. Lighter colors correspond to lower signals; darker colors correspond to higher signals. In full mode, the color of each item represents the log base 2 ratio of the signal of that particular experiment to the median signal of all experiments for that probe. More information about individual probes and probe sets is available at Affymetrix's NetAffx website. Credits Thanks to GNF for providing these data. References Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A et al. Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci U S A. 2002 Apr 2;99(7):4465-70. PMID: 11904358; PMC: PMC123671 affyGnfU74C GNF U74C GNF Expression Atlas on Mouse Affymetrix U74C Chip Expression and Regulation Description This track shows expression data from GNF (The Genomics Institute of the Novartis Research Foundation) using the Affymetrix U74C chip. Methods For detailed information about the experiments, see Su et al. (2002) in the References section below. Alignments displayed on the track correspond to the consensus sequences used by Affymetrix to choose probes. 
In dense mode, the track color denotes the average signal over all experiments on a log base 2 scale. Lighter colors correspond to lower signals and darker colors correspond to higher signals. In full mode, the color of each item represents the log base 2 ratio of the signal of that particular experiment to the median signal of all experiments for that probe. More information about individual probes and probe sets is available at Affymetrix's NetAffx website. Credits Thanks to GNF for providing these data. References Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A et al. Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci U S A. 2002 Apr 2;99(7):4465-70. PMID: 11904358; PMC: PMC123671 ctgPos Map Contigs Physical Map Contigs Mapping and Sequencing Description This track shows the locations of mouse contigs on the physical map. The underlying data are derived from NCBI information specific to this assembly. Although the NCBI data indicates the orientations of the contigs based on how they were assembled into the final sequence, all contigs in this track are oriented to the "+" strand. mgcFullMrna MGC Genes Mammalian Gene Collection Full ORF mRNAs Genes and Gene Predictions Description This track shows alignments of mouse mRNAs from the Mammalian Gene Collection (MGC) having full-length open reading frames (ORFs) to the genome. The goal of the Mammalian Gene Collection is to provide researchers with unrestricted access to sequence-validated full-length protein-coding cDNA clones for human, mouse, and rat genes. Display Conventions and Configuration The track follows the display conventions for gene prediction tracks. An optional codon coloring feature is available for quick validation and comparison of gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. 
For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Methods GenBank mouse MGC mRNAs identified as having full-length ORFs were aligned against the genome using blat. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 1% of the best and at least 95% base identity with the genomic sequence were kept. Credits The mouse MGC full-length mRNA track was produced at UCSC from mRNA sequence data submitted to GenBank by the Mammalian Gene Collection project. References Mammalian Gene Collection project references. Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 microsat Microsatellite Microsatellites - Di-nucleotide and Tri-nucleotide Repeats Variation and Repeats Description This track displays regions that are likely to be useful as microsatellite markers. These are sequences of at least 15 perfect di-nucleotide and tri-nucleotide repeats and tend to be highly polymorphic in the population. Methods The data shown in this track are a subset of the Simple Repeats track, selecting only those repeats of period 2 and 3, with 100% identity and no indels and with at least 15 copies of the repeat. The Simple Repeats track is created using the Tandem Repeats Finder. For more information about this program, see Benson (1999). Credits Tandem Repeats Finder was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 miRNA miRNA MicroRNAs from miRBase Genes and Gene Predictions Description The miRNA track shows microRNAs from miRBase. Display Conventions and Configuration Mature miRNAs (miRs) are represented by thick blocks. The predicted stem-loop portions of the primary transcripts are indicated by thinner blocks. 
miRNAs in the sense orientation are shown in black; those in the reverse orientation are colored grey. When a single precursor produces two mature miRs from its 5' and 3' parts, it is displayed twice with the two different positions of the mature miR. To display only those items that exceed a specific unnormalized score, enter a minimum score between 0 and 1000 in the text box at the top of the track description page. Methods Mature and precursor miRNAs from the miRNA Registry were aligned against the genome using blat. The extents of the precursor sequences were not generally known, and were predicted based on base-paired hairpin structure. miRBase is described in Griffiths-Jones, S. et al. (2006). The miRNA Registry is described in Griffiths-Jones, S. (2004) and Weber, M.J. (2005) in the References section below. Credits This track was created by Michel Weber of Laboratoire de Biologie Moléculaire Eucaryote, CNRS Université Paul Sabatier (Toulouse, France), Yves Quentin of Laboratoire de Microbiologie et Génétique Moléculaires (Toulouse, France) and Sam Griffiths-Jones of The Wellcome Trust Sanger Institute (Cambridge, UK). References When making use of these data, please cite: Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D140-4. PMID: 16381832; PMC: PMC1347474 Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D109-11. PMID: 14681370; PMC: PMC308757 Weber MJ. New human and mouse microRNA genes found by homology search. FEBS J. 2005 Jan;272(1):59-73. PMID: 15634332 The following publication provides guidelines on miRNA annotation: Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X, Dreyfuss G, Eddy SR, Griffiths-Jones S, Marshall M et al. A uniform system for microRNA annotation. RNA. 2003 Mar;9(3):277-9. PMID: 12592000; PMC: PMC1370393 For more information on blat, see Kent WJ. 
BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 est Mouse ESTs Mouse ESTs Including Unspliced mRNA and EST Description This track shows alignments between mouse expressed sequence tags (ESTs) in GenBank and the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. 
Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, mouse ESTs from GenBank were aligned against the genome using blat. 
Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 xenoMrna Other mRNAs Non-Mouse mRNAs from GenBank mRNA and EST Description This track displays translated blat alignments of vertebrate and invertebrate mRNA in GenBank from organisms other than mouse. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) for this track is in two parts. The first + indicates the orientation of the query sequence whose translated protein produced the match (here always 5' to 3', hence +). The second + or - indicates the orientation of the matching translated genomic sequence. Because the two orientations of a DNA sequence give different predicted protein sequences, there are four combinations. ++ is not the same as --, nor is +- the same as -+. 
The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods The mRNAs were aligned against the mouse genome using translated blat. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. 
Only those alignments having a base identity level within 1% of the best and at least 25% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 regPotential7X Reg Potential 7 species ESPERR Regulatory Potential (7 species) Expression and Regulation Description This track displays regulatory potential (RP) scores computed from alignments of mouse, rat (rn4), human (hg18), chimpanzee (panTro2), macaque (rheMac2), dog (canFam2), and cow (bosTau2). RP scores compare frequencies of short alignment patterns between known regulatory elements and neutral DNA. The sensitivity and specificity of RP scores were calibrated on the hemoglobin beta gene cluster. These results suggest a threshold of ~0.00 for the identification of new putative regulatory elements. The default viewing range for this track is from 0.0 to 0.1. Score values below the 0.0 default lower limit indicate resemblance to alignment patterns typical of neutral DNA, while score values above the 0.1 default upper limit indicate very marked resemblance to alignment patterns typical of regulatory elements in the training set. The range of RP scores from 0.0 to 0.1 contains the prediction threshold suggested by calibration studies, and provides an effective visualization of the score for most genomic loci. However, the user can specify different viewing ranges if desired. 
Note: Absence of a score value at a given location indicates lack of sufficient alignment -- scores are computed for all regions of the reference genome in which no region of more than 100 bases lacks alignment in at least three non-human species. This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. Methods The comparison employs log-ratios of transition probabilities from two variable order Markov models. Training the score entails selecting an appropriate alphabet (alignment column symbols) and maximal order (length of the longest patterns = order + 1) for the Markov models, and estimating their transition probabilities, based on alignment data from known regulatory elements and ancestral repeats. The scores in this track are computed using a maximal order of 2. In the track, score values are displayed using a system of overlapping windows of size 100 bp along sufficiently alignable portions of the human sequence. Log-ratios are added over positions in a window, and the sum is normalized for length. Credits Work on RP scores is performed by members of the Comparative Genomics and Bioinformatics Center at Penn State University. More information on this research and the collection of known regulatory elements used in training the score can be found at this site. References King DC, Taylor J, Elnitski L, Chiaromonte F, Miller W, Hardison RC. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res. 2005 Aug;15(8):1051-60. PMID: 16024817; PMC: PMC1182217 Kolbe D, Taylor J, Elnitski L, Eswara P, Li J, Miller W, Hardison R, Chiaromonte F. Regulatory potential scores from genome-wide three-way alignments of human, mouse, and rat. Genome Res. 2004 Apr;14(4):700-7.
PMID: 15060013; PMC: PMC383316 rikenCageCtss Riken CAGE Riken CAGE - Predicted Gene Start Sites mRNA and EST Description This track shows the number of 5' cap analysis gene expression (CAGE) tags that map to the genome on the "plus" and "minus" strands at a specific location. For clarity, only the first 5' nucleotide in the tag (relative to the transcript direction) is considered. Areas in which many tags map to the same region may indicate a significant transcription start site. The number of tags should be proportional to the expression rate in the originating tissues. Display Conventions and Configuration The position of the first 5' nucleotide in the tag is represented by a solid block. The height of the block indicates the number of 5' CAGE tag starts that map at that location. This composite annotation track contains two subtracks that may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. Methods To create the tag, a linker was attached to the 5' end of full-length cDNAs which had been selected by cap trapping. The first 20 bp of the cDNA were cleaved using the class II restriction enzyme, MmeI, followed by PCR amplification. Concatamers of the resulting 32 bp tags were then formed for more efficient sequencing. A total of 7,151,511 mapped CAGE tags from 145 cDNA libraries, corresponding to 22 distinct tissues were produced. The tags were mapped to the mm5 assembly and lifted to other assemblies using UCSC's liftOver tool. For more information on CAGE, see Shiraki et al. (2003) and Carninci et al (2005) below. The mapping methodology employed in this annotation will be described in upcoming publications. 
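The per-position tag counts described above can be illustrated with a minimal sketch. It assumes mapped tags are available as (chrom, start, end, strand) tuples in 0-based, half-open coordinates; that input layout and the function name `count_tag_starts` are hypothetical and are not part of the RIKEN or UCSC pipeline.

```python
from collections import Counter

def count_tag_starts(tags):
    """Count CAGE tag 5' ends per (chrom, strand, position).

    tags: iterable of (chrom, start, end, strand) with 0-based,
    half-open coordinates (an assumed input layout).
    """
    counts = Counter()
    for chrom, start, end, strand in tags:
        # Relative to the transcript direction, the first 5' base is the
        # leftmost base of a plus-strand tag and the rightmost base of a
        # minus-strand tag.
        five_prime = start if strand == "+" else end - 1
        counts[(chrom, strand, five_prime)] += 1
    return counts

tags = [
    ("chr1", 100, 120, "+"),
    ("chr1", 100, 119, "+"),
    ("chr1", 90, 110, "-"),
]
ctss = count_tag_starts(tags)
```

In this toy input, two plus-strand tags share the same 5' base, so that position would be drawn as a block of height 2 on the plus-strand subtrack.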
Credits These data were contributed by the Functional Annotation of Mouse (FANTOM) Consortium, RIKEN Genome Science Laboratory and RIKEN Genome Exploration Research Group (Genome Network Project Core Group). FANTOM Consortium: P. Carninci, T. Kasukawa, S. Katayama, Gough, M. Frith, N. Maeda, R. Oyama, T. Ravasi, B. Lenhard, C. Wells, R. Kodzius, K. Shimokawa, V. B. Bajic, S. E. Brenner, S. Batalov, A. R. R. Forrest, M. Zavolan, M. J. Davis, L. G. Wilming, V. Aidinis, J. Allen, A. Ambesi-Impiombato, R. Apweiler, R. N. Aturaliya, T. L. Bailey, M. Bansal, K. W. Beisel, T. Bersano, H. Bono, A. M. Chalk, K. P. Chiu, V. Choudhary, A. Christoffels, D. R. Clutterbuck, M. L. Crowe, E. Dalla, B. P. Dalrymple, B. de Bono, G. Della Gatta, D. di Bernardo, T. Down, P. Engstrom, M. Fagiolini, G. Faulkner, C. F. Fletcher, T. Fukushima, M. Furuno, S. Futaki, M. Gariboldi, P. Georgii-Hemming, T. R. Gingeras, T. Gojobori, R. E. Green, S. Gustincich, M. Harbers, V. Harokopos, Y. Hayashi, S. Henning, T. K. Hensch, N. Hirokawa, D. Hill, L. Huminiecki, M. Iacono, K. Ikeo, A. Iwama, T. Ishikawa, M. Jakt, A. Kanapin, M. Katoh, Y. Kawasawa, J. Kelso, H. Kitamura, H. Kitano, G. Kollias, S. P. T. Krishnan, A.F. Kruger, K. Kummerfeld, I. V. Kurochkin, L. F. Lareau, L. Lipovich, J. Liu, S. Liuni, S. McWilliam, M. Madan Babu, M. Madera, L. Marchionni, H. Matsuda, S. Matsuzawa, H. Miki, F. Mignone, S. Miyake, K. Morris, S. Mottagui-Tabar, N. Mulder, N. Nakano, H. Nakauchi, P. Ng, R. Nilsson, S. Nishiguchi, S. Nishikawa, F. Nori, O. Ohara, Y. Okazaki, V. Orlando, K. C. Pang, W. J. Pavan, G. Pavesi, G. Pesole, N. Petrovsky, S. Piazza, W. Qu, J. Reed, J. F. Reid, B. Z. Ring, M. Ringwald, B. Rost, Y. Ruan, S. Salzberg, A. Sandelin, C. Schneider, C. Schoenbach, K. Sekiguchi, C. A. M. Semple, S. Seno, L. Sessa, Y. Sheng, Y. Shibata, H. Shimada, K. Shimada, B. Sinclair, S. Sperling, E. Stupka, K. Sugiura, R. Sultana, Y. Takenaka, K. Taki, K. Tammoja, S. L. Tan, S. Tang, M. S. Taylor, J. Tegner, S. A. 
Teichmann, H. R. Ueda, E. van Nimwegene, R. Verardo, C. L. Wei, K. Yagi, H. Yamanishi, E. Zabarovsky, S. Zhu, A. Zimmer, W. Hide, C. Bult, S. M. Grimmond, R. D. Teasdale, E. T. Liu, V. Brusic, J. Quackenbush, C. Wahlestedt, J. Mattick, D. Hume. RIKEN Genome Exploration Research Group: C. Kai, D. Sasaki, Y. Tomaru, S. Fukuda, M. Kanamori-Katayama, M. Suzuki, J. Aoki, T. Arakawa, J. Iida, K. Imamura, M. Itoh, T. Kato, H. Kawaji, N. Kawagashira, T. Kawashima, M. Kojima, S. Kondo, H. Konno, K. Nakano, N. Ninomiya, T. Nishio, M. Okada, C. Plessy, K. Shibata, T. Shiraki, S. Suzuki, M. Tagami, K Waki, A. Watahiki, Y. Okamura-Oho, H. Suzuki, J. Kawai. General Organizer: Y. Hayashizaki References Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C et al. The transcriptional landscape of the mammalian genome. Science. 2005 Sep 2;309(5740):1559-63. PMID: 16141072 Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H, Kodzius R, Watahiki A, Nakamura M, Arakawa T et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15776-81. PMID: 14663149; PMC: PMC307644 rikenCageCtssMinus Riken CAGE - Riken CAGE Minus Strand - Predicted Gene Start Sites mRNA and EST rikenCageCtssPlus Riken CAGE + Riken CAGE Plus Strand - Predicted Gene Start Sites mRNA and EST rikenCageTc Riken CAGE TC Riken CAGE - Associated Transcript Clusters mRNA and EST Description This track shows transcription start points as defined by the regions of CAGE (5' Cap Analysis Gene Expression) tag clusters. CAGE tags are 19-20-mers sequenced from 5' ends of full-length cDNAs produced using RIKEN full-length cDNA technology. A CAGE cluster is defined as one or more overlapping CAGE tags on the same strand, regardless of tissue of origin. 
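The cluster definition above (one or more overlapping tags on the same strand) can be sketched as a simple interval merge. This is only an illustration under assumed inputs: tags are (chrom, start, end, strand) tuples in 0-based, half-open coordinates, and `cluster_tags` is a hypothetical helper, not the method used to build the track.

```python
def cluster_tags(tags):
    """Merge overlapping same-strand tags into clusters.

    tags: iterable of (chrom, start, end, strand), 0-based half-open
    (an assumed layout). Tissue of origin is ignored, as in the
    definition given in the text.
    """
    clusters = []
    # Sorting by chromosome, strand, then start lets a single pass merge
    # every run of overlapping tags.
    for chrom, start, end, strand in sorted(tags, key=lambda t: (t[0], t[3], t[1])):
        last = clusters[-1] if clusters else None
        if (last is not None and last["chrom"] == chrom
                and last["strand"] == strand and start < last["end"]):
            # Shares at least one base with the current cluster: extend it.
            last["end"] = max(last["end"], end)
            last["n_tags"] += 1
        else:
            clusters.append({"chrom": chrom, "start": start, "end": end,
                             "strand": strand, "n_tags": 1})
    return clusters

tags = [
    ("chr1", 100, 120, "+"),
    ("chr1", 110, 130, "+"),
    ("chr1", 130, 150, "+"),  # adjacent to the previous tag, not overlapping
    ("chr1", 105, 125, "-"),  # opposite strand, so never merged with the above
]
clusters = cluster_tags(tags)
```

Whether merely adjacent tags should be joined is a design choice; this sketch requires at least one shared base.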
The full annotation of a cluster, including tissue(s) of origin, can be obtained from the CAGE Analysis Viewer, via a link on the details page for that cluster. Methods The CAGE tags are sequenced from the 5' ends of full-length cDNAs produced using RIKEN full-length cDNA technology. To create the tag, a linker was attached to the 5' end of full-length cDNAs which had been selected by cap trapping. The first 20 bp of the cDNA were cleaved using the class II restriction enzyme, MmeI, followed by PCR amplification. Concatamers of the resulting 32 bp tags were then formed for more efficient sequencing. For more information on CAGE, see Shiraki et al. (2003) and Carninci et al (2005). RIKEN website for information about RIKEN full-length cDNA technologies. The mapping methodology employed in this annotation will be described in upcoming publications. Credits These data were contributed by the Functional Annotation of Mouse (FANTOM) Consortium, RIKEN Genome Science Laboratory and RIKEN Genome Exploration Research Group (Genome Network Project Core Group). FANTOM Consortium: P. Carninci, T. Kasukawa, S. Katayama, Gough, M. Frith, N. Maeda, R. Oyama, T. Ravasi, B. Lenhard, C. Wells, R. Kodzius, K. Shimokawa, V. B. Bajic, S. E. Brenner, S. Batalov, A. R. R. Forrest, M. Zavolan, M. J. Davis, L. G. Wilming, V. Aidinis, J. Allen, A. Ambesi-Impiombato, R. Apweiler, R. N. Aturaliya, T. L. Bailey, M. Bansal, K. W. Beisel, T. Bersano, H. Bono, A. M. Chalk, K. P. Chiu, V. Choudhary, A. Christoffels, D. R. Clutterbuck, M. L. Crowe, E. Dalla, B. P. Dalrymple, B. de Bono, G. Della Gatta, D. di Bernardo, T. Down, P. Engstrom, M. Fagiolini, G. Faulkner, C. F. Fletcher, T. Fukushima, M. Furuno, S. Futaki, M. Gariboldi, P. Georgii-Hemming, T. R. Gingeras, T. Gojobori, R. E. Green, S. Gustincich, M. Harbers, V. Harokopos, Y. Hayashi, S. Henning, T. K. Hensch, N. Hirokawa, D. Hill, L. Huminiecki, M. Iacono, K. Ikeo, A. Iwama, T. Ishikawa, M. Jakt, A. Kanapin, M. Katoh, Y. Kawasawa, J. 
Kelso, H. Kitamura, H. Kitano, G. Kollias, S. P. T. Krishnan, A.F. Kruger, K. Kummerfeld, I. V. Kurochkin, L. F. Lareau, L. Lipovich, J. Liu, S. Liuni, S. McWilliam, M. Madan Babu, M. Madera, L. Marchionni, H. Matsuda, S. Matsuzawa, H. Miki, F. Mignone, S. Miyake, K. Morris, S. Mottagui-Tabar, N. Mulder, N. Nakano, H. Nakauchi, P. Ng, R. Nilsson, S. Nishiguchi, S. Nishikawa, F. Nori, O. Ohara, Y. Okazaki, V. Orlando, K. C. Pang, W. J. Pavan, G. Pavesi, G. Pesole, N. Petrovsky, S. Piazza, W. Qu, J. Reed, J. F. Reid, B. Z. Ring, M. Ringwald, B. Rost, Y. Ruan, S. Salzberg, A. Sandelin, C. Schneider, C. Schoenbach, K. Sekiguchi, C. A. M. Semple, S. Seno, L. Sessa, Y. Sheng, Y. Shibata, H. Shimada, K. Shimada, B. Sinclair, S. Sperling, E. Stupka, K. Sugiura, R. Sultana, Y. Takenaka, K. Taki, K. Tammoja, S. L. Tan, S. Tang, M. S. Taylor, J. Tegner, S. A. Teichmann, H. R. Ueda, E. van Nimwegene, R. Verardo, C. L. Wei, K. Yagi, H. Yamanishi, E. Zabarovsky, S. Zhu, A. Zimmer, W. Hide, C. Bult, S. M. Grimmond, R. D. Teasdale, E. T. Liu, V. Brusic, J. Quackenbush, C. Wahlestedt, J. Mattick, D. Hume. RIKEN Genome Exploration Research Group: C. Kai, D. Sasaki, Y. Tomaru, S. Fukuda, M. Kanamori-Katayama, M. Suzuki, J. Aoki, T. Arakawa, J. Iida, K. Imamura, M. Itoh, T. Kato, H. Kawaji, N. Kawagashira, T. Kawashima, M. Kojima, S. Kondo, H. Konno, K. Nakano, N. Ninomiya, T. Nishio, M. Okada, C. Plessy, K. Shibata, T. Shiraki, S. Suzuki, M. Tagami, K Waki, A. Watahiki, Y. Okamura-Oho, H. Suzuki, J. Kawai. General Organizer: Y. Hayashizaki References Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C et al. The transcriptional landscape of the mammalian genome. Science. 2005 Sep 2;309(5740):1559-63. PMID: 16141072 Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H, Kodzius R, Watahiki A, Nakamura M, Arakawa T et al. 
Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15776-81. PMID: 14663149; PMC: PMC307644 rinnSex Rinn Sex Exp Rinn et al. Sex Gene Expression Data on MOE430A Chip Expression and Regulation Description This track shows gene expression differences between adult male and female tissues, as described in Rinn et al., 2004. Display Conventions and Configuration In full display mode, the medians of all replicates (technical and biological) for each sex's tissue are shown. To view the individual replicates, use the UCSC Gene Sorter. In packed and squished display modes, the average over all tissues is shown for each sex. Dense display mode shows the placement of the Affy MOE430A target sequences colored by overall expression level in both sexes, with darker colors representing higher levels of expression. Methods Five adult mouse tissues (liver, kidney, hypothalamus, ovary and testis) were studied. For each somatic tissue, triple selected poly-A mRNA was prepared from six independent pools (biological replicates), three male and three female. Likewise, three pools were prepared from the ovary and three pools from the testis. Each pool of RNA was derived from ten individuals. For each biological replicate, two cDNAs (technical replicates) were prepared and independently hybridized to Affymetrix MOE430A chips. Credits Thanks to John Rinn for providing these data. References Rinn JL, Rozowsky JS, Laurenzi IJ, Petersen PH, Zou K, Zhong W, Gerstein M, Snyder M. Major molecular differences between mammalian sexes are involved in drug metabolism and renal function. Dev Cell. 2004 Jun;6(6):791-800. 
PMID: 15177028 sgpGene SGP Genes SGP Gene Predictions Using Mouse/Human Homology Genes and Gene Predictions Description This track shows gene predictions from the SGP2 homology-based gene prediction program developed by Roderic Guigó's "Computational Biology of RNA Processing" group, which is part of the Centre de Regulació Genòmica (CRG) in Barcelona, Catalunya, Spain. To predict genes in a genomic query, SGP2 combines geneid predictions with tblastx comparisons of the genome of the target species against genomic sequences of other species (reference genomes) deemed to be at an appropriate evolutionary distance from the target. Credits Thanks to the "Computational Biology of RNA Processing" group for providing these data. simpleRepeat Simple Repeats Simple Tandem Repeats by TRF Variation and Repeats Description This track displays simple tandem repeats (possibly imperfect repeats) located by Tandem Repeats Finder (TRF) which is specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. Repeat expansions are sometimes associated with specific diseases. Methods For more information about the TRF program, see Benson (1999). Credits TRF was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 picTar PicTar miRNA MicroRNA target sites in 3' UTRs as predicted by PicTar Expression and Regulation Description This track shows microRNA target sites in 3' UTRs as predicted by PicTar, based on the RefSeq annotation of 3' UTRs. Methods The original PicTar algorithm was published in Krek et al., 2005. The annotations displayed in this track are updated predictions as published in Lall et al., 2006. 
PicTar is a hidden Markov model that assigns probabilities to 3' UTR subsequences as a binding site for a microRNA, considers all possible ways the 3' UTR could be bound by microRNAs, and then uses a maximum likelihood method to compute the optimal likelihood under which the 3' UTR could be explained by microRNAs and background. The score is this likelihood divided by background, i.e., the local base composition of each 3' UTR is taken into account. To fit the track conventions of the UCSC browser (integers), all scores were scaled by the maximum score of all microRNA 3'-UTR scores observed. Note that the PicTar algorithm scores any 3' UTR that has at least one aligned conserved predicted binding site for a microRNA, but then incorporates all possible binding sites into the score, even if they appear to be non-conserved. Because the score for a 3' UTR is a "phylo" average over all orthologous 3' UTRs used, "scattered" sites that appear in many species may boost the score, and individual sites shown in the display may not be aligned and conserved in all species under consideration. Two levels of conservation can be chosen:
-- conservation among seven vertebrates: mouse, rat, rabbit, human, chimp, macaque, and dog
-- conservation among thirteen vertebrates: mouse, rat, rabbit, human, chimp, macaque, dog, cow, armadillo, elephant, tenrec, opossum, and chicken
The latter setting has improved quality, but lower sensitivity. To produce the phylogenetic scoring as described in Krek et al., 2005, the following phylogenetic groupings were used:
(i) mouse/rat/rabbit
(ii) human/chimp/macaque
(iii) dog/cow
(iv) armadillo
(v) elephant/tenrec
(vi) opossum
(vii) chicken
(viii) xenopus
(ix) tetraodon/fugu/zebrafish
mammalian score = (1/3 *((i) + (ii) + (iii)) + (iv) + (v) + (vi))
final score = 1/2*(1/2*(1/2*(mammalian score + (vii)) + (viii)) + (ix))
Credits Thanks to Dominic Grün, Yi-Lu Wang, and Nikolaus Rajewsky for providing this annotation.
More detailed information about individual predictions, including links to other databases, can be found on the PicTar website, a project of the Rajewsky lab while at the New York University Center for Comparative Functional Genomics. References Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M et al. Combinatorial microRNA target predictions. Nat Genet. 2005 May;37(5):495-500. PMID: 15806104 Lall S, Grün D, Krek A, Chen K, Wang YL, Dewey CN, Sood P, Colombo T, Bray N, Macmenamin P et al. A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol. 2006 Mar 7;16(5):460-71. PMID: 16458514 picTarMiRNAChicken PicTar 13 Species PicTar microRNA sites, 13 species conservation constraint: Mouse/Rat/Rabbit/Human/Chimp/Macaque/Dog/Cow/Armadillo/Elephant/Tenrec/Opossum/Chicken Expression and Regulation picTarMiRNADog PicTar 7 Species PicTar microRNA sites, 7 species conservation constraint: Mouse/Rat/Rabbit/Human/Chimp/Macaque/Dog Expression and Regulation chainNetRn3 Rat Chain/Net Rat (June 2003 (Baylor 3.1/rn3)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rat (June 2003 (Baylor 3.1/rn3)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rat and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions.
Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rat/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rat sequence used in this annotation is from the June 2003 (Baylor 3.1/rn3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. 
Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rat/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first.
The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. 
Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRn3Viewnet Net Rat (June 2003 (Baylor 3.1/rn3)), Chain and Net Alignments Comparative Genomics netRn3 Rat Net Rat (June 2003 (Baylor 3.1/rn3)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rat (June 2003 (Baylor 3.1/rn3)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rat and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rat/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The rat sequence used in this annotation is from the June 2003 (Baylor 3.1/rn3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rat/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
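The greedy placement step described for chainNet can be illustrated with a small sketch. It is not the chainNet implementation: it assumes each chain is given as (name, score, blocks), with blocks as 0-based half-open intervals on a single mouse chromosome, and it does not record nesting levels or any rat-side coordinates. The names `uncovered` and `place_chains` are hypothetical.

```python
def uncovered(start, end, covered):
    """Return the parts of [start, end) not in `covered`.

    covered: sorted list of non-overlapping (start, end) intervals.
    """
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue            # interval lies entirely to the left
        if c_start >= end:
            break               # interval lies entirely to the right
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
        if pos >= end:
            break
    if pos < end:
        pieces.append((pos, end))
    return pieces

def place_chains(chains):
    """Place chains best-first, trimming each to still-uncovered bases."""
    covered = []   # bases already claimed by higher-scoring chains
    placed = []    # (chain name, start, end) pieces in placement order
    for name, score, blocks in sorted(chains, key=lambda c: c[1], reverse=True):
        for start, end in blocks:
            pieces = uncovered(start, end, covered)
            placed.extend((name, s, e) for s, e in pieces)
            # New pieces never overlap existing ones, so sorting keeps the
            # list in the form `uncovered` expects.
            covered = sorted(covered + pieces)
    return placed

chains = [
    ("chainA", 9000, [(0, 40), (60, 100)]),  # gap between 40 and 60
    ("chainB", 5000, [(30, 70)]),            # overlaps chainA on both sides
]
net = place_chains(chains)
```

Here the lower-scoring chain survives only where it fills the gap in the higher-scoring one, which is the situation the text describes as a chain being placed underneath another.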
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRn3Viewchain Chain Rat (June 2003 (Baylor 3.1/rn3)), Chain and Net Alignments Comparative Genomics chainRn3 Rat Chain Rat (June 2003 (Baylor 3.1/rn3)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rat (June 2003 (Baylor 3.1/rn3)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both rat and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rat/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rat sequence used in this annotation is from the June 2003 (Baylor 3.1/rn3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. 
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rat/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
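As a rough illustration of how the substitution matrix above applies to a gapless block, the following Python sketch sums per-column scores for two equal-length sequences. The function names (`block_score`, `passes_min_score`) are hypothetical and are not part of axtChain; a real chain score also subtracts gap costs between blocks, which this sketch leaves out.

```python
# Substitution scores copied from the matrix in the text
# (rows and columns both in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
MIN_CHAIN_SCORE = 3000  # minimum chain score quoted in the text

_INDEX = {base: i for i, base in enumerate(BASES)}


def block_score(query: str, target: str) -> int:
    """Sum matrix scores over one gapless block (hypothetical helper).

    Assumes both strings contain only A, C, G, T; other symbols
    such as N raise KeyError in this simplified sketch.
    """
    if len(query) != len(target):
        raise ValueError("a gapless block needs equal-length sequences")
    return sum(
        MATRIX[_INDEX[q]][_INDEX[t]]
        for q, t in zip(query.upper(), target.upper())
    )


def passes_min_score(score: int) -> bool:
    """True if a chain with this total score would be kept."""
    return score >= MIN_CHAIN_SCORE
```

For example, `block_score("ACGT", "ACGT")` adds the four diagonal entries, while a single A/G transition column contributes -31.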
The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position    1    2    3    11   111  2111  12111  32111   72111  152111  252111
qGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
tGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
bothGap   750  825  850  1000  1300  3300  23300  58300  118300  218300  318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPanTro1 Chimp Chain/Net Chimp (Nov. 2003 (CGSC 1.1/panTro1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chimp (Nov. 2003 (CGSC 1.1/panTro1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chimp and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chimp assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chimp/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chimp sequence used in this annotation is from the Nov. 2003 (CGSC 1.1/panTro1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases, gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chimp/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single chimp chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position    1    2    3    11   111  2111  12111  32111   72111  152111  252111
qGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
tGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
bothGap   750  825  850  1000  1300  3300  23300  58300  118300  218300  318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
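The placement step just described can be pictured with a toy one-dimensional model. The Python sketch below is only an illustration under simplifying assumptions, not the chainNet implementation: each chain is reduced to a score plus a list of half-open aligned blocks on the mouse coordinate axis, and the names `place_chains` and `uncovered_pieces` are hypothetical. Chains are taken in descending score order, and each block is trimmed to whatever has not already been claimed by a higher-scoring chain.

```python
from typing import List, Tuple

Interval = Tuple[int, int]          # half-open [start, end)
Chain = Tuple[int, List[Interval]]  # (score, aligned blocks)


def uncovered_pieces(start: int, end: int,
                     covered: List[Interval]) -> List[Interval]:
    """Return the parts of [start, end) not overlapped by `covered`.

    `covered` must be sorted and pairwise disjoint.
    """
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue            # entirely to the left of what remains
        if c_start >= end:
            break               # entirely to the right; nothing more to trim
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
    if pos < end:
        pieces.append((pos, end))
    return pieces


def place_chains(chains: List[Chain]) -> List[Tuple[int, int, int]]:
    """Greedy toy placement: best-scoring chain first, later chains trimmed."""
    covered: List[Interval] = []
    placed = []
    for score, blocks in sorted(chains, key=lambda c: -c[0]):
        for start, end in blocks:
            for piece in uncovered_pieces(start, end, covered):
                placed.append((score, piece[0], piece[1]))
                covered.append(piece)
                covered.sort()  # keep the invariant for the next lookup
    return placed
```

In this model, a lower-scoring chain whose block spans a gap between two blocks of a higher-scoring chain ends up trimmed to exactly that gap, which mirrors the gap-filling hierarchy described in the text. Level numbers and minimum-size thresholds are not modeled.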
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPanTro1Viewnet Net Chimp (Nov. 2003 (CGSC 1.1/panTro1)), Chain and Net Alignments Comparative Genomics netPanTro1 Chimp Net Chimp (Nov. 
2003 (CGSC 1.1/panTro1)) Alignment net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chimp (Nov. 2003 (CGSC 1.1/panTro1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chimp and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chimp assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chimp/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chimp sequence used in this annotation is from the Nov. 2003 (CGSC 1.1/panTro1) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases, gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chimp/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single chimp chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position    1    2    3    11   111  2111  12111  32111   72111  152111  252111
qGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
tGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
bothGap   750  825  850  1000  1300  3300  23300  58300  118300  218300  318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
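The relationship between a lower-level chain and the chain above it, as recorded by netSyntenic and shown as the Top, Syn, Inv, and NonSyn item types, can be expressed as a small decision rule. The Python sketch below is a simplified reading of those four definitions only; the function name `classify_net_item` and its parameters are hypothetical, and the actual netSyntenic program may take further criteria into account.

```python
from typing import Optional


def classify_net_item(level: int,
                      chrom: str,
                      strand: str,
                      parent_chrom: Optional[str] = None,
                      parent_strand: Optional[str] = None) -> str:
    """Label a net item using the four types described in the text.

    level          -- 1 for top-level chains, 2 and up for chains filling gaps
    chrom, strand  -- chromosome and orientation ('+' or '-') of this item
                      in the aligning (non-mouse) species
    parent_*       -- the same properties for the chain whose gap is filled
    """
    if level == 1:
        return "Top"        # best, longest match; shown on level 1
    if chrom != parent_chrom:
        return "NonSyn"     # different chromosome from the level above
    if strand != parent_strand:
        return "Inv"        # same chromosome, opposite orientation
    return "Syn"            # same chromosome, same orientation
```

A level-2 item on chr4 in the minus orientation that fills a gap in a plus-orientation chr4 chain would be labeled Inv under this rule.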
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPanTro1Viewchain Chain Chimp (Nov. 2003 (CGSC 1.1/panTro1)), Chain and Net Alignments Comparative Genomics chainPanTro1 Chimp Chain Chimp (Nov. 2003 (CGSC 1.1/panTro1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chimp (Nov. 2003 (CGSC 1.1/panTro1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both chimp and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chimp assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chimp/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chimp sequence used in this annotation is from the Nov. 2003 (CGSC 1.1/panTro1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases, gaps exist in the top-level chain.
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chimp/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single chimp chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position    1    2    3    11   111  2111  12111  32111   72111  152111  252111
qGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
tGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
bothGap   750  825  850  1000  1300  3300  23300  58300  118300  218300  318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetHg18 Human Chain/Net Human (Mar. 2006 (NCBI36/hg18)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of human (Mar. 2006 (NCBI36/hg18)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both human and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the human assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best human/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The human sequence used in this annotation is from the Mar. 2006 (NCBI36/hg18) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases, gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the human/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single human chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position    1    2    3    11   111  2111  12111  32111   72111  152111  252111
qGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
tGap      350  425  450   600   900  2900  22900  57900  117900  217900  317900
bothGap   750  825  850  1000  1300  3300  23300  58300  118300  218300  318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetHg18Viewnet Net Human (Mar. 2006 (NCBI36/hg18)), Chain and Net Alignments Comparative Genomics netHg18 Human Net Human (Mar. 
2006 (NCBI36/hg18)) Alignment net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of human (Mar. 2006 (NCBI36/hg18)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both human and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the human assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best human/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The human sequence used in this annotation is from the Mar. 2006 (NCBI36/hg18) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the human/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single human chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
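The linear gap table given in the chain track methods can be read as a piecewise-linear cost function. The following is a minimal sketch of that reading, not the axtChain implementation: it assumes costs are linearly interpolated between the tabulated positions, that gaps longer than the last position continue with the slope of the final segment, and that a double-sided gap is looked up in the bothGap row using the sum of the two gap lengths. The function names are hypothetical.

```python
from bisect import bisect_right

# Values copied from the -linearGap=medium table in this description.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
T_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def interpolate(size, costs):
    """Piecewise-linear lookup of a gap cost for a gap of `size` bases."""
    if size < 1:
        raise ValueError("gap size must be at least 1")
    if size >= POSITION[-1]:
        # Assumption: beyond the table, extend the last segment's slope.
        i = len(POSITION) - 2
    else:
        # Index of the tabulated position at or just below `size`.
        i = bisect_right(POSITION, size) - 1
    x0, x1 = POSITION[i], POSITION[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(t_gap, q_gap):
    """Cost of a gap of t_gap bases in the target and q_gap bases in the query."""
    if t_gap == 0 and q_gap == 0:
        return 0.0
    if q_gap == 0:
        return interpolate(t_gap, T_GAP)
    if t_gap == 0:
        return interpolate(q_gap, Q_GAP)
    # Assumption: double-sided gaps use the combined length in the bothGap row.
    return interpolate(t_gap + q_gap, BOTH_GAP)
```

Under these assumptions, a 7-base gap in the query alone falls halfway between the entries for positions 3 and 11, so `gap_cost(0, 7)` lies halfway between 450 and 600.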
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetHg18Viewchain Chain Human (Mar. 2006 (NCBI36/hg18)), Chain and Net Alignments Comparative Genomics chainHg18 Human Chain Human (Mar. 2006 (NCBI36/hg18)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of human (Mar. 2006 (NCBI36/hg18)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both human and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the human assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best human/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The human sequence used in this annotation is from the Mar. 2006 (NCBI36/hg18) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain.
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the human/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single human chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
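To make the scoring concrete, here is a minimal sketch of how one gapless block could be scored with the matrix above and how the minimum-score cutoff could be applied. It is an illustration only, not the axtChain code: the function names are hypothetical, only the bases A, C, G and T are handled, and a chain is assumed to be kept when its score is at least 3000.

```python
# Substitution scores from the matrix above; rows and columns are A, C, G, T.
BASES = "ACGT"
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
MIN_CHAIN_SCORE = 3000  # chains scoring below this were discarded


def block_score(target, query):
    """Sum the per-column matrix scores over one gapless alignment block."""
    if len(target) != len(query):
        raise ValueError("a gapless block needs sequences of equal length")
    index = {base: i for i, base in enumerate(BASES)}
    return sum(
        MATRIX[index[t]][index[q]]
        for t, q in zip(target.upper(), query.upper())
    )


def passes_threshold(chain_score):
    """True if a chain with this total score would be kept (hypothetical helper)."""
    return chain_score >= MIN_CHAIN_SCORE
```

Note that the transition mismatches (A/G and C/T) cost 31, far less than the transversions, so a block with a few transitions can still contribute a strongly positive score.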
The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetCanFam2 Dog Chain/Net Dog (May 2005 (Broad/canFam2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of dog (May 2005 (Broad/canFam2)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both dog and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the dog assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best dog/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The dog sequence used in this annotation is from the May 2005 (Broad/canFam2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the dog/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single dog chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetCanFam2Viewnet Net Dog (May 2005 (Broad/canFam2)), Chain and Net Alignments Comparative Genomics netCanFam2 Dog Net Dog (May 2005 (Broad/canFam2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of dog (May 2005 (Broad/canFam2)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both dog and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the dog assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best dog/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The dog sequence used in this annotation is from the May 2005 (Broad/canFam2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the dog/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single dog chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
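The placement step described for chainNet (highest-scoring chain first, each later chain trimmed to whatever is still uncovered) can be sketched as a greedy interval procedure. This is a simplified illustration under stated assumptions, not the chainNet program: chains are reduced to hypothetical `(name, score, blocks)` tuples on a single chromosome, coordinates are half-open, and the level hierarchy is not tracked.

```python
def subtract(interval, covered):
    """Return the parts of `interval` not overlapped by `covered`.

    `covered` must be a sorted list of disjoint half-open (start, end) pairs.
    """
    start, end = interval
    pieces = []
    for c_start, c_end in covered:
        if c_end <= start or c_start >= end:
            continue  # no overlap with this covered stretch
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def merge(intervals):
    """Sort intervals and fuse any that touch or overlap."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def place_chains(chains):
    """Place chains in descending score order, trimming each to uncovered bases."""
    covered = []
    placed = {}
    for name, _score, blocks in sorted(chains, key=lambda c: c[1], reverse=True):
        kept = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        if kept:
            placed[name] = kept
            covered = merge(covered + kept)
    return placed
```

For example, with a high-scoring chain whose blocks are `(0, 50)` and `(100, 200)` and a lower-scoring chain spanning `(40, 120)`, the lower chain is trimmed to `(50, 100)`, which is exactly the gap left by the higher-scoring chain.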
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetCanFam2Viewchain Chain Dog (May 2005 (Broad/canFam2)), Chain and Net Alignments Comparative Genomics chainCanFam2 Dog Chain Dog (May 2005 (Broad/canFam2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of dog (May 2005 (Broad/canFam2)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both dog and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the dog assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best dog/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The dog sequence used in this annotation is from the May 2005 (Broad/canFam2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain.
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the dog/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single dog chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

        A     C     G     T
A      91  -114   -31  -123
C    -114   100  -125   -31
G     -31  -125   100  -114
T    -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
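The dynamic program over gapless blocks mentioned above can be illustrated with a small sketch. It is deliberately simplified and is not the axtChain algorithm: it replaces the kd-tree search with a plain quadratic scan over earlier blocks, takes the gap cost as a caller-supplied function, and represents each block as a hypothetical `(t_start, q_start, length, score)` tuple on one strand.

```python
def best_chain(blocks, gap_cost):
    """Find the highest-scoring chain of gapless blocks.

    blocks   -- list of (t_start, q_start, length, score) tuples
    gap_cost -- function (dt, dq) -> penalty for joining two blocks
    Returns (best_score, list of block indices in chain order).
    """
    order = sorted(range(len(blocks)), key=lambda i: blocks[i][0])
    best = {}  # best chain score ending at each block
    prev = {}  # predecessor of each block on that best chain
    for pos, i in enumerate(order):
        t_i, q_i, _len_i, score_i = blocks[i]
        best[i] = score_i
        prev[i] = None
        for j in order[:pos]:
            t_j, q_j, len_j, _ = blocks[j]
            dt = t_i - (t_j + len_j)  # gap in the target
            dq = q_i - (q_j + len_j)  # gap in the query
            if dt < 0 or dq < 0:
                continue  # block j does not end before block i in both genomes
            candidate = best[j] + score_i - gap_cost(dt, dq)
            if candidate > best[i]:
                best[i] = candidate
                prev[i] = j
    if not best:
        return 0, []
    node = max(best, key=best.get)
    top_score = best[node]
    chain = []
    while node is not None:
        chain.append(node)
        node = prev[node]
    return top_score, chain[::-1]
```

A chain found this way would then be compared against the minimum score of 3000 before being kept. The quadratic scan is only practical for small inputs; the kd-tree named in the text serves to limit which predecessor blocks need to be examined.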
The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetBosTau2 Cow Chain/Net Cow (Mar. 2005 (Baylor 2.0/bosTau2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of cow (Mar. 2005 (Baylor 2.0/bosTau2)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both cow and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the cow assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best cow/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The cow sequence used in this annotation is from the Mar. 2005 (Baylor 2.0/bosTau2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the cow/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single cow chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        350    425    450    600    900   2900  22900  57900 117900 217900 317900
tGap        350    425    450    600    900   2900  22900  57900 117900 217900 317900
bothGap     750    825    850   1000   1300   3300  23300  58300 118300 218300 318300

Net track
Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetBosTau2Viewnet Net Cow (Mar. 2005 (Baylor 2.0/bosTau2)), Chain and Net Alignments Comparative Genomics netBosTau2 Cow Net Cow (Mar. 
2005 (Baylor 2.0/bosTau2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of cow (Mar. 2005 (Baylor 2.0/bosTau2)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both cow and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the cow assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best cow/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The cow sequence used in this annotation is from the Mar. 2005 (Baylor 2.0/bosTau2) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the cow/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single cow chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        350    425    450    600    900   2900  22900  57900 117900 217900 317900
tGap        350    425    450    600    900   2900  22900  57900 117900 217900 317900
bothGap     750    825    850   1000   1300   3300  23300  58300 118300 218300 318300

Net track
Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
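The medium gap table given in the Methods above can be read as a piecewise cost curve. The sketch below is only an illustration of that reading: it assumes that the position row lists gap lengths in bases and that costs between listed positions are linearly interpolated, as the option name -linearGap suggests. The names POSITION, Q_GAP, BOTH_GAP, and interpolate_gap_cost are hypothetical, and the behavior of axtChain for gaps longer than the last listed position is not described here, so the sketch refuses such inputs rather than guess.

```python
from bisect import bisect_right

# Values copied from the -linearGap=medium table above.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def interpolate_gap_cost(size, costs):
    """Return the cost for a gap of `size`, interpolating linearly
    between the two nearest listed positions (an assumption)."""
    if size < POSITION[0] or size > POSITION[-1]:
        raise ValueError("gap size outside the range covered by the table")
    # Index of the last listed position that is <= size.
    i = bisect_right(POSITION, size) - 1
    if POSITION[i] == size:
        return float(costs[i])
    x0, x1 = POSITION[i], POSITION[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


if __name__ == "__main__":
    for size in (1, 7, 61, 2111):
        print(size, interpolate_gap_cost(size, Q_GAP), interpolate_gap_cost(size, BOTH_GAP))
```

Under these assumptions the cost per base falls as gaps get longer, which is consistent with the description of a gap scoring system that allows longer gaps than traditional affine schemes.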
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetBosTau2Viewchain Chain Cow (Mar. 2005 (Baylor 2.0/bosTau2)), Chain and Net Alignments Comparative Genomics chainBosTau2 Cow Chain Cow (Mar. 2005 (Baylor 2.0/bosTau2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of cow (Mar. 2005 (Baylor 2.0/bosTau2)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both cow and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the cow assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best cow/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The cow sequence used in this annotation is from the Mar. 2005 (Baylor 2.0/bosTau2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain.
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the cow/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single cow chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
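As a rough illustration of how the cow/mouse substitution matrix above applies to a gapless block, the following sketch sums the per-column scores of two equal-length sequences. It is not axtChain's code: the names BASES, MATRIX, MIN_CHAIN_SCORE, and block_score are hypothetical, the sketch assumes the sequences contain only A, C, G, and T, and it assumes rows index the base in one species and columns the base in the other (the matrix is symmetric, so the choice does not change the result).

```python
# Scores copied from the cow/mouse matrix above; rows and columns are in A, C, G, T order.
BASES = "ACGT"
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
# Chains scoring below this value were discarded, according to the text above.
MIN_CHAIN_SCORE = 3000

_INDEX = {base: i for i, base in enumerate(BASES)}


def block_score(seq_a, seq_b):
    """Sum the substitution scores over the columns of an ungapped block.

    Raises ValueError for unequal lengths and KeyError for any
    character other than A, C, G, or T (a simplification)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("an ungapped block needs equal-length sequences")
    return sum(
        MATRIX[_INDEX[a]][_INDEX[b]]
        for a, b in zip(seq_a.upper(), seq_b.upper())
    )


if __name__ == "__main__":
    print(block_score("ACGT", "ACGT"))  # four matching columns
    print(block_score("ACGT", "AGGT"))  # one C/G mismatch
```

Four matching columns score 91 + 100 + 100 + 91 = 382, so a chain needs a substantial amount of aligned sequence, net of gap costs, before it clears the minimum score of 3000.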
The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        350    425    450    600    900   2900  22900  57900 117900 217900 317900
tGap        350    425    450    600    900   2900  22900  57900 117900 217900 317900
bothGap     750    825    850   1000   1300   3300  23300  58300 118300 218300 318300

Net track
Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGalGal2 Chicken Chain/Net Chicken (Feb. 2004 (WUGSC 1.0/galGal2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chicken (Feb. 2004 (WUGSC 1.0/galGal2)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chicken and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chicken assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chicken/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chicken sequence used in this annotation is from the Feb. 2004 (WUGSC 1.0/galGal2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chicken/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single chicken chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap     625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track
Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
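The score-ordered placement just described can be sketched in a few lines. This is a deliberately simplified illustration, not the chainNet implementation: each chain is reduced to a single half-open interval on the mouse genome, the gaps inside a chain and the resulting level hierarchy are not modelled, and the names subtract and place_chains are hypothetical.

```python
def subtract(start, end, covered):
    """Return the parts of [start, end) not overlapped by `covered`,
    a sorted list of non-overlapping half-open intervals."""
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue  # covered interval lies entirely to the left
        if c_start >= end:
            break  # covered interval lies entirely to the right
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
        if pos >= end:
            break
    if pos < end:
        pieces.append((pos, end))
    return pieces


def place_chains(chains):
    """Place chains best-score-first, trimming each one to the sections
    not already claimed by a higher-scoring chain.

    `chains` is a list of (name, score, start, end) tuples; the result
    is a list of (name, start, end) pieces in placement order."""
    covered = []
    placed = []
    for name, _score, start, end in sorted(chains, key=lambda c: c[1], reverse=True):
        pieces = subtract(start, end, covered)
        placed.extend((name, s, e) for s, e in pieces)
        # New pieces never overlap existing ones, so sorting keeps the invariant.
        covered = sorted(covered + pieces)
    return placed


if __name__ == "__main__":
    example = [("a", 9000, 0, 100), ("b", 5000, 50, 150), ("c", 4000, 20, 60)]
    print(place_chains(example))
```

In this toy input, chain "b" is trimmed to the part that extends past chain "a", and chain "c" is dropped because it lies wholly inside territory already claimed.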
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGalGal2Viewnet Net Chicken (Feb. 2004 (WUGSC 1.0/galGal2)), Chain and Net Alignments Comparative Genomics netGalGal2 Chicken Net Chicken (Feb. 
2004 (WUGSC 1.0/galGal2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chicken (Feb. 2004 (WUGSC 1.0/galGal2)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chicken and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chicken assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chicken/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chicken sequence used in this annotation is from the Feb. 2004 (WUGSC 1.0/galGal2) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chicken/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single chicken chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position      1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap        325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap     625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track
Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGalGal2Viewchain Chain Chicken (Feb. 2004 (WUGSC 1.0/galGal2)), Chain and Net Alignments Comparative Genomics chainGalGal2 Chicken Chain Chicken (Feb. 2004 (WUGSC 1.0/galGal2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chicken (Feb. 2004 (WUGSC 1.0/galGal2)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chicken and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chicken assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chicken/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chicken sequence used in this annotation is from the Feb. 2004 (WUGSC 1.0/galGal2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chicken/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single chicken chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
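As an illustration of how the substitution matrix and the minimum score quoted above are applied, the following minimal Python sketch scores a single gapless block and checks a chain score against the threshold. The names `block_score` and `passes_threshold` are hypothetical and are not part of axtChain; the sketch handles only the bases A, C, G and T and ignores gap penalties.

```python
# Substitution scores quoted in the Methods text (rows and columns in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}

# Minimum chain score quoted in the Methods text.
MIN_CHAIN_SCORE = 5000


def block_score(target, query):
    """Sum the per-column scores of one gapless block (hypothetical helper).

    Only A, C, G and T are handled; any other character raises KeyError.
    """
    if len(target) != len(query):
        raise ValueError("a gapless block needs sequences of equal length")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


def passes_threshold(chain_score, minimum=MIN_CHAIN_SCORE):
    """Chains scoring below the minimum are discarded; all others are kept."""
    return chain_score >= minimum
```

Under this matrix a transition (A/G or C/T) costs 25, while a transversion costs 90 or 100, so blocks that differ mostly by transitions keep a comparatively high score.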
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetXenTro1 X. tropicalis Chain/Net X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both X. tropicalis and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the X. tropicalis assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best X. tropicalis/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The X. tropicalis sequence used in this annotation is from the Oct. 2004 (JGI 3.0/xenTro1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the X. tropicalis/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single X. tropicalis chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
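The placement procedure described above can be sketched as a greedy pass over chains sorted by score. This is a simplified illustration in Python, not the chainNet implementation: the function `build_net`, the dictionary layout of a chain, and the rule that a gap inside a placed chain opens free space one level down are assumptions made for the example, and real nets apply further size thresholds that are omitted here.

```python
def build_net(chains, chrom_size):
    """Greedy net construction on one target chromosome (simplified sketch).

    Each chain is a dict with a "name", a "score" and "blocks", a sorted list
    of non-overlapping (start, end) target intervals. Returns the placed
    pieces with the level at which each one landed.
    """
    # Free space still open for filling: (start, end, level).
    free = [(0, chrom_size, 1)]
    placed = []
    for chain in sorted(chains, key=lambda c: c["score"], reverse=True):
        new_free = []
        for fs, fe, level in free:
            # Trim the chain's blocks to this stretch of free space.
            clipped = [
                (max(s, fs), min(e, fe))
                for s, e in chain["blocks"]
                if min(e, fe) > max(s, fs)
            ]
            if not clipped:
                new_free.append((fs, fe, level))
                continue
            first, last = clipped[0][0], clipped[-1][1]
            placed.append(
                {"name": chain["name"], "level": level, "start": first, "end": last}
            )
            # Space on either side of the placed piece stays at the same level.
            if fs < first:
                new_free.append((fs, first, level))
            if last < fe:
                new_free.append((last, fe, level))
            # Gaps between the piece's blocks can be filled one level down.
            for (_, e1), (s2, _) in zip(clipped, clipped[1:]):
                if e1 < s2:
                    new_free.append((e1, s2, level + 1))
        free = new_free
    return placed
```

For example, a high-scoring chain with blocks at 0-100 and 200-300 lands at level 1, and a lower-scoring chain whose block falls inside 100-200 is placed beneath it at level 2.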
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetXenTro1Viewnet Net X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)), Chain and Net Alignments Comparative Genomics netXenTro1 X. tropicalis Net X. tropicalis (Oct. 
2004 (JGI 3.0/xenTro1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both X. tropicalis and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the X. tropicalis assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best X. tropicalis/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The X. tropicalis sequence used in this annotation is from the Oct. 2004 (JGI 3.0/xenTro1) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the X. tropicalis/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single X.
tropicalis chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetXenTro1Viewchain Chain X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)), Chain and Net Alignments Comparative Genomics chainXenTro1 X. tropicalis Chain X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of X. tropicalis (Oct. 2004 (JGI 3.0/xenTro1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both X. tropicalis and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the X. tropicalis assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best X. tropicalis/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The X. tropicalis sequence used in this annotation is from the Oct. 2004 (JGI 3.0/xenTro1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the X. tropicalis/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single X. tropicalis chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
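The dynamic program over gapless blocks can be illustrated with a small Python sketch. It is a simplification under stated assumptions: axtChain uses a kd-tree to find candidate predecessor blocks quickly, whereas this sketch scans all earlier blocks in quadratic time, and the function name `best_chain`, the block tuple layout, and the caller-supplied `gap_cost` function are hypothetical.

```python
def best_chain(blocks, gap_cost):
    """Find the highest-scoring chain of gapless blocks (simplified sketch).

    Each block is (t_start, q_start, length, score) with a positive length.
    Block j may follow block i only if it starts at or after the end of i
    in both the target and the query. gap_cost(dt, dq) is supplied by the
    caller and returns the penalty for joining two blocks.
    Returns (best_score, blocks_in_chain_order).
    """
    blocks = sorted(blocks, key=lambda b: (b[0], b[1]))
    if not blocks:
        return 0, []
    best = [b[3] for b in blocks]   # best chain score ending at each block
    prev = [None] * len(blocks)     # back-pointer to the previous block
    for j, (tj, qj, _, sj) in enumerate(blocks):
        for i in range(j):
            ti, qi, li, _ = blocks[i]
            dt = tj - (ti + li)     # target gap between block i and block j
            dq = qj - (qi + li)     # query gap between block i and block j
            if dt < 0 or dq < 0:
                continue            # overlapping or out of order in one genome
            candidate = best[i] + sj - gap_cost(dt, dq)
            if candidate > best[j]:
                best[j] = candidate
                prev[j] = i
    end = max(range(len(blocks)), key=best.__getitem__)
    chain = []
    k = end
    while k is not None:
        chain.append(blocks[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain
```

A call such as `best_chain(blocks, lambda dt, dq: 10 * (dt + dq))` uses a made-up linear penalty purely for demonstration; it is not the gap scoring used for this track.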
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetDanRer3 Zebrafish Chain/Net Zebrafish (May 2005 (Zv5/danRer3)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebrafish (May 2005 (Zv5/danRer3)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebrafish and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebrafish assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebrafish/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebrafish sequence used in this annotation is from the May 2005 (Zv5/danRer3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebrafish/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebrafish chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
A      91    -90    -25   -100
C     -90    100   -100    -25
G     -25   -100    100    -90
T    -100    -25    -90     91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
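The placement step described above can be illustrated with a small sketch. It is not the chainNet implementation: it assumes each chain is reduced to a name, a score, and a list of half-open block intervals on the mouse (target) sequence, and it does not track net levels. The names place_chains and subtract_covered are hypothetical.

```python
def subtract_covered(start, end, covered):
    """Return the parts of [start, end) not overlapped by the sorted,
    disjoint intervals in `covered`."""
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue          # covered interval lies entirely to the left
        if c_start >= end:
            break             # covered interval lies entirely to the right
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
        if pos >= end:
            break
    if pos < end:
        pieces.append((pos, end))
    return pieces


def place_chains(chains):
    """Place chains from highest to lowest score, trimming each block to
    the target bases that no higher-scoring chain has already claimed.

    chains: list of (name, score, blocks), blocks = [(start, end), ...]
    Returns a list of (name, start, end) pieces in placement order.
    """
    covered = []   # sorted, disjoint intervals already claimed
    placed = []
    for name, _score, blocks in sorted(chains, key=lambda c: -c[1]):
        for start, end in blocks:
            pieces = subtract_covered(start, end, covered)
            placed.extend((name, s, e) for s, e in pieces)
            covered = sorted(covered + pieces)
    return placed


if __name__ == "__main__":
    chains = [
        ("B", 6000, [(30, 70)]),             # lower-scoring chain
        ("A", 9000, [(0, 40), (60, 100)]),   # top chain with a gap at 40-60
    ]
    print(place_chains(chains))
```

In this toy input, chain B overlaps both blocks of chain A, so only the portion that falls in A's gap (40-60) survives trimming. That is the situation in which the text says the lower-scoring chain is placed underneath the higher-scoring one.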
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetDanRer3Viewnet Net Zebrafish (May 2005 (Zv5/danRer3)), Chain and Net Alignments Comparative Genomics netDanRer3 Zebrafish Net Zebrafish (May 2005 (Zv5/danRer3)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebrafish (May 2005 (Zv5/danRer3)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebrafish and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebrafish assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebrafish/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The zebrafish sequence used in this annotation is from the May 2005 (Zv5/danRer3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebrafish/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single zebrafish chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
A      91    -90    -25   -100
C     -90    100   -100    -25
G     -25   -100    100    -90
T    -100    -25    -90     91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetDanRer3Viewchain Chain Zebrafish (May 2005 (Zv5/danRer3)), Chain and Net Alignments Comparative Genomics chainDanRer3 Zebrafish Chain Zebrafish (May 2005 (Zv5/danRer3)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebrafish (May 2005 (Zv5/danRer3)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebrafish and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebrafish assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebrafish/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebrafish sequence used in this annotation is from the May 2005 (Zv5/danRer3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebrafish/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebrafish chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
A      91    -90    -25   -100
C     -90    100   -100    -25
G     -25   -100    100    -90
T    -100    -25    -90     91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
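As an illustration of how the substitution matrix above contributes to a chain score, the following sketch sums the matrix entries over one gapless block. It is a simplified reading of the text, not code from axtChain: the function name block_score is hypothetical, ambiguous bases such as N are not handled, and gap penalties (which also enter the chain score) are left out.

```python
# Substitution scores as listed in the matrix above (row base vs column base).
SCORES = {
    "A": {"A": 91, "C": -90, "G": -25, "T": -100},
    "C": {"A": -90, "C": 100, "G": -100, "T": -25},
    "G": {"A": -25, "C": -100, "G": 100, "T": -90},
    "T": {"A": -100, "C": -25, "G": -90, "T": 91},
}

# Threshold quoted in the text; chains scoring below it were discarded.
MIN_CHAIN_SCORE = 5000


def block_score(target, query):
    """Sum matrix scores over one gapless block of equal-length sequences."""
    if len(target) != len(query):
        raise ValueError("a gapless block needs sequences of equal length")
    return sum(SCORES[t][q] for t, q in zip(target.upper(), query.upper()))


if __name__ == "__main__":
    print(block_score("ACGT", "ACGT"))   # four matches
    print(block_score("ACGT", "GCGT"))   # one A/G transition, three matches
```

Because a matching base contributes at most 100, a chain needs at least 50 aligned bases to reach the 5000 threshold, even before any gap penalties or mismatches are subtracted.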
The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetFr1 Fugu Chain/Net Fugu (Aug. 2002 (JGI 3.0/fr1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of fugu (Aug. 2002 (JGI 3.0/fr1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both fugu and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the fugu assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best fugu/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The fugu sequence used in this annotation is from the Aug. 2002 (JGI 3.0/fr1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the fugu/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single fugu chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
A      91    -90    -25   -100
C     -90    100   -100    -25
G     -25   -100    100    -90
T    -100    -25    -90     91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetFr1Viewnet Net Fugu (Aug. 2002 (JGI 3.0/fr1)), Chain and Net Alignments Comparative Genomics netFr1 Fugu Net Fugu (Aug. 
2002 (JGI 3.0/fr1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of fugu (Aug. 2002 (JGI 3.0/fr1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both fugu and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the fugu assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best fugu/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The fugu sequence used in this annotation is from the Aug. 2002 (JGI 3.0/fr1) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the fugu/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single fugu chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
A      91    -90    -25   -100
C     -90    100   -100    -25
G     -25   -100    100    -90
T    -100    -25    -90     91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetFr1Viewchain Chain Fugu (Aug. 2002 (JGI 3.0/fr1)), Chain and Net Alignments Comparative Genomics chainFr1 Fugu Chain Fugu (Aug. 2002 (JGI 3.0/fr1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of fugu (Aug. 2002 (JGI 3.0/fr1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both fugu and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the fugu assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best fugu/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The fugu sequence used in this annotation is from the Aug. 2002 (JGI 3.0/fr1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases, gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap):

Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it.
Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
NonSyn - a match to a chromosome different from the gap in the level above.

Methods Chain track Transposons that have been inserted since the fugu/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single fugu chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
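As a rough illustration of how the substitution matrix and the score cutoff quoted above fit together, the following Python sketch scores a gapless block by summing the matrix value of each aligned column and applies the 5000 minimum to a chain score. The function names (`block_score`, `keep_chain`) are hypothetical and not part of axtChain. Gap penalties between blocks are left out, and the sketch assumes the sequences contain only A, C, G, and T.

```python
# Substitution scores as quoted in the track description
# (rows and columns are in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}

# Minimum chain score stated in the description.
MIN_CHAIN_SCORE = 5000


def block_score(target: str, query: str) -> int:
    """Sum the per-column matrix scores of one gapless block (hypothetical helper)."""
    if len(target) != len(query):
        raise ValueError("a gapless block must have equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


def keep_chain(chain_score: int) -> bool:
    """Chains scoring below the minimum are discarded; all others are kept."""
    return chain_score >= MIN_CHAIN_SCORE


if __name__ == "__main__":
    print(block_score("ACGT", "ACGT"))
    print(keep_chain(4999), keep_chain(5000))
```

Because identical bases score 91 or 100 and mismatches score between -25 and -100, a chain needs on the order of fifty or more matching columns, net of mismatches and gaps, to reach the cutoff.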
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTetNig1 Tetraodon Chain/Net Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) to the mouse genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both tetraodon and mouse simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the tetraodon assembly or an insertion in the mouse assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the mouse genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best tetraodon/mouse chain for every part of the mouse genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The tetraodon sequence used in this annotation is from the Feb. 2004 (Genoscope 7/tetNig1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases, gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap):

Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it.
Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
NonSyn - a match to a chromosome different from the gap in the level above.

Methods Chain track Transposons that have been inserted since the tetraodon/mouse split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single tetraodon chromosome and a single mouse chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
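The placement step can be pictured with a small greedy sketch. This is not the chainNet implementation. It models each chain as a hypothetical list of target-coordinate blocks, processes chains from highest to lowest score, trims each chain to the bases not yet covered, and assigns a level one deeper than any already placed chain whose span contains it. All names (`Chain`, `Placed`, `subtract`, `place_chains`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chain:
    name: str
    score: int
    blocks: list  # sorted, non-overlapping (start, end) target intervals, end exclusive


@dataclass
class Placed:
    name: str
    level: int
    blocks: list  # what is left of the chain after trimming


def subtract(interval, covered):
    """Return the parts of `interval` not overlapped by any interval in `covered`."""
    start, end = interval
    pieces = []
    for c_start, c_end in sorted(covered):
        if c_end <= start or c_start >= end:
            continue  # no overlap with this covered interval
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def place_chains(chains):
    """Place chains best-first, trimming each to still-uncovered target bases."""
    covered = []  # target intervals already claimed by placed chains
    placed = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        kept = [p for block in chain.blocks for p in subtract(block, covered)]
        if not kept:
            continue  # nothing of this chain fits anywhere
        span = (kept[0][0], kept[-1][1])
        level = 1
        for other in placed:
            o_start, o_end = other.blocks[0][0], other.blocks[-1][1]
            if o_start <= span[0] and span[1] <= o_end:
                # The trimmed chain sits inside a placed chain's span, so it
                # fills one of that chain's gaps and goes one level below it.
                level = max(level, other.level + 1)
        placed.append(Placed(chain.name, level, kept))
        covered.extend(kept)
    return placed


if __name__ == "__main__":
    example = [
        Chain("B", 6000, [(90, 150), (160, 210)]),
        Chain("A", 9000, [(0, 100), (200, 300)]),
        Chain("C", 5500, [(400, 450)]),
    ]
    for item in place_chains(example):
        print(item)
```

In this example, chain B overlaps the ends of both blocks of the higher-scoring chain A, so it is trimmed to the bases inside A's gap and lands on level 2, while the unrelated chain C stays on level 1.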
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTetNig1Viewnet Net Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)), Chain and Net Alignments Comparative Genomics netTetNig1 Tetraodon Net Tetraodon (Feb. 
2004 (Genoscope 7/tetNig1)) Alignment Net Comparative Genomics chainNetTetNig1Viewchain Chain Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)), Chain and Net Alignments Comparative Genomics chainTetNig1 Tetraodon Chain Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) Chained Alignments Comparative Genomics