Zebrafish
Danio rerio
Photo courtesy of NHGRI (Press Photos)

The July 2007 zebrafish (Danio rerio) Zv7 assembly was produced by The Wellcome Trust Sanger Institute in collaboration with the Max Planck Institute for Developmental Biology in Tuebingen, Germany, and the Netherlands Institute for Developmental Biology (Hubrecht Laboratory), Utrecht, The Netherlands. For more information about this assembly, see Zv7 in the NCBI Assembly database.

Sample position queries

A genome position can be specified by the accession number of a sequenced genomic region, an mRNA or EST, a chromosomal coordinate range, or keywords from the GenBank description of an mRNA. The following list shows examples of valid position queries for the zebrafish genome. Note that some position queries (e.g. "huntington") may return matches to the mRNA records of other species. In these cases, the mRNAs are mapped to their homologs in zebrafish. See the User's Guide for more information.

Request:   Genome Browser Response:

chr1   Displays all of chromosome 1
chr1:1-200000   Displays first two hundred thousand bases of chromosome 1
chr1:100000+2000   Displays a region of chromosome 1 that spans 2000 bases, starting at position 100000

U30710   Displays region containing zebrafish mRNA with GenBank accession number U30710
AA658622   Displays region containing zebrafish EST with GenBank accession AA658622
ENSDART00000025573   Displays region containing Ensembl gene prediction transcript ENSDART00000025573

p53   Lists mRNAs related to the p53 tumor suppressor
pseudogene mRNA   Lists transcribed pseudogenes, but not cDNAs, in GenBank
homeobox caudal   Lists mRNAs for caudal homeobox genes in GenBank
zinc finger   Lists many zinc finger mRNAs
kruppel zinc finger   Lists only kruppel-like zinc fingers
huntington   Lists mRNAs associated with Huntington's disease

porter   Lists mRNAs deposited by scientists named Porter
Amsterdam,A.   Lists mRNAs deposited by co-author A. Amsterdam

Use this last format for author queries. Although GenBank requires the search format Amsterdam A, internally it uses the format Amsterdam,A.
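The coordinate-style queries in the list above (chr1, chr1:1-200000, chr1:100000+2000) follow a regular pattern. The following Python sketch shows one way such strings could be parsed. It is not the Genome Browser's own parser: the names Position and parse_position are hypothetical, and the reading of "+2000" as a 2000-base span with 1-based inclusive coordinates is an assumption based on the description above.

```python
import re
from typing import NamedTuple, Optional


class Position(NamedTuple):
    """Hypothetical container for a parsed coordinate query."""
    chrom: str
    start: Optional[int]  # 1-based inclusive; None means the whole chromosome
    end: Optional[int]


# Matches "chr1", "chr1:1-200000" and "chr1:100000+2000".
_POSITION_RE = re.compile(
    r"^(?P<chrom>chr\w+)"
    r"(?::(?P<start>\d+)(?P<sep>[-+])(?P<value>\d+))?$"
)


def parse_position(query: str) -> Position:
    """Parse a chromosome or coordinate-range query into a Position."""
    match = _POSITION_RE.match(query.strip())
    if match is None:
        raise ValueError(f"not a coordinate query: {query!r}")
    chrom = match.group("chrom")
    if match.group("start") is None:
        return Position(chrom, None, None)
    start = int(match.group("start"))
    value = int(match.group("value"))
    # "start-end" gives the end directly; "start+length" gives a span length
    # (assumed here to include the start base itself).
    end = value if match.group("sep") == "-" else start + value - 1
    if end < start:
        raise ValueError(f"end precedes start in {query!r}")
    return Position(chrom, start, end)


if __name__ == "__main__":
    for example in ("chr1", "chr1:1-200000", "chr1:100000+2000"):
        print(example, "->", parse_position(example))
```

Accession numbers and keyword queries such as U30710 or "zinc finger" do not match this pattern and raise ValueError in the sketch; a real lookup would pass them to a separate search step.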

Assembly details

The Zv7 assembly consists of 1,440,582,308 bp in 5,036 fragments (N50 = 1,153,933, n = 277). This assembly was constructed using two different strategies. One approach used traditional clone mapping and sequencing techniques to produce a higher-quality genome sequence. BAC libraries were selected and fingerprinted to generate a fingerprint contig (FPC) map (data freeze 11 April 2007). From this map, a tiling path was calculated that covered the genome sequence clone-by-clone. Clones from this path were selected for high-quality sequencing and then pieced together to form the genome sequence. The other strategy involved a whole genome shotgun (WGS) approach. The WGS assembly, which is of lower sequence quality, was used to fill gaps in the tiling path. It was generated using 13,756,367 reads from a single Tuebingen doubled haploid zebrafish. The reads comprised 10,891,216,277 bp and were clustered using Phusion, resulting in a coverage of 5.5x.
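For readers unfamiliar with the N50 statistic quoted above, the following Python sketch shows the usual definition: the length of the fragment at which the running total of fragment lengths, taken from largest to smallest, first reaches half of the assembly size. The function name n50 and the toy lengths are hypothetical, and reading "n" as the number of fragments needed to reach that point is an assumption. The actual Zv7 fragment lengths are not reproduced here.

```python
from typing import Iterable, Tuple


def n50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50 length, number of fragments needed to reach it)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered or ordered[-1] <= 0:
        raise ValueError("lengths must be a non-empty set of positive values")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable for positive lengths")


if __name__ == "__main__":
    # Toy fragment lengths (illustrative only, not Zv7 data).
    toy_lengths = [80, 70, 50, 40, 30, 20, 10]
    print(n50(toy_lengths))
```

With the toy lengths, the total is 300, so the two largest fragments (80 + 70 = 150) already reach half of it, which makes 70 the N50 and 2 the fragment count.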

Clone sequences and WGS contigs were integrated by considering sequence alignments, BAC end placements, and zebrafish cDNAs and markers. Improvements to the integration algorithm allowed the placement of WGS contigs that contained markers but could not be linked to the FPC contigs. In cases where markers from different chromosomes appeared on the same contig, priority was given to the Heat Shock Diploid Cross Panel (HS) and the Boston MGH Cross Map (MGH). Some of these discrepancies may be attributed to misassemblies, but there may also be inconsistencies between the zebrafish marker panels.

The integrated assembly contains 1.02 Gb from 7,823 sequenced clones (7,139 finished and 684 unfinished). 89% of the sequence (1,277,075,233 bp, including estimated gap sizes and 100 bp gaps between scaffolds) lies in scaffolds that are placed on chromosomes 1-25 (linkage groups 1-25) after integration of the WGS assembly with the clone sequences. Integration was achieved by considering a combination of sequence alignment and BAC end positions, as well as features such as zebrafish cDNAs and markers. 45,800,611 bp of the sequence are in 166 scaffolds tied to unplaced FPC contigs; 117,689,868 bp are in 4,844 NA scaffolds, which are unplaced WGS contigs. The zebrafish mitochondrial sequence is also available as the virtual chromosome "chrM".
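The figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in this description; the variable names are hypothetical. The small remainder it computes is not itemized in the text, and attributing it to chrM is an assumption.

```python
# Figures quoted in the assembly details above.
TOTAL_BP = 1_440_582_308          # whole Zv7 assembly
PLACED_BP = 1_277_075_233         # scaffolds placed on chromosomes 1-25
UNPLACED_FPC_BP = 45_800_611      # 166 scaffolds tied to unplaced FPC contigs
NA_BP = 117_689_868               # 4,844 NA scaffolds (unplaced WGS contigs)

FINISHED_CLONES = 7_139
UNFINISHED_CLONES = 684
SEQUENCED_CLONES = 7_823

# Fraction of the assembly placed on chromosomes 1-25.
placed_fraction = PLACED_BP / TOTAL_BP

# Bases not covered by the three itemized categories (assumption: this
# remainder may correspond to the mitochondrial sequence, chrM).
remainder_bp = TOTAL_BP - (PLACED_BP + UNPLACED_FPC_BP + NA_BP)

if __name__ == "__main__":
    print(f"placed on chromosomes: {placed_fraction:.1%}")
    print(f"clone counts agree: {FINISHED_CLONES + UNFINISHED_CLONES == SEQUENCED_CLONES}")
    print(f"bp outside the itemized categories: {remainder_bp:,}")
```

The placed fraction rounds to the 89% stated above, and the finished and unfinished clone counts sum to the quoted total of 7,823.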

The Sanger Institute notes this assembly release is still preliminary. Highly variable regions within the genome posed assembly difficulties, most likely because the sequences originated from different haplotypes. This also results in assembly dropouts and false duplications. While generating this assembly, special attention was paid to such issues and more than 200 Mb of duplicated sequence was removed in comparison to the Zv6 assembly (UCSC danRer4). For more information about this Zv7 assembly, see the Sanger Institute page for the Danio rerio Sequencing Project.

The zebrafish data and annotations can be downloaded from the UCSC FTP site or Downloads page. This data set has specific conditions for use. The danRer5 annotation tracks were generated by UCSC and collaborators worldwide. Special thanks to the Zebrafish Genome Initiative at Children's Hospital in Boston, MA, USA for their collaboration on this release. See the Credits page for a detailed list of the organizations and individuals who contributed to this release.


GenBank Pipeline Details

For the purposes of the GenBank alignment pipeline, this assembly is considered to be: well-ordered.