/goldenPath/help/barChart.html:barChart_and_bigBarChart_Track_Format barChart and bigBarChart Track Format The barChart (and bigBarChart) track format displays a graph of category-specific values over genomic regions, similar to the GTEx Gene track. This format is useful for displaying gene expression and other datasets where it is desirable to compare a set of variables over genomic regions. While a barChart track can effectively show datasets with single values for each variable (e.g. comparing individual samples), the format provides specific features to display studies comprised of a large set of samples for each variable (e.g. comparing tissues with multiple samples for each tissue). In this usage, the main genome browser display presents a graph of summary values (e.g. medians) for each variable, and the distribution of sample values across variables is shown via a boxplot graph shown on the details page for each region. The barChart format is available as a standalone plain text bed6+ format for use with smaller datasets as a custom track, and as a binary indexed format (bigBarChart) suitable for track hubs and custom tracks. The bigBarChart format provides more track customization features (i.e. schema customization, and label configuration support), and is recommended for users who can use command-line tools and have web-accessible data storage. If you do not have web-accessible data storage, please see the Hosting section of the Track Hub Help documentation. barChart format files are converted to bigBarChart files using the program bedToBigBed, run with the -as option to pull in a special autoSql (.as) schema file defining the fields of the bigBarChart. Below is an example of the barChart format in 'full' visibility mode [BarChart example in full mode] The 'squish' display mode draws one colored rectangle indicating the category (e.g. tissue) with highest value of the measured metric (e.g. 
gene expression) if it contributes more than 10% to the total expression; otherwise the chart is colored black. The following image shows the GTEx Genes track in 'squish' mode; the beige colored item (tissue) has the highest expression in the ACE2 gene and represents more than 10% of total expression. Click the colored rectangle for more information. [BarChart example in squish mode]

barChart format definition

The following autoSql definition illustrates the basic schema supporting barChart (and bigBarChart) tracks.

table bigBarChart
"bigBarChart bar graph display"
    (
    string chrom; "Reference sequence chromosome or scaffold"
    uint chromStart; "Start position in chromosome"
    uint chromEnd; "End position in chromosome"
    string name; "Name or ID of item"
    uint score; "Score (0-1000)"
    char[1] strand; "'+','-' or '.'. Indicates whether the query aligns to the + or - strand on the reference"
    string name2; "Alternate name of item"
    uint expCount; "Number of bar graphs in display, must be <= 100"
    float[expCount] expScores; "Comma separated list of category values."
    bigint _dataOffset; "Offset of sample data in data matrix file, for boxplot on details page, optional only for barChart format"
    int _dataLen; "Length of sample data row in data matrix file, optional only for barChart format"
    )

Column Explanations

The first 6 fields of the barChart format are the same as the first 6 fields of the standard BED format. The name2 field provides an alternate item name, useful if you would like to associate multiple transcripts to a single gene locus, different variables to the same experiment type, etc. The expCount and expScores fields are used as in the Microarray format; they define the number of categories and a value for each category (see Example #1 below). The _dataOffset and _dataLen fields are used internally by the track to locate sample values for a region in an optional matrix file containing all sample values. 
These values are used to draw a boxplot of all sample data on the details page for the bar chart. When a matrix file is not supplied, these fields should be set to 0. (As a convenience, these fields are optional for barChart custom tracks.) When creating bigBarChart files, we encourage you to customize the title and field descriptions of the prototype autoSql schema to better describe your data. In the example below, the name field of the track refers to a transcript, while the name2 field represents a gene:

table xyzGeneExpression
"XYZ gene expression barChart"
    (
    string chrom; "Reference sequence chromosome or scaffold"
    uint chromStart; "Start position in chromosome"
    uint chromEnd; "End position in chromosome"
    string name; "Transcript name"
    uint score; "Score (0-1000), derived from total expScores (below)"
    char[1] strand; "+, -, or ., indicating orientation of the item"
    string name2; "Gene name"
    uint expCount; "Number of tissues"
    float[expCount] expScores; "Comma separated list of median expression in RPKM for each tissue."
    bigint _dataOffset; "Offset of sample data in data matrix file"
    int _dataLen; "Length of sample data row in data matrix file"
    )

Customizing this file will make your data more easily interpreted by users, who will see the field descriptions when accessing the track data from the Table Browser, when viewing items on the Genome Browser details pages (via the "view table schema" link), and (for users who download files) from the -as option of the bigBedInfo tool.

Creating barChart and bigBarChart custom tracks

The steps for creating barChart tracks differ from the process for creating bigBarChart tracks. The steps also differ based on whether you have an input matrix file (generated perhaps from an RNA-Seq differential expression analysis pipeline) or not. If you have an expression matrix-like file, skip to Example #3; otherwise, follow Example #1 below.

Example #1

In this example, you will create a barChart custom track using example bed6+3 data.

1. 
Paste the following track line into the custom track management page for the human assembly hg38.

track type=barChart name="barChart Example One" description="A barChart file" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" visibility=pack
browser position chr14:95,081,796-95,436,280
# chrom chromStart chromEnd name score strand name2 expCount expScores _dataOffset _dataLen
chr14 95086227 95158010 DICER1 999 - ENSG00000100697.10 5 2.94,11.60,38.00,6.69,4.89
chr14 95181939 95319906 CLMN 999 - ENSG00000165959.7 5 7.08,69.53,9.32,1.38,1.68
chr14 95417493 95475836 SYNE3 999 - ENSG00000176438.8 5 7.29,3.73,0.74,20.35,1.39

2. Click the "submit" button.

After the file loads in the Genome Browser, you should see an automatically colored bar graph with 5 bars. Hovering the mouse over any of the individual bars will display the name of the particular bar ("wholeBlood", "adiposeSubcut", ...) as well as the value associated with that bar (for the DICER1 item above, 4.89 and 2.94, respectively). The order of bar names in the barChartBars field of the track line should exactly match the order of the values in the expScores field.

Example #2

In this example, you will create a bigBarChart track out of an existing bigBarChart format file, located on the UCSC Genome Browser http server. This file contains data for the hg38 assembly. To create a custom track using this file:

1. Construct a track line referencing the file:

track type=bigBarChart name="bigBarChart Example One" description="A bigBarChart file" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb
browser position chr14:95,081,796-95,436,280

2. Paste the track line into the custom track management page for the human assembly hg38.
3. Click the "submit" button.

After the file loads in the Genome Browser, you should see an automatically colored bar graph with 5 bars. 
The same rules apply to bigBarChart custom tracks as barChart custom tracks in that the order of names in the barChartBars field should exactly match the order of values from the expScores field in the bigBarChart file. Example #3 In this example, you will use the helper scripts expMatrixToBarchartBed and bedJoinTabOffset on example matrix and category files in order to generate a bed6+5 barChart format file, which can be loaded as a custom track into the Genome Browser. The matrix file is a tab-separated (must be tabs, not spaces) file of the following form, perhaps resulting from an RNA-Seq analysis pipeline. Please note that the first line must describe each column as in the example snippet below. transcript sample1 sample2 sample3 sample4 sample5 ... transcriptName value1 value2 value3 value4 value5 ... The categories file then provides more meta information about this matrix file. It is a two column, tab-separated file that maps the samples in the matrix file to a specific category: sample1 category1 sample2 category1 ... ... sampleA category2 sampleB category2 ... ... sampleX category3 sampleY category3 ... ... Each column in the first line of the matrix file must be found in the categories file. We have provided an example category file and matrix file to follow along with the rest of this example. To create a custom track in this form, follow the below steps: 1. Create a bed6+1 file to use as a map for the items in your matrix (does not need to be all-inclusive). This file, with example lines such as: chr14 95086227 95158010 ENSG00000100697.10 999 - DICER1 will be one of three input files to the helper script expMatrixToBarchartBed. For an example file to follow along with the rest of this example, you can download an example bed6+1 file here. 2. Download the helper programs expMatrixToBarchartBed and bedJoinTabOffset from the utilities directory appropriate for your operating system. 3. 
Make sure the programs downloaded above are in your system's PATH variable. For example, if you downloaded the programs to your $HOME/Downloads directory, set your PATH variable accordingly:

export PATH=$PATH:$HOME/Downloads

4. Now run expMatrixToBarchartBed (which in turn runs bedJoinTabOffset) like so:

expMatrixToBarchartBed categoriesFile matrixFile bedInputFile outputBed

The argument outputBed will be a bed6+5 file, with the expCount, expScores, _dataOffset, and _dataLen fields computed for you, for example:

chr14 95086227 95158010 ENSG00000100697.10 999 - DICER1 5 10.94,11.60,8.00,6.69,4.89 93153 26789

expMatrixToBarchartBed will also output the order of the scores in the expScores field, which you can then copy and paste into the barChartBars field of the custom track line so the bars displayed in the browser match the right values:

The columns and order of the groups are: #chr start end name score strand name2 expCount expScores;adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood _offset _lineLength

If you have already pre-computed expCount and expScores, and just need offsets into your matrix file for a more descriptive details page, run only bedJoinTabOffset like so:

bedJoinTabOffset matrixFile exampleBed6+3 outBed

5. Now that we have a bed6+5 format file that corresponds to the barChart format, we can construct a track line and prepend it to our bed6+5 file so the Genome Browser will recognize it:

track type=barChart name="barChart Example" description="A barChart file" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" barChartMetric=median visibility=pack

6. Use the upload button on the custom track management page for the human assembly hg38 to upload the file just created.
7. Click the "submit" button. 
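The relationship between the matrix file, the categories file, and the expScores field in the steps above can be illustrated with a short Python sketch. This is not the implementation of expMatrixToBarchartBed; the function names are hypothetical, and the ordering of categories by first appearance in the categories file is an assumption made for the illustration (the real tool reports its own order, as shown in step 4).

```python
import statistics
from collections import defaultdict


def category_medians(matrix_lines, category_lines):
    """Return (category_order, {row_name: [median per category]}).

    Hypothetical helper: categories are ordered by first appearance in the
    categories file, which is an assumption of this sketch.
    """
    sample_to_cat = {}
    order = []
    for line in category_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        sample, cat = line.split("\t")
        sample_to_cat[sample] = cat
        if cat not in order:
            order.append(cat)

    header, *rows = [l.rstrip("\n") for l in matrix_lines if l.strip()]
    samples = header.split("\t")[1:]  # first column holds the row name
    missing = [s for s in samples if s not in sample_to_cat]
    if missing:
        raise ValueError(f"samples missing from categories file: {missing}")

    result = {}
    for row in rows:
        name, *values = row.split("\t")
        by_cat = defaultdict(list)
        for sample, value in zip(samples, values):
            by_cat[sample_to_cat[sample]].append(float(value))
        # A category with no samples in the matrix gets 0.0 in this sketch.
        result[name] = [
            statistics.median(by_cat[c]) if by_cat[c] else 0.0 for c in order
        ]
    return order, result


def format_exp_scores(values):
    """Format values as a comma-separated expScores string."""
    return ",".join(f"{v:.2f}" for v in values)


matrix = ["transcript\ts1\ts2\ts3\ts4\n", "tx1\t1.0\t3.0\t10.0\t20.0\n"]
categories = ["s1\tA\n", "s2\tA\n", "s3\tB\n", "s4\tB\n"]
order, medians = category_medians(matrix, categories)
print(" ".join(order), format_exp_scores(medians["tx1"]))
```

In this toy input, samples s1 and s2 belong to category A and s3 and s4 to category B, so the row tx1 is summarized by one median per category, in the same order that would go into barChartBars.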
expMatrixToBarchartBed automatically computes the median values for all the samples in the matrix file, which is useful when your experiment contains data from 8000 samples (such as the GTEx data). Furthermore, expMatrixToBarchartBed can compute the mean value of all samples in a category of the matrix file (instead of the default median), in addition to allowing for a specific ordering of the expScores field. NOTE: Set the barChartMetric setting to 'mean' if you use this option of expMatrixToBarchartBed. For more information about expMatrixToBarchartBed or bedJoinTabOffset, run the program with no arguments to get a usage message. Example #4 In this example, you will use the bed6+5 file created in Example 3 to create a bigBarChart file, allowing the data to be remotely accessed and exist within a track hub. The track settings for bigBarChart on a hub can be viewed here. 1. If not already completed, follow steps 1-4 from Example 3 above, or download the example bed6+5 file here 2. Download the fetchChromSizes and bedToBigBed programs from the utilities directory appropriate to your operating system. 3. Use fetchChromSizes to create a chrom.sizes file for the UCSC database you are working with (hg38 for these examples). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes. 4. Save the autoSql file barChartBed.as to your computer. 5. Run bedToBigBed to create the bigBarChart file: bedToBigBed -as=barChartBed.as -type=bed6+5 inputBed hg38.chrom.sizes output.bigBed 6. Move the newly constructed bigBarChart file to a web accessible http, https, or ftp location. 7. Construct a custom track line with a bigDataUrl parameter pointing to the newly created bigBarChart file. 
If the matrix and category files used to make the precursor barChart file are also moved to an http, https, or ftp location, we can point to them on the custom track line as well (all settings must be on the same line):

track type=bigBarChart name="bigBarChart Example One" description="A bigBarChart file" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" barChartMetric=median barChartUnit=RPKM bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb barChartMatrixUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleMatrix.txt barChartSampleUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleSampleData.txt visibility=pack

8. To fully take advantage of creating a bigBarChart file and the barChartMatrixUrl and barChartSampleUrl supporting files, create a Track Hub and use a stanza such as the following:

track exampleBarChartTrack
type bigBarChart
visibility full
shortLabel exBarChart
longLabel Simple example bar chart track
barChartBars adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood
barChartColors #FF6600 #33CCCC #CC9955 #AAAAFF #FF00BB
barChartLabel Tissues
barChartMetric median
barChartUnit RPKM
bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb
barChartMatrixUrl http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleMatrix.txt
barChartSampleUrl http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleSampleData.txt

Please note, the fields in your barChartBars line must match the terms in your categories file (exampleBarChartSamples.txt) in order for the boxplot display to show up on the details page for tracks. Below is an example image indicating the benefit of using these files in a hub. Note the "View all data points for..." link that allows extracting data from the matrix file (exampleBarChartMatrix.txt) specific for this named item. 
[Example boxPlot image]

Example #5

To help track hub developers adjust the display of tracks, two settings are available: barChartBarMinWidth and barChartBarMinPadding. The first sets the minimum width of the bars in the chart, in pixels, for example barChartBarMinWidth 10. The second sets the minimum padding between bars, in pixels, for example barChartBarMinPadding 5. Here are two example tracks using these settings on the same source data; load them by going to the My Data, Custom Tracks page and pasting the text below to see how the display differs.

browser position chr14:95,081,796-95,436,280
track type=bigBarChart barChartBarMinPadding=5 name="ex barChartBarMinPadding" description="A bigBarChart file with barChartBarMinPadding" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb
track type=bigBarChart barChartBarMinWidth=20 name="ex barChartBarMinWidth" description="A bigBarChart file with barChartBarMinWidth" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb

[Example1 of barChartBarMinPadding and barChartBarMinWidth]

Both tracks have the same data; however, in the bottom track the barChartBarMinWidth 20 setting produces wider bars, and in the top track the barChartBarMinPadding 5 setting produces larger padding between bars. As described in the settings entries for barChartBarMinWidth and barChartBarMinPadding, there is a dynamic calculation dependent on the current window size, the width of the item, and the number of bars for the item. As a result, the appearance of the barCharts with these settings can differ at different zoom levels. 
For instance, in the first image, you can see how much impact barChartBarMinWidth has on the second track, as well as the impact of barChartBarMinPadding on the top track. When zoomed in, as in the image below, the impact of both settings is less noticeable. [Example2 of barChartBarMinPadding and barChartBarMinWidth]

Example #6

To help with the selection and exploration of large data sets, the settings barChartFacets, barChartStatsUrl, and barChartMerge were introduced; with them, checkboxes on the details page enable slicing data down to smaller collections based on metadata. The setting barChartFacets turns on the faceted selection on the track details and configure page, which is useful for selecting which bars out of a large number to display by clicking designated checkboxes. The setting barChartStatsUrl <url> associates a table of tab-separated values with the barChart, with one line per bar. The setting barChartMerge on enables a merge button inside the faceted selections. It is particularly useful when there are many bars and many facets, to condense a related group such as tissue source. Below is an example track using these settings on source data from the Tabula Sapiens track of single-cell RNA data from many tissues. This excerpt of settings from that track lets you see these settings in action; it can be loaded by going to the My Data, Custom Tracks page. 
track type=bigBarChart name="ex Tabula Sapiens" description="A bigBarChart using Tabula Sapiens data to illustrate new Details pages" visibility=pack barChartCategoryUrl=http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/tabulaSapiens/bw_edit_tissue_cell_type.categories barChartFacets=tissue,cell_class,cell_type barChartStatsUrl=http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/tabulaSapiens/bw_edit_tissue_cell_type.facets barChartMerge=on bigDataUrl=http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/tabulaSapiens/tissue_cell_type.bb

Once loaded, click into an item to see the details page, in this case for the gene ACE2 at the default position in hg38. On the details page, rather than a static bar chart image, there is a dynamic interactive selection screen with checkbox facets to narrow down the display. Adding barChartMerge on enables the display of the "merge" button, and barChartFacets tissue,cell_class,cell_type sources information in barChartStatsUrl ...tissue_cell_type.facets to enable the facet options. To interact with this example, click the first two "merge" buttons next to "tissue" and "cell_class." [Example1 of Facets on barCharts] With those two merged selections, click on the "Macrophage" option to see just this one cell type selection. [Example2 of Facets on barCharts] Clicking the "unmerge" button next to "tissue" then expands the single bar chart into tissue clusters. [Example3 of Facets on barCharts] In these ways the barChartFacets, barChartStatsUrl, and barChartMerge settings allow users to explore the barChart data on the individual details page more closely. One can use the facets to further select certain types and also click the columns (val/count/cluster) to sort by numerical value or alphabetical name. Also, if you click the "Return to Genome Browser" link, you will see that only the selected bars are displayed. 
[Example4 of Facets on barCharts] In this image after making the selections browsing ACE2 the "zoom out" button has been clicked to also view nearby genes where the expression of these tissue selections for the gene PIR is quite noticeably different. Example #7 In this example, we will be using command-line tools that were used to create the single-cell tracks available on hg38. For more in-depth examples of these tools, take a look at the following makedoc for a real-life example. The matrixClusterColumns command converts a single cell gene expression matrix to a cell-type gene expression matrix. It takes a cell-by-cell metadata matrix that refers to the same cells as a gene expression matrix and combines the gene expression values for all cells of a given type into a single value representing the cell type. It can also be used on other metadata fields to produce matrices that show mean or average gene expression levels for a donor, an organ, or any other metadata field or combination of fields. The following command uses the exprMatrix.tsv and meta.tsv files to create six files: prepMatrix.tsv, prepStats.tsv, TissueCompMatrix.tsv, TissueCompStats.tsv, SexMatrix.tsv, and SexStats.tsv. matrixClusterColumns exprMatrix.tsv meta.tsv \ prep prepMatrix.tsv prepStats.tsv \ "Tissue Composition" TissueCompMatrix.tsv TissueCompStats.tsv \ Sex SexMatrix.tsv SexStats.tsv Read 5 rows from meta.tsv matrix exprMatrix.tsv has 209126 fields 209126 total columns, 209121 unclustered, 0 misses 209126 total columns, 209121 unclustered, 0 misses 209126 total columns, 209121 unclustered, 0 misses . If you are not sure which GENCODE Genes version is best suited for your data, the gencodeVersionForGenes command takes a list of gene symbols or gene accessions and searches for the version of GENCODE or RefSeq that matches the most genes in the list. Optionally, the tool can produce a BED file containing the gene structures for the genes in the list. 
The following command uses the gene.lst and geneSymVerTx.tsv files to create a mapping.bed file that will be used in the next step.

# Figure out gene set
gencodeVersionForGenes -target=hg38 gene.lst geneSymVerTx.tsv -bed=mapping.bed
examining 23 versions of gencode and refseq best is gencodeVM5 as sym on mm10 with 6 of 6 (100%) hits on hg38 6 of 6 (100%) hit across versions

The matrixToBarChartBed command combines an expression matrix and a BED file with gene structures (mapping.bed) to make a new BED file (myTissueComp.bed) with a barChart showing gene expression that can be viewed on the Genome Browser. The optional argument -stats=stats.tsv provides a statistics file and improves the coloring in trackDb. The following command uses the TissueCompMatrix.tsv, mapping.bed, and TissueCompStats.tsv files to create the myTissueComp.bed file, which can be viewed on the Genome Browser or converted into a bigBed file so it can be used inside of a track hub.

matrixToBarChartBed TissueCompMatrix.tsv mapping.bed myTissueComp.bed -stats=TissueCompStats.tsv
5 genes found, 0 (0.00%) missed

The simpleBarChartBed.as file can be used with the bedToBigBed command to create the bigBarChart file.

bedToBigBed myTissueComp.bed hg38.chrom.sizes myTissueComp.bb -type=bed6+3 -as=simpleBarChartBed.as
pass1 - making usageList (1 chroms): 15 millis
pass2 - checking and writing primary data (5 records, 9 fields): 1 millis

Sharing your data with others

If you would like to share your barChart/bigBarChart data track with a colleague, learn how to create a URL by looking at Example 6 on this page.

Extracting data from the bigBarChart format

Because bigBarChart files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory.

- bigBedToBed — converts a bigBed file to ASCII BED format. 
- bigBedSummary — extracts summary information from a bigBed file. - bigBedInfo — prints out information about a bigBed file. Use the -as option to see the file field descriptions. As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement. Troubleshooting If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available here) to remove the problematic row(s) in your input file before running the bedToBigBed program.
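The coordinate check described under Troubleshooting can be sketched in Python as follows. This is only an illustrative pre-check, not the implementation of bedClip or bedToBigBed; the function names are hypothetical, and the chromosome size used in the usage lines is an assumed value for demonstration.

```python
def load_chrom_sizes(lines):
    """Parse chrom.sizes lines (name, size) into a dict."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def out_of_bounds_rows(bed_lines, sizes):
    """Return (line_number, line) pairs for rows with suspicious coordinates.

    Hypothetical helper: flags unknown chromosomes, start > end, and ends
    past the chromosome size. Track, browser, and comment lines are skipped.
    """
    bad = []
    for number, line in enumerate(bed_lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            bad.append((number, line.rstrip("\n")))
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in sizes or start > end or end > sizes[chrom]:
            bad.append((number, line.rstrip("\n")))
    return bad


# Usage with an assumed chromosome size, for illustration only.
sizes = load_chrom_sizes(["chrDemo\t1000\n"])
bed = ["chrDemo\t100\t200\titemA\n", "chrDemo\t900\t1200\titemB\n"]
for number, row in out_of_bounds_rows(bed, sizes):
    print(f"line {number}: {row}")
```

Rows reported by a check like this are the ones to inspect before running bedClip and bedToBigBed.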
/goldenPath/help/phastCons.html:Genome_Browser_phastCons_Format phastCons File Format phastCons data files contain the compressed conservation scores that underlie the Conservation annotation track and the phastCons table. For a detailed description of the algorithm used to produce the scores, see the Genome Browser description page associated with the Conservation track. File Format (assemblies released Nov. 2004 and later) When uncompressed, the file contains a declaration line and one column of data in wiggle table fixed-step format: fixedStep chrom=scaffold_1 start=3462 step=1 0.0978 0.1588 0.1919 0.1948 0.1684 The declaration line specifies the starting point of the data in the assembly. It consists of the following fields: - fixedStep -- keyword indicating the wiggle track format used to write the data. In fixed step format, the data is single-column with a fixed interval between values. - chrom -- chromosome or scaffold on which first value is located. - start -- position of first value on chromosome or scaffold specified by chrom. NOTE: Unlike most Genome Browser coordinates, these are one-based. - step -- size of the interval (in bases) between values. A new declaration line is inserted in the file when the chrom value changes, when a gap is encountered (requiring a new start value), or when the step interval changes. Data lines The first data value below the header shows the score corresponding to the position specified in the header. Subsequent score values step along the assembly in one-base intervals. The score shows the posterior probability that the phylogenetic hidden Markov model (HMM) of phastCons is in its most-conserved state at that base position. ------------------------------------------------------------------------ File Format (assemblies prior to Nov. 2004) When uncompressed, the data file contains two columns: 294 0.0953 295 0.0948 296 0.0943 297 0.0936 298 0.0929 299 0.0921 Column #1 contains a one-based position coordinate. 
Column #2 contains a score showing the posterior probability that the phylogenetic hidden Markov model (HMM) of phastCons is in its most conserved state at that base position. ------------------------------------------------------------------------ References for phastCons Siepel A and Haussler D (2005). Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, pp. 325-351, Springer, New York. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034-1050 (2005). For a discussion of the methods used to calculate the phastCons scores, see the description page for the hg17 Conservation track in the Genome Browser.
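The fixed-step layout described in the File Format section (a declaration line followed by one score per line) can be read with a small parser. The Python sketch below is an illustration only; the function name is hypothetical, and it reports positions as one-based, following the note on the start field.

```python
def parse_fixed_step(lines):
    """Yield (chrom, one_based_position, score) from fixedStep wiggle lines.

    Hypothetical helper: each declaration line resets the chromosome,
    position, and step; each data line advances the position by step.
    """
    chrom = None
    position = None
    step = 1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            position = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        if chrom is None:
            raise ValueError("data line before any fixedStep declaration")
        yield chrom, position, float(line)
        position += step


example = [
    "fixedStep chrom=scaffold_1 start=3462 step=1",
    "0.0978",
    "0.1588",
    "0.1919",
]
for chrom, pos, score in parse_fixed_step(example):
    print(chrom, pos, score)
```

Because a new declaration line appears whenever the chromosome changes or a gap is encountered, resetting the position at each declaration is enough to keep scores aligned with the assembly.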
/goldenPath/help/fileSearch.html:Genome_Browser_File_Search File Search The Genome Browser's File Search feature allows users to find downloadable ENCODE files of interest quickly and easily. Searching Downloadable ENCODE files can be found by entering terms of interest in the free-text Track Name and Description fields, by selecting the appropriate Group or Data Format from the drop-down menus, and/or by using the "ENCODE terms" drop-down filters. ENCODE terms: The filters in this section allow users to search (or refine their search) for files based on the metadata associated with the ENCODE downloadable files. By default, there are two rows of ENCODE metadata search criteria, but more can be added by clicking the "+" button. To remove an added metadata search row, click the "-" button. Each row consists of two menus. The first is a drop-down menu that contains the searchable metadata categories. Please note that because the metadata categories are assembly dependent, the options may vary from assembly to assembly. Some metadata categories have a link on the far right of the row that provides more information about that category. The second menu is usually a multi-select drop-down menu containing the metadata terms available for the category selected (in the first drop-down). The default is usually set to "Any". Clicking on "Any" opens up the multi-select drop-down with the possible options. Multiple options can be selected by clicking multiple checkboxes. For a few of the metadata categories in the first drop-down menu, the second menu is a free-text field in which the desired term must be entered. Results After clicking the "Search" button, the files that meet the search criteria (up to a maximum of 1000 files) are displayed in a table, one file per row. If the displayed search results have been limited to the first 1000 files, a message stating this restriction will be displayed above the results. 
Results Header: The first column of the header contains the number of files found (up to a maximum of 1000 files) and the subsequent columns contain a title for the information displayed in that column. The list of files can be sorted by clicking on the header of the column. The files can be sorted on up to 5 attributes. Files: Click the "Download" button to download a file. Click the [] icon to go to the downloads page to view that file and all other files associated with the track.
/goldenPath/help/hgSessionHelp.html:Genome_Browser_Session_Help Sessions User's Guide Contents Introduction Some simple examples Creating a session Creating a session — video demonstration Session details Sharing a session Editing an existing session Displaying your own tracks in a session Deleting a session Lifespan of a session Session gallery Help for Nucleic Acids Research submitters ------------------------------------------------------------------------ Questions and feedback on this User's Guide are welcome. User questions and answers on Sessions and other topics are available in the Genome Browser mailing list. Introduction The Session tool allows you to configure your browser with specific track combinations, including custom tracks, and save the configuration options. Multiple sessions may be saved for future reference, for comparing different data sets, or for sharing with your colleagues. Saved sessions will not be expired, however we still recommend that you keep local back-ups of your session contents and any associated custom tracks. BLAT result tracks persist for at least 48 hours after the last time they are viewed. The creation date of a session can be viewed in the Session Management menu. This date only reflects the initial creation of the Session and is not updated when sessions are edited. Descriptive text can also be added to a session in the Session Details menu. This feature may be accessed via the Session link in the top blue navigation bar in any assembly. To ensure privacy and security, you must create an account and log in before using the session manager. Individual sessions may be designated as either shared or non-shared to protect the privacy of confidential data. To avoid having a new shared session from someone else override your existing Genome Browser settings, you are encouraged to open a new web-browser instance or to save existing settings in a session before loading a new shared session. 
Note that not all of the Genome Browser mirror sites have all of the session features enabled. This User's Guide provides a few examples that introduce the features of the Session tool, followed by detailed directions on creating, saving, modifying and sharing sessions. You may also wish to review two blog posts, How to share your UCSC screenthoughts and Sharing Data with Sessions and URLs, for more discussion of sessions. Some simple examples This section contains some example sessions that demonstrate the use of the Session tool. To enable you to view these sessions, we have created a user account with the name Example. Example 1 This example shows the primate (chimp and rhesus) nets for chromosome 2 in the hg17 human assembly (the primate chromosome that fused in humans). We first configured our browser view with the desired settings, and then saved the session so that we could share it. We named our session hg17_chr2_primate. There are several ways for you to view this session: - Manually load and open the session. Open the Session tool. In the Session Management section under the Load Settings heading, enter this information: user: Example session name: hg17_chr2_primate Click the submit button next to the session name box to load the session. To view the session in the Genome Browser, click the Genome Browser link in the top blue navigation bar. - Open a session link sent by email. After we created and saved this session, we could have clicked the Email link to automatically send a message to one or more recipients with the following contents and clickable link: Here is a UCSC browser session I'd like to share with you: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Example&hgS_otherUserSessionName=hg17_chr2_primate. By clicking this link, you can open the session in your browser. - Open a session from a local file. 
Alternatively, if we had saved the browser settings to a local file, we could have simply provided the location of that file for you to load into your browser to view our session. Click here to see such a settings file. This method works best when the file is in a location that you can access from your own computer or network. For this example, you can copy this file and paste it into a file on your own machine, then load it into the Session tool. - Open a session from a URL. Because you do not have access to our file system where this session file resides, it will be easier for you to load it using a URL. To do this, open the Session tool. In the Session Management section under the Load Settings header, enter the URL where this file is located: http://genome.ucsc.edu/goldenPath/help/examples/session_example1.txt Then, click the submit button to load the session settings. To view the session in the Genome Browser, press the Browser link in the Updated Session section. Example 2 This example shows the Human Accelerated Region (HAR1) in the hg18 assembly. Eighteen differences exist in a region of 118 bases between human and all other mammals extending back to the chicken. The two sessions in this example show the same browser position at two levels of detail: Example 2a is zoomed out; Example 2b is zoomed in. To view these sessions in your browser, you can use any of the methods described in Example 1: - Manually load and open the session. Example 2a: user: Example session name: hg18_HAR1 Example 2b: user: Example session name: hg18_HAR1_zoom - Open a session link sent by email. Example 2a: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Example&hgS_otherUserSessionName=hg18_HAR1 Example 2b: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Example&hgS_otherUserSessionName=hg18_HAR1_zoom. - Open a session from a local file. 
Example 2a: Copy the contents of this file to a file on your own machine, then load it into the Session tool. Example 2b: Copy the contents of this file to a file on your own machine, then load it into the Session tool. - Open a session from a URL. Example 2a: Paste this URL into the Session tool: http://genome.ucsc.edu/goldenPath/help/examples/session_example2a.txt Example 2b: Paste this URL into the Session tool: http://genome.ucsc.edu/goldenPath/help/examples/session_example2b.txt Creating a session It is easy to create a session to save or share. Simply configure the Genome Browser as you wish, then navigate to the Session tool by clicking on the My Data pulldown in the top blue navigation bar. Follow these steps to save your session: - Log in to the Genome Browser. To ensure privacy and security, you must create an account and/or log in to use the Session tool. You will not have to repeat the login step unless you sign off from the Session tool or close your Genome Browser. - Create a named session. Scroll down to the Save Settings section of the page. Type a name into the Save current settings as named session box. Choose whether or not you would like to share your sessions with others. If the allow this session to be loaded by others box is checked, anyone will be able to view your Genome Browser settings (including your custom tracks) if you provide them with your user and session name. Note that your session is not automatically available to the general public if you choose this option: you must provide the user and session name to other individuals for them to view it. This helps to ensure the confidentiality of your private data. After naming the session and choosing your sharing option, click the submit button. Your session will then be listed by name under My Sessions. - Save session settings to a file. Alternatively, you can create a file from your session settings that can be saved to your local machine or posted to a URL for access or sharing. 
To do this, go to the Save Settings section. Type a name into the Save current settings to a local file box. Click the submit button to save or display the file. The session will be saved in plain text (ascii) format by default. To select a compressed format, select one of the options from the file type returned menu before clicking submit. If you simply wish to preview the contents of the file in your browser window, leave the file name blank and click submit. How to back up text-based Custom Track data to a file - Save Custom Tracks. Save a backup of your current browser session's custom tracks to your local machine. This backup is intended to be used to restore uploaded text-based custom tracks that would otherwise be lost in case of an unexpected system failure at UCSC. Saving your data is easy and ensures that your hard work is safe and recoverable. To download your custom track data, navigate to the Save Settings section. Click the submit button to the right of "back up custom tracks archive .tar.gz". For each genome assembly, the custom track names will be shown along with individual and total file size. To proceed, click the create custom track backup archive button. All of the custom track data for the active session will be archived and compressed. Large custom tracks may take several minutes to finish. To download the archive to your computer, type in a name for the downloaded archive file and click the download backup archive button. The file will have a ".tar.gz" extension. Note that Safari browsers will unzip the archive leaving you with a .tar file in your Downloads directory. Click the return button to return to the Session Management page. To save viewing settings like track visibilities, highlighting, and sequence positions, use the "Save session settings to a local file" feature mentioned above. - Restore a Custom Track. The downloaded archive file cannot be directly reloaded into the system because it is compressed. 
To reload custom tracks, untar the downloaded .tar.gz file. The following example assumes you named your archive file 'someproject'. From the command-line, change into your Downloads directory, make a directory for your session, and uncompress your custom track archive. cd Downloads mkdir someproject cd someproject tar -xzf ../someproject.tar.gz If you use the Safari browser, your file will have been uncompressed automatically, leaving just .tar. If this is the case, use this command instead: tar -xf ../someproject.tar The assembly directory contains .ct files, one for each custom track. To restore a single custom track, go to the Custom Tracks tool, select the correct genome database, and click the add custom track button. If you already have existing custom tracks, click the add custom track button, then click the Choose File button next to the top window near the "Paste URLs or data" text. A file selection box will open. Navigate to the directory where the desired custom track .ct file is, choose it, and click the Submit button. If an optional help text or HTML page was saved with the custom track, you will see a corresponding custom track .html file. You can restore this by selecting the Choose File button near the "Optional track documentation" text. A file selection box will open. Navigate to the directory where the .html file is located, choose it, and click the Submit button. - Restore Multiple Custom Tracks. Restore all the custom tracks in an assembly directory by concatenating the .ct files together. From the command-line, change directory to your genome assembly and concatenate all of the .ct files into one hg38_all.ct file: cd hg38 cat *.ct > hg38_all.ct Compress the concatenated file for faster upload, creating hg38_all.ct.gz with the following command: gzip hg38_all.ct Navigate to the Custom Tracks tool and select the big concatenated compressed file (hg38_all.ct.gz in our example). Note that large custom track sets may require a long time to upload. 
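For users who prefer a script to the shell commands, the concatenate-and-compress step can be sketched in Python as follows. This is a minimal sketch, not a UCSC tool: the function name bundle_ct_files and its arguments are hypothetical, and it assumes the archive has already been unpacked so that the assembly directory contains the .ct files.

```python
import gzip
import shutil
from pathlib import Path


def bundle_ct_files(assembly_dir, out_name="hg38_all.ct.gz"):
    """Concatenate every .ct file in assembly_dir into one gzip-compressed file.

    Hypothetical helper: mirrors `cat *.ct > hg38_all.ct` followed by
    `gzip hg38_all.ct`, but writes the compressed file directly.
    """
    assembly_dir = Path(assembly_dir)
    # Sort the names so the order matches what a shell glob would give.
    ct_files = sorted(assembly_dir.glob("*.ct"))
    if not ct_files:
        raise FileNotFoundError(f"no .ct files found in {assembly_dir}")

    out_path = assembly_dir / out_name
    with gzip.open(out_path, "wb") as out:
        for ct_file in ct_files:
            with open(ct_file, "rb") as src:
                shutil.copyfileobj(src, out)  # append this track's bytes
    return out_path


if __name__ == "__main__":
    # Assumes an unpacked "hg38" assembly directory in the current directory.
    if Path("hg38").is_dir():
        print(bundle_ct_files("hg38"))
```

The resulting .ct.gz file is then selected in the Custom Tracks tool in the same way as a file produced by the shell commands.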
Once uploaded, click the Go button on the next page to view your custom tracks in the Browser. You will still be able to save and share your Sessions online, which will include your Browser Settings and Custom Tracks. This feature is solely for backing up Custom Tracks as files. Creating a session — video demonstration [] Visit our Video Page. Visit our YouTube channel for more videos. Opening a saved session When you save a session, it is added to the My Sessions list on the Session page. Each session entry is listed by name and offers the following options to open, share, and manipulate it: - Session name. Click the session name to view it in the Genome Browser. - View/edit details. Click the details button to edit the session description and view session details such as creation date/time, assemblies with custom tracks and more. - Delete the session. Click the delete button to permanently remove this session from the list. - Share with others: Check this box to allow others to access this session. By default, this option is unchecked, which limits access to only the session owner. - Post in public listing: Check this box to add your session to the list of Public Sessions. Sessions in the listing will be available to be loaded and viewed by the world. - Email. Click this link to email this session to a colleague. Session details Each session has an associated details page that you can click into from the Session Management menu. The Session Details menu allows you to edit the Session Name, to add descriptive text and to change whether or not the session is shared with others. Like the Session Management menu, if you click "use" that session will be loaded as the current session and if you click "delete" the session will be deleted. The "Created on" date reflects the date that the session was originally created and will not be updated to reflect any edits. Sharing a session Shared vs. 
non-shared data When you create a session using the Session tool, you may designate it as either shared or non-shared. By default, new sessions are created as shared and must be explicitly changed to non-shared status. Shared sessions can be opened by other Genome Browser users to whom you've provided one of the following: - the user name and session name of the saved session - access privileges to a local file that contains the saved session information - the URL of a web-accessible session settings file Sessions which you've added to the list of Public Sessions will be available to the world. Note that unless you've added them to this list of Public Sessions, your shared sessions will not be available in a general way to other Genome Browser users; they will need at least one of these access methods. If you choose to keep your session private, other users of the Genome Browser will not be able to access your data or browser configuration. Any confidential data or locations of interest that you are working with will be safe from viewing by others. The most secure way to control your session is to save the settings to a local file, then deny access to that file by others. Sharing your session with others There are five ways to let others know about your saved sessions: - Save the session URL. Immediately upon saving a new session, the top of the page offers a Browser hyperlink. Additionally, each session entry in the My Sessions list has a Browser hyperlink. Click either Browser link to open the Genome Browser with the session loaded. You can obtain the URL of the Genome Browser page by capturing the Browser hyperlink via right-click before you proceed to the Browser graphical view. You can then store the URL, create a bookmark or share the link with others. 
The UCSC site and supported mirrors display short session links similar to the following: http://genome.ucsc.edu/s/YourUserName/YourSessionName Previously, longer links were used: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=YourUserName&hgS_otherUserSessionName=YourSessionName System administrators of mirrors can introduce Apache redirects from the shorter links that will redirect to the longer links and enable the links to display for new sessions by setting hgSession.shortLink=on in the mirror's configuration files. Both links can be modified to include URL parameters, such as position=chr10:69,644,222-69,644,999 in these examples: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=view&hgS_otherUserSessionName=clinicalzoom&position=chr10:69,644,222-69,644,999 http://genome.ucsc.edu/s/view/clinicalzoom?position=chr10:69,644,222-69,644,999 - Email a session link. Each session entry in the My Sessions list also has an Email link. Click this link to automatically invoke your email tool with a message containing the Genome Browser URL, which you can then send to others. - Share a session settings file. If you have saved your settings to a local file, you can give others access to the file, or email the file to them as an attachment and instruct them to load it using the Session tool. - Share a web URL. If you have saved your settings to a file on a web server, you can provide a link like this to others: http://genome.ucsc.edu/cgi-bin/hgSession?hgS_doLoadUrl=submit&hgS_loadUrlName=MyUrl where MyUrl is the URL of your settings file, e.g., http://www.mysite.edu/~me/mySession.txt. In this type of link, you may replace "hgSession" with "hgTracks" to proceed directly to the Genome Browser. - List it on the Public Sessions page. The "Public Sessions" tool allows you to post and share your exciting and interesting Browser snapshots with the world. 
After having saved your session, you can add it to this public listing by checking the box in the column under "post in public listing?". People can then find your session by entering a search term. You can even create a gallery of Public Sessions related to your search by using a unique string in your descriptions. Opening a shared session If you open a shared session while viewing the Genome Browser, it is possible to lose all of your own browser settings. That is, the settings for the newly-opened session will take precedence over your existing settings and will replace them. If you wish to preserve your original settings, you should first save your own settings as a session before opening a new session, or open a new tab or window in your internet browser before loading the new session. There are four ways to open a shared session, depending on what information you have about the session. The instructions below assume that you want to replace your current session with the new session. Be sure to preserve your original session first if you don't want to overwrite it. - Open a session from an email link. If you receive an email message with a link to a colleague's shared session, simply click on the link to view the Genome Browser with the session settings. - Open another user's session. If you know the name of another user's shared session, you can type in the user and session name in the "Restore Settings" section and click "submit". This will generate an "Updated Session" message and you can click on the Browser link to load the browser with the settings saved in this session. - Open a session from a settings file. Open the Session tool, then scroll down to Restore Settings in the Session Management section. Click Choose File to find the file on your computer. Click submit to display the Genome Browser using these session settings. - Open a session specified by a URL. Open the Session tool, then scroll down to Restore Settings in the Session Management section. 
Type in the URL in the Use settings from a URL box, then click submit to display the Genome Browser using the new session settings. Editing an existing session It's easy to make changes to an existing session. Reconfigure the Genome Browser as you wish, then resave the session with the same name. The Session tool will warn you that you are about to overwrite an existing session. You can also edit any descriptive text associated with your session as well as whether or not the session can be shared in the Session Details menu. Note that editing a session will not alter the creation date listed in the Session Management menu. If you previously shared this session with others, they will not see the changes until they reload your newly-edited session. Displaying your own tracks in a session In addition to displaying standard UCSC tracks in your session, you can also display the following user-generated tracks: - Custom Tracks - Genome Graph tracks Before you create and save your session, be sure to upload your Custom Track or Genome Graph track. These user-generated tracks associated with a saved session will not expire. BLAT results always have a lifespan of 48 hours, even if they are part of a session. However, if you generate a custom track from your BLAT results, they will be saved in your session. Deleting a session In the Session Management section under My Sessions, press the delete button next to the session name you would like to delete. This will permanently delete all details of the session from the UCSC server. Any saved links to that session will no longer work. No other user can delete your saved sessions, even if you have provided access to your sessions to them. Other users simply have a copy of your session. Unlike most other browser preferences, the session settings are not saved in your Genome Browser "cart". Therefore, if you choose to reset the Genome Browser, it will not delete your saved sessions. 
Lifespan of a session Your saved sessions will not be expired and will be available to you (and others if you share them) until you delete them. We have discontinued our previous policy of removing saved sessions and associated custom track data after four months. However, note that the UCSC Genome Browser is not a data storage service; please keep a local backup of your session contents and custom track data. Session gallery The Session Gallery is a collection of track views that help highlight viewing different topics in the browser. The sessions in the Session Gallery were created in the browser and then saved to a local file, which was then uploaded to an online location. This allows creating a single link, such as http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doLoadUrl=submit&hgS_loadUrlName=U, where U is the URL of the session file, e.g., http://www.mysite.edu/~me/mySession.txt, enabling users to maintain external control of the content file for easy update. Help for Nucleic Acids Research submitters The Nucleic Acids Research (NAR) journal now requires manuscript submissions to contain a private Session link to your data in the UCSC Genome Browser that allows reviewers to access data. These instructions will show you how to upload, view, and share your data. Viewing your data on the Genome Browser You can view your own private data by uploading your annotation files to the Genome Browser as custom tracks; visit our custom track help page to learn more. To summarize the steps to upload your data, you will need to: 1. Ensure the data file is formatted correctly. 2. Create a track line for your custom track. 3. Load the custom track by adding your track line to our Custom Tracks page. 4. View the data in the Genome Browser. Custom track examples Creating the track line may be the most challenging step since many configuration options exist. 
The track line begins with the track keyword, followed by one or more attribute=value pairs where the order of the attributes does not matter. Here are some examples: BAM custom track The simplest example of a BAM custom track is the following track line: track name="My BAM" type=bam bigDataUrl=http://www.mysite.edu/~me/my_sorted.bam In the example above, the name attribute defines the name of your custom track. The second attribute, type, is required for some data types, including but not limited to BAM, bigBed, WIG, bigWig, and VCF. The last attribute, bigDataUrl, is required for remotely hosted data types such as BAM, CRAM, bigBed, bigWig, and VCF. Adding more attribute=value pairs can further customize the display. Here is a custom track that uses the visibility and description attributes: track type=bam visibility=dense name="My BAM" description="Example from the ENCODE RNA-seq CSHL track" bigDataUrl=http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqA549CellPapAlnRep1.bam There are also options to configure the display of your BAM files, such as a density plot feature that will dynamically process the underlying BAM into a wiggle signal. ------------------------------------------------------------------------ bigWig custom track A bigWig file is useful when trying to display dense, continuous data. Read more on the bigWig track format help page. Here is an example bigWig track that is colored red, instead of the default black color, that can be pasted directly into the Custom Tracks Page: track color=255,0,0 name="HeLa-S3 nucleus minus signal" description="RNA Subcellular CAGE Localization from ENCODE/RIKEN" type=bigWig visibility=full bigDataUrl=http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRikenCage/wgEncodeRikenCageHelas3NucleusPapMinusSignalRep1.bigWig There are also options to configure the display of your wiggle tracks, such as changing the track height or type of graph. 
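If you generate many track lines, the attribute=value layout shown in the examples above can be assembled with a small script. The following Python sketch is only an illustration: make_track_line is a hypothetical helper, not part of any UCSC software, and it assumes that values containing spaces must be wrapped in double quotes, as the name and description values are in the examples above.

```python
def make_track_line(**attributes):
    """Build a custom track line from attribute=value pairs.

    Hypothetical helper. Values containing whitespace are wrapped in
    double quotes, following the examples in this guide; values that
    themselves contain a double quote are rejected rather than escaped.
    """
    parts = ["track"]
    for key, value in attributes.items():
        text = str(value)
        if '"' in text:
            raise ValueError(f"double quote not allowed in value for {key!r}")
        if not text or any(ch.isspace() for ch in text):
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


if __name__ == "__main__":
    # Reproduces the simplest BAM example from this guide.
    print(make_track_line(
        name="My BAM",
        type="bam",
        bigDataUrl="http://www.mysite.edu/~me/my_sorted.bam",
    ))
```

Because the order of attributes does not matter, the keyword arguments can be given in any order; the sketch simply emits them in the order supplied.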
Creating a session for NAR publications After creating your custom tracks and viewing your data on the Genome Browser, you can save all of your tracks and settings to a snapshot of the Genome Browser called a session. You can easily save a session by following these six steps: 1. Configure the Genome Browser to your preference Make sure the display of your custom tracks is to your liking on the Genome Browser. 2. Navigate to the sessions page Once you are satisfied with the display, go to the My Sessions page by either: - Going to My Data -> My Sessions from the navigation bar. - Using the "s then s" keyboard shortcut when viewing the main page of the Genome Browser. 3. Log in to the UCSC Genome Browser You must sign in to be able to save named sessions which will then be displayed with Browser and Email links. 4. Save your session Go to the Save Settings section and, in the Save current settings as named session text box, enter a name for your session. When saving the session, be sure to have the "allow this session to be loaded by others" option checked and then click submit. - You should then be able to copy and share a link similar to the following: http://genome.ucsc.edu/s/YourUserName/YourSessionName 5. Edit the session description Once the session is created, under the "view/edit details" column you can click the view/edit button to add a description to the session. 6. Publish your session as a Public Session for public discovery If you eventually make a Public Session of your session after your paper is published, and provide a detailed description, anyone can find it by searching for terms you share. - For example, navigate to the Public Sessions page and search "NAR" to see some example sessions. Or click this link, which adds ?search=NAR to the URL. 
- As a Public Session, your session can then be found by anyone entering a search term, where you can even create a gallery of Public Sessions related to your search by using a unique string in your descriptions. - Here is an example "gallery" of all sessions that have the word "sessionView" in their descriptions: http://genome.ucsc.edu/cgi-bin/hgPublicSessions?search=sessionView Sharing your session link After saving your session, you will now be able to share your session link with others. There are three different ways that you can share your session: 1. Immediately after creating a new session, the top of the page offers a Browser hyperlink that will allow you to view the newly saved session. You can right-click the Browser link to save the URL. 2. Alternatively, you can right-click the session name and then click Copy Link Address to save the session URL. 3. Adding your session to the Public Sessions page will provide a useful way of sharing your session with the world in the long run. You can add your session to our Public Sessions by selecting the "post in public listing?" checkbox. Editing the session URL You can edit URLs to directly go to different parts of the Genome Browser such as by changing hgSession to hgTracks, e.g., http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=hg19.rela Having hgTracks in the session URL will take the viewer to the main page of the Genome Browser instead of the sessions page. Using track hubs to organize custom tracks Creating a track hub allows you to group and organize your annotation tracks, but is limited to compressed binary indexed formats that can be remotely hosted. This will require the use of a web-accessible location to load the track hub in the UCSC Genome Browser. If your institution does not provide web space for you, you can look into hosting your binary files at a couple different web hosting services, such as CyVerse or figshare. 
If you would like to learn more about using track hubs, please read our Track Hub help page. Genome Browser inquiries If you have any questions regarding the creation of your custom track or session, please feel free to contact us. Before submitting a question, we strongly encourage you to search our mailing list archives, our website, and our wiki for the answer.
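As a recap of the link formats described in this guide (the short /s/ link, the longer hgS_doOtherUser link, and the hgS_doLoadUrl link for a web-hosted settings file), the following Python sketch assembles each of them. The function names are hypothetical, and the sketch assumes the main UCSC site as the base URL; a mirror would use its own host. Note that the settings-file URL is percent-encoded here, which is standard for query strings, whereas the examples in this guide show it unencoded.

```python
from urllib.parse import quote, urlencode

BASE = "http://genome.ucsc.edu"  # assumption: main UCSC site, not a mirror


def short_session_link(user, session, **params):
    """Short link form: BASE/s/<user>/<session>[?extra parameters]."""
    link = f"{BASE}/s/{quote(user, safe='')}/{quote(session, safe='')}"
    if params:
        # Keep ':' and ',' readable, as in the position= examples.
        link += "?" + urlencode(params, safe=":,")
    return link


def long_session_link(user, session, cgi="hgTracks", **params):
    """Longer link form using the hgS_doOtherUser parameters."""
    query = {
        "hgS_doOtherUser": "submit",
        "hgS_otherUserName": user,
        "hgS_otherUserSessionName": session,
    }
    query.update(params)
    return f"{BASE}/cgi-bin/{cgi}?" + urlencode(query, safe=":,")


def load_url_link(settings_url, cgi="hgSession"):
    """Link that loads a session settings file hosted at settings_url."""
    query = {"hgS_doLoadUrl": "submit", "hgS_loadUrlName": settings_url}
    return f"{BASE}/cgi-bin/{cgi}?" + urlencode(query)


if __name__ == "__main__":
    print(short_session_link("YourUserName", "YourSessionName"))
    print(long_session_link("view", "clinicalzoom",
                            position="chr10:69,644,222-69,644,999"))
    print(load_url_link("http://www.mysite.edu/~me/mySession.txt",
                        cgi="hgTracks"))
```

Passing cgi="hgTracks" to load_url_link corresponds to replacing "hgSession" with "hgTracks" in the link so that it opens the Genome Browser directly.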
/goldenPath/help/liftOver.html:Genome_Browser_Track_LiftOver LiftOver of tracks from previous to new assembly The tracks indicated by a "ball" logo (e.g., [] and []) have been lifted from a previous assembly of the same organism with a minimum of quality control scrutiny (e.g., have been lifted from hg17 or hg18 to a later human assembly). The number indicated on the logo indicates the version of the assembly that the track was lifted from. These tracks are provided to our users with the intent that they assist in interpretation of other data, but must be used with caution. Not all annotations remain intact when lifted in this manner and in any case, cannot by definition contain any sequence that is new to the newer assembly. It should also be noted that tracks containing large regions will not lift as well because of the increased chance of spanning a region that has changed between the two assemblies.
/goldenPath/help/hgWiggleTrackHelp.html:Genome_Browser_Wiggle_Tracks Configuring graph-based tracks Genome Browser graph-based tracks may be configured in a variety of ways to highlight different aspects of the displayed information. Some track types, such as BAM tracks and certain gene tracks, offer dynamic calculation of items in the current window to display a density graph where the height is proportional to the number of items mapped to each genomic position. The following information explains each of the available configuration options. Display settings [] Type of graph: The default "Bar" setting depicts the graph data using color-filled bars. To view the data as a series of points or lines, select the "Points" setting. Track height (boxed in blue): To change the default display height of the graph in pixels, type in a value from within the indicated range. Data view scaling (boxed in red): The default "Use vertical viewing range setting" option displays the data using the parameters specified in the Vertical viewing range setting. To configure the graph to automatically scale to a range defined by the minimum and maximum data points in the current view, select the "Auto-scale to data view" option. To keep the y=0 value in view at all times when Auto-scale is selected, set "Always include zero" to "ON". - When viewing signal tracks within a composite group use "group auto-scale" to enable having all tracks scaled against the one track in the group that has the highest maximum data points in the current view. For example, below is a side-by-side image of two views of the same data from a selection of cell lines within a composite of related RNA-seq experiments. On the left is the original "auto-scale to data view" setting, where each track is auto-scaled to appear at each track's highest value. 
And on the right is the new "group auto-scale" setting for the same RNA-seq data where all tracks are scaled against the one track in the region that has the highest value (67215 for IMR9 cell TAP + 1). [Auto-scale comparison] Click this link to interact with the original "auto-scale to data view" on the left, and to contrast the results with the new "group auto-scale" click this link. - Note that the "group auto-scale" setting is meant to be turned on and off at the composite group track set level, but you can toggle individual tracks. For instance, click the second "group auto-scale" session link that has all tracks adjusted to the highest value (67215 for IMR9 cell TAP + 1). If you right-click that individual highest track and instead of configuring the track set, select to configure only the subtrack IMR9 cell TAP + 1 from "group auto-scale" to "auto-scale to data view" and select OK and then refresh your browser window (or click one of the "refresh" buttons on the screen) you will see all the tracks adjust to the next-highest track within all tracks still tagged to display "group auto-scale" (39320 for IMR9 cell + 1). Vertical viewing range (boxed in red): The min and max values specify the vertical portion of the graph that is displayed (default range is 30-70). These numbers can be used to set data threshold indicators. For example, to display only those GC values greater than 50 percent, set the min value to "50". Transform function: Transforms the data points by the function selected in the drop-down menu. Usually the default setting is "None". Windowing function: When a view is too large to show individual data values, the values must be combined to produce a plot point. This option specifies the combining function to be used (default is "Mean"): - "Mean+whiskers" - displays the mean in a dark shade, one standard deviation around the mean in a medium shade, and the maximum/minimum in a light shade. 
For bar graphs only the mean, the mean plus a standard deviation, and the max are visible. This mode is not available if the Overlay method is stacked. - "Maximum" - displays the maximum value of all the points to be combined. - "Mean" - displays the mean. - "Minimum" - displays the minimum of all the points to be combined. Smoothing window: When set to a numerical value, this option determines the size, in pixels, of a smoothing window to be passed over the plot to smooth the edges of the bars or lines. This is equivalent to a trend line calculation on the graph. The default setting is "OFF". Negate values: When checked, all values in the wiggle are negated, meaning that positive values become negative and vice-versa. This is useful for wiggles representing transcription or other activities on the minus strand. Be aware that wiggles with negative values are drawn in altColor not color as positive values are. The below image shows ENCODE RNA-seq data around two genes on different strands, SIRT1 and HERC4, with a minus signal track using Negate values to flip the wiggle display to emphasize that HERC4 is expressed on the minus strand. This image also displays the signal graphed in points and a smoothing window of 16 pixels. [] Draw y indicator lines (boxed in orange): - at y= 0.0: Select the "ON" setting to display a line marking the 0.0 position on the graph (default is "OFF"). - at y= : Select the "ON" setting to display a line on the graph at the specified numerical value (defaults are "0" and "OFF"). This line can be used to mark a significant threshold on the graph. For example, in the image below, y=3. [] When you have finished making your configuration changes, click the Apply button to preview your changes, or click the Submit button to return to the annotation track display page. 
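As a rough illustration of the Windowing function options described above, the following Python sketch combines the data values that fall under each plot point into a mean, minimum, and maximum, and also computes the one-standard-deviation band drawn by "Mean+whiskers". The function name, the fixed number of values per plot point, the use of the population standard deviation, and the sample values are all assumptions made for illustration; this is not the Genome Browser's own code.

```python
from statistics import mean, pstdev


def window_values(values, per_point):
    """Combine consecutive data values into one summary per plot point.

    per_point is a hypothetical count of data values that share one
    plot point when the view is too wide to draw each value.
    """
    summaries = []
    for i in range(0, len(values), per_point):
        chunk = values[i:i + per_point]
        m = mean(chunk)
        sd = pstdev(chunk)  # assumption: population standard deviation
        summaries.append({
            "mean": m,                 # "Mean"
            "minimum": min(chunk),     # "Minimum"
            "maximum": max(chunk),     # "Maximum"
            "whisker_low": m - sd,     # lower edge of the medium shade
            "whisker_high": m + sd,    # upper edge of the medium shade
        })
    return summaries


# Eight made-up data values drawn as two plot points.
points = window_values([1, 3, 5, 7, 2, 2, 2, 2], per_point=4)
```

Each dictionary in points holds the values that the corresponding windowing option would plot for that point; in "Mean+whiskers" mode the mean, the whisker band, and the minimum/maximum would be drawn together in different shades.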
Annotation track display modes Each annotation track within the window may have up to five display modes: Dense The track is displayed with all features collapsed into a single line. The darker the line color the greater the wiggle value at that location. [Wiggle in dense display] Squish The track is displayed with all the features collapsed into a single line, much like the dense display mode with greater compression. [] Pack The track displays the wiggle value associated with each annotation feature creating a histogram-like image, much like the full display mode with greater compression. [] Full The track displays the wiggle value associated with each annotation feature creating a histogram-like image. [] Hide The track is not displayed at all. To hide all the annotation tracks, click the hide all button. Overlay method Note that not all graph-based tracks include the Overlay options. Transparent This setting displays the colored transparent graphs of multiple subtracks overlaid in the same vertical space. [] Solid This setting displays the colored opaque graphs of multiple subtracks overlaid in the same vertical space. [] Stacked This setting displays the graphs stacked on top of one another, so that the high point of the graph is the sum of all the values. [] None This setting displays each graph in its own vertical space. []
/goldenPath/help/hgCnvColoring.html:Genome_Browser_Codon_Coloring Structural Variants Color Code CNV items in the Genome Browser tracks are colored by Variant Type. Three levels of color intensity are used to indicate the Clinical Significance. The lightest shade indicates benign/likely-benign variants, the darkest shade indicates pathogenic/likely pathogenic variants, and the medium shade indicates other or no clinical significance has been defined. Variant Type Benign Other Pathogenic ---------------------------- -------- ------- ------------ Gain Loss Insertion Inversion Structural Alteration Other Sequence Alterations Each color category includes various Variant Types: - Gain: duplications (SO:1000035), tandem duplications (SO:1000173), and copy number gains (SO:0001742) - Loss: deletions (SO:0000159), and copy number losses (SO:0001743) - Insertion: insertions (SO:0000667), indels (SO:1000032), and mobile element insertions (SO:0001837) - Inversion: inversions (SO:1000036) - Structural Alteration: translocations (SO:0000199), complex structural alterations (SO:0001784), complex chromosomal rearrangements (SO:0002062), fusions (SO:0000806), and multi-allele CNVs - Other Sequence Alterations: short tandem repeat expansions (SO:0002162), short tandem repeat contractions (SO:0002163), microsatellites (SO:0000289), and undetermined or undefined alterations
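The grouping above can be written down as a small lookup, which may help when reproducing the color categories outside the browser. The Python sketch below is only an illustration: the SO accessions and category names are transcribed from the list above, while the function name, the shade labels ("lightest", "medium", "darkest"), and the choice to place unrecognized accessions in Other Sequence Alterations are assumptions. The actual colors are not reproduced here.

```python
# SO accession -> color category, transcribed from the list above.
SO_TO_CATEGORY = {
    "SO:1000035": "Gain",
    "SO:1000173": "Gain",
    "SO:0001742": "Gain",
    "SO:0000159": "Loss",
    "SO:0001743": "Loss",
    "SO:0000667": "Insertion",
    "SO:1000032": "Insertion",
    "SO:0001837": "Insertion",
    "SO:1000036": "Inversion",
    "SO:0000199": "Structural Alteration",
    "SO:0001784": "Structural Alteration",
    "SO:0002062": "Structural Alteration",
    "SO:0000806": "Structural Alteration",
    "SO:0002162": "Other Sequence Alterations",
    "SO:0002163": "Other Sequence Alterations",
    "SO:0000289": "Other Sequence Alterations",
}


def color_key(so_accession, clinical_significance):
    """Return (category, shade) for one variant (hypothetical helper)."""
    accession = so_accession.replace(" ", "").upper()
    # Assumption: anything not listed is treated as an undefined alteration.
    category = SO_TO_CATEGORY.get(accession, "Other Sequence Alterations")

    significance = clinical_significance.lower().replace("-", " ").strip()
    if significance in ("benign", "likely benign"):
        shade = "lightest"
    elif significance in ("pathogenic", "likely pathogenic"):
        shade = "darkest"
    else:
        shade = "medium"  # other or no clinical significance defined
    return category, shade


example = color_key("SO: 0000159", "Likely-benign")
```

Here example is the pair ("Loss", "lightest"), meaning a deletion with likely-benign significance would be drawn in the lightest shade of the Loss color.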
/goldenPath/help/hgTablesHelp.html:Table_Browser_Help Table Browser User's Guide Contents Introduction About the Table Browser databases and tables - Position-oriented tables - Non-positional tables Getting started - simple queries - Simple position-based query - Batch query using identifiers - Batch query using positions - Query to get gene symbols Filtering output by constraining field values - Filtering on fields from a single table - Filtering on fields from multiple tables - Filter constraints Intersecting data from multiple tables - Intersecting data from two tables - Intersecting data from multiple tables - Intersection options Correlating data from two tables Output formats - Displaying all fields in a table - Displaying selected fields from one or more tables - Displaying sequence (FASTA) data - Displaying CDS FASTA alignments - Saving query results in GTF or BED format - Saving data to a file - Saving data as a custom track - Displaying query results as Genome Browser hyperlinks - Displaying a statistical summary of query data Video examples of Table Browser queries - Find list of genes in a region - Obtaining coordinates and sequences of gene exons - Find SNPs in a gene - Find SNPs upstream of Genes Questions and feedback are welcome. Introduction The Table Browser provides a powerful and flexible graphical interface for querying and manipulating the Genome Browser annotation tables. Because the Table Browser uses the same database as the Genome Browser, the two views are always consistent.
Using the Table Browser, you can: - retrieve the DNA sequence data or annotation data underlying Genome Browser tracks for the entire genome, a specified coordinate range, or a set of accessions - apply a filter to set constraints on field values included in the output - generate a custom track and automatically add it to your session so that it can be graphically displayed in the Genome Browser - conduct both structured and free-form SQL queries on the data - combine queries on multiple tables or custom tracks through an intersection or union and generate a single set of output data - display basic statistics calculated over a selected data set - display the schema for a table and list all other tables in the database connected to the table - organize the output data into several different formats for use in other applications, spreadsheets, or databases This User's Guide is aimed at both the novice Table Browser user as well as the advanced user. If you are new to the Table Browser, read the Getting started section to learn about browser basics and try some simple queries. Advanced users may want to proceed directly to the section that addresses a particular area of functionality in detail. Although the Table Browser provides sufficient flexibility to satisfy the needs of most users, some advanced users may require the ability to run SQL commands directly on the Genome Browser database. UCSC provides two public MariaDB servers: (1) genome-mysql.soe.ucsc.edu (US West Coast), (2) genome-euro-mysql.soe.ucsc.edu (Europe). More information can be found on our MariaDB Access page. Alternatively, the database may be downloaded to a local computer for MariaDB access. See the mirror site documentation for information on setting up a local copy of the database. About the Table Browser databases and tables The Table Browser is built on top of the Genome Browser database, which actually consists of several separate databases, one for each genome assembly.
Tables within the databases may be differentiated by whether the data are based on genome start-stop coordinates (positional tables) or are independent of position (non-positional tables). Some output formats and query options are applicable only to positional tables, hence the distinction. Positional tables Positional tables contain data associated with specific locations in the genome, such as mRNA alignments, gene predictions, cross-species alignments, and other annotations. Each of the annotation tracks displayed in the Genome Browser is based on a positional table. In some instances, data from other positional and non-positional tables may also be incorporated into the track. Data associated with custom annotation tracks active within the user's Table Browser session are also available as positional tables. Positional tables can be further subdivided into several categories based on the type of data they describe. Alignment data can be best described by using a block structure to represent each element. Other tables require only start and end coordinate data for each element. Some tables specify a translation start and end in addition to the transcription start and end. Some tables contain strand information, others don't. Most tables, but not all, specify a name for each element. Based on the format of the data described by a table, different query and output formatting options may be offered. Non-positional tables Non-positional tables contain data not tied to genomic location, for example a table that correlates a Known Gene ID with a RefSeq accession ID. Some non-positional tables relate internal numeric mRNA IDs to extended information such as author, tissue, or keyword. Some "meta" tables in this category contain information about the structure of the database itself or describe external files containing sequence data.
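To make the distinction between positional and non-positional tables concrete, the sketch below builds two toy tables in an in-memory SQLite database using Python's standard sqlite3 module. It does not connect to the UCSC MariaDB servers; the table names (toyGenes, toyXref), the column layout, and every row are invented for illustration, and coordinates are assumed to be 0-based with an exclusive end. The positional table can be restricted by region, while the non-positional table can only be reached through the shared name field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Positional toy table: each record has genomic coordinates.
CREATE TABLE toyGenes (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
-- Non-positional toy table: links a record name to extra information.
CREATE TABLE toyXref (name TEXT, symbol TEXT);

INSERT INTO toyGenes VALUES
    ('txA', 'chr1', 100, 500),
    ('txB', 'chr1', 900, 1500),
    ('txC', 'chr2', 100, 500);
INSERT INTO toyXref VALUES
    ('txA', 'GENE1'), ('txB', 'GENE2'), ('txC', 'GENE3');
""")


def records_in_region(chrom, start, end):
    """Return (name, symbol) for toy records overlapping chrom:start-end."""
    return conn.execute(
        "SELECT g.name, x.symbol "
        "FROM toyGenes g JOIN toyXref x ON x.name = g.name "
        "WHERE g.chrom = ? AND g.txStart < ? AND g.txEnd > ? "
        "ORDER BY g.txStart",
        (chrom, end, start),
    ).fetchall()


hits = records_in_region("chr1", 400, 1000)
```

Here hits contains the two chr1 records that overlap the region, each paired with its symbol from the non-positional table. A region restriction makes sense only for toyGenes, which mirrors why the region option is unavailable for non-positional tables.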
Getting started - simple queries In its most basic form, the Table Browser can be used to retrieve a specific subset of records from a track or positional table in a selected genome assembly. The query may be based on a specific position or a set of one or more identifiers. This section describes the steps required to conduct basic simple data queries using the Table Browser. Once you have mastered the basic Table Browser functionality, refer to subsequent sections for information about generating more complex queries that use filters, intersections, and alternative data output formats. Simple position-based query Follow these steps to display a list of records that lie within a specific position in a table: Step 1. Pick a genome assembly Specify the genome assembly from which you'd like to retrieve the data by choosing the appropriate organism in the genome list, then selecting the assembly version from the assembly list. Note that the assembly list refreshes each time a different option is selected in the genome list. Assemblies are typically named after the first three characters of an organism's genus and species names. Step 2. Pick an annotation track The group list shows all the annotation track groups available in the selected genome assembly. The names correspond to the groupings displayed at the bottom of the Genome Browser annotation tracks page. When a group is selected from the list, the track list automatically updates to show all the annotation tracks available within that group. - If you already know the name of the annotation track in which you're interested, select the All Tracks option in the group list, then select the track from the track list. Similarly, you can directly select a table by choosing the All Tables option in the group list, selecting a database from the database list, then selecting the table from the table list. 
- To examine all the tracks available within a certain group (e.g., all gene prediction tracks), select the group name from the group list, then browse the entries in the track list. - Custom annotation tracks created during the current session are listed under the Custom Tracks group. - If no selections are made from the group or track lists, the track selection defaults to the Known Genes track in the Genes and Gene Prediction Tracks group. Step 3. Pick a table The table list shows all tables (both positional and non-positional) associated with the currently-selected track. By default, it displays the primary table for the track, i.e. the table containing the data shown in the Genome Browser annotation track. Other tables in the list are linked to the primary table by a common field and may provide supporting data used in constructing the annotation. - If the group list is set to the All Tables option, the tables list will show all tables present in the database currently selected in the database list, rather than those associated with a particular track. Step 4. Pick a genomic region (positional tables only) By default, the Table Browser region is set to genome, which will display all the data records in the selected table. - To restrict the data to a specific position range, type the position into the position box. Some examples of specific positions include a chromosome name (chrX), a coordinate range within a chromosome (chrX:100000-400000), or a scaffold name. - You can select multiple genomic regions by clicking the "define regions" button and entering up to 1,000 regions in a 3- or 4-field BED file format. - To look up the position range of a genomic element -- such as a gene name, an accession ID, an STS marker, etc. -- or keywords from the GenBank description of an mRNA, type the string into the position box, then click the Lookup button. 
- The data in non-positional tables are not tied to genomic coordinates; therefore, the region option is unavailable when a non-positional table is selected. A basic query on a non-positional table will show all the data in the table. Step 5. Display the output Click the Get Output button to display the results of the query. By default, the Table Browser outputs the data from all fields in the selected table as tab-separated text on the screen. See the Output formats section for information on configuring the query output. Example: Here is an example of a simple query that retrieves all the RefSeq Genes records in the position range chr7:26906938-26940301 on the May 2004 human genome assembly. 1. Select the Human option in the genome list. 2. Select the May 2004 option in the assembly list. 3. Select the Genes and Gene Prediction Tracks option in the group list. 4. Select the RefSeq Genes option in the track list. 5. Type chr7:26906938-26940301 in the position box (the Table Browser will automatically select the position option button). 6. Click the Get Output button. The Table Browser will display the records for the RefSeq accessions NM_005522, NM_153620, NM_006735, NM_153632, NM_030661, and NM_153631. Batch query using identifiers In many cases, you may want to retrieve data based on a list of one or more accessions, IDs, or names, rather than querying by genomic position. Many tracks in the Table Browser, such as those in the Genes and Gene Prediction or Variation track groups, support identifier queries. The identifier type used in the query must match the kind of identifiers present in the track data, e.g., mRNA accession IDs must be used to query the mRNA table and rsIDs must match those in the dbSNP table. Follow these steps to display a list of records that correspond to a set of accessions or names entered as query input. Step 1. Pick the genome assembly, track, and table Step 2. Select the genome region setting Step 3.
Load the identifiers into the browser Click the Paste List button to type or paste in the identifiers or the Upload List button to load the data from a file existing on your local computer. - If you are loading multiple identifiers, entries must be separated by a space, tab, or line. - Wildcards may not be used in the list (see the Filter section for information about conducting queries that include wildcards). - The Table Browser will retain the identifier list until you delete the information by clicking the Clear List button. Step 4. Click the Get Output button See the Output formats section for information about configuring the query output. Batch query from positions If you have a list of genomic positions and want to retrieve information about their properties, you can use the Define Regions button to input multiple positions to query a chosen table. Please note, any items in the table that overlap with the defined regions will be included in the Table Browser output. In this example, you want to determine the dbSNP rsID names for your list of positions. Step 1. Select genome assembly and track To determine dbSNP rsIDs we will be using Human genome hg38 and dbSNP153. Step 2. Select the define regions button, enter regions You can find the define regions button under the Define region of interest section. Upload, type, or paste in your regions of interest, making sure they are in the desired 0/1 base notation. They will only be accepted in BED or positional format. Step 3. Select output format and get output If you want all data from a table, you need not change the output format from the default. If you want only particular columns from the table, you can change it to selected fields from primary and related tables. Once you hit the get output button, you will be redirected to a column selection page or if you did not change the output format, your output data itself. 
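Because the regions entered under define regions may be given either in positional format or in BED format, it can help to convert between the two before pasting a list. The Python sketch below assumes the usual UCSC convention that positional coordinates such as chr7:26906938-26940301 are 1-based and inclusive, while BED coordinates are 0-based with an exclusive end. The function names are hypothetical and the error handling is minimal.

```python
import re

# chrom:start-end, allowing thousands separators such as chrX:100,000-400,000
POSITION_RE = re.compile(r"^(\S+?):([\d,]+)-([\d,]+)$")


def position_to_bed(position):
    """Convert 'chrom:start-end' (1-based, inclusive) to a 3-field BED line."""
    match = POSITION_RE.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom, start_text, end_text = match.groups()
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinate range: {position!r}")
    # Only the start shifts by one; the BED end equals the 1-based end.
    return f"{chrom}\t{start - 1}\t{end}"


def bed_to_position(bed_line):
    """Convert a BED line (0-based start, exclusive end) to 'chrom:start-end'."""
    chrom, start, end = bed_line.split()[:3]
    return f"{chrom}:{int(start) + 1}-{int(end)}"


bed_line = position_to_bed("chr7:26906938-26940301")
```

With these conventions, bed_line holds the same interval with a start of 26906937 and an end of 26940301, and bed_to_position(bed_line) returns the original position string.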
Get gene symbols in a query Follow the example below to obtain gene symbols in your query: 1. Select the clade, genome, assembly, group, table, and region as desired. 2. Change the output format to selected fields from primary and related tables. 3. Click get output to go to the next step of selecting fields from related tables. 4. Select the fields you would like from your primary table. 5. On the same Select Fields form, find the related kgXref table. For example, look for the hg38.kgXref table, and then check the checkbox next to Gene Symbol to add gene symbols to your query results. 6. Click get output again to get the final query output. Filtering output by constraining field values The Table Browser filter option can be used to: - apply constraints on table field values to restrict which records should appear in the query output - conduct batch queries using wildcards - include fields from multiple tables in the query output Filtering on fields from a single table Follow these steps to create a filter on one or more fields in a single table: Step 1. Select the assembly, track, and region Step 2. Click the Create button on the filter line Step 3. Add the filter constraints One or more of the fields in the currently selected table may be filtered by typing constraints into the corresponding text boxes. - By default, the initial values set up in the filter match all records in the table. - Constraints must match the data type of the field to be applied successfully. For example, the geneName field in the hg17 refFlat table is a string; therefore, constraining values must also be strings. See the Filter constraints section for more information on valid filter values. - Multiple filter values may be applied against one field by separating the values with spaces. - Individual field constraints are combined with AND, i.e. a record must meet the constraints on all fields to be retrieved. Step 4.
Click the Submit button to apply the filter Once a filter has been created on a table, it will persist for the duration of the Table Browser session or until it has been cleared. Only one filter can exist for a table at a time, but multiple filters may exist in one session if they are applied on different tables. To modify an existing filter, click the Edit button on the filter line. To remove a filter, click the Clear button. Filtering on fields from multiple tables A Table Browser filter may include constraints on fields from tables related to the primary table. To create a filter composed of fields from multiple tables: Step 1. Select the assembly, track, and region Step 2. Click the Create button on the filter line Note: If a filter already exists on the table, click the Edit button to modify it or the Clear button to remove it. Step 3. Select the tables to include in the filter Scroll down to the Linked Tables section of the page. The tables listed in this section are linked to the selected table by one or more common fields (typically a name, accession, or ID field). Click the boxes in front of the table(s) whose fields you wish to include in the filter, then click the Allow Filtering Using Field in Checked Tables button. The fields of the selected tables will be displayed in the top portion of the page. Step 4. Add the filter constraints Step 5. Click the Submit button to apply the filter Note: In the current implementation of the Table Browser, the selected fields from primary and related tables output format option must be used when including fields from multiple tables in a filter. Check the boxes for all tables in the Linked Tables list on which filter constraints have been applied, then click the Allow Selection From Checked Tables button to include them in the output. Filter constraints Strings Text fields are compared to words or patterns containing wildcard characters. Valid wildcards are "*" (matches 0 or more characters) and "?"
(matches a single character). Each space-separated word or pattern in a text field box is matched against the value of that field in each record. If any word or pattern matches the value, then the record meets the constraint on that field. Numbers Numeric fields are compared to table data using an operator such as <, >, != (not equals) followed by a number. To specify a range, enter two numbers (start and end) separated by white space and/or a comma. Free-form queries When the filters on individual fields aren't sufficiently flexible, the free-form query text box allows the application of more complex constraints that typically relate two or more field names of the selected table. Valid free-form queries use the syntax of the SQL where clause (using wildcards as defined above). Free-form queries combine simple constraints with AND, OR, and NOT using parentheses as needed for clarity. A simple constraint consists of a table field name, a comparison operator (see below), and a value: a number, string, wildcard value (see below), or another field name. In place of a field name, you may use an arithmetic expression of numeric field names. - String or wildcard values for text comparisons must be quoted. Single or double quotes may be used. If comparing to a literal string value, use the "=" or "!=" operator. If comparing to a wildcard value, use the "LIKE" or "NOT LIKE" operator. - Numeric comparison operators include <, <=, =, != (not equals), >=, and >. - Arithmetic operators include +, -, *, and /. - Other SQL comparison keywords may also be used. Example: The following examples show free-form queries applied to the human refGene table. - txStart = cdsStart - searches for gene models missing expected 5' UTR upstream sequence (if strand is "+"; 3' UTR downstream if strand is "-") - chrom NOT LIKE "chr??"
- restricts search to chromosomes 1 - 9, X and Y - (cdsEnd - cdsStart) > 10000 - selects genes whose coding region spans more than 10 kbp - (txStart != cdsStart) AND (txEnd != cdsEnd) AND exonCount = 1 - finds single exon genes with both 3' and 5' flanking UTR - ((cdsEnd - cdsStart) > 30000) AND (exonCount=2 OR exonCount=3) - finds genes with long spans but only 2 - 3 exons Intersecting data from multiple tables It is often interesting to compare the positions of features in different annotation tracks to identify points of overlap. The Table Browser intersection utility can be used to generate various position-based comparisons of track features. Using the intersection utility, you can: - examine all genomic positions where the feature data from the two tracks overlap - identify genomic locations where there is no overlap between track features - establish thresholds for the amount of overlap that must exist between the two feature sets - conduct feature-by-feature comparisons as well as base-by-base comparisons of tracks - complement (invert) a position set before comparing the tracks An intersection may be expanded to include additional tables by using the Table Browser custom track feature. Note: The intersection utility can be used only on positional tables. To generate intersections incorporating data in non-positional tables, use the Table Browser filter utility. See the Filtering on fields from multiple tables section for more information. Intersecting data from two tables Follow these steps to configure and generate an intersection between two positional tables: Step 1. Select the assembly, track, table, and region for the primary table Note: Only positional tables may be used in an intersection. Step 2. Click the Create button on the intersection line Note: If an intersection already exists on the table, click the Edit button to modify it or the Clear button to remove it. Step 3.
Select the secondary track to include in the intersection Select a group in the group list, then select a track from the track list. To view all the tracks available, regardless of group, select the All Tracks option in the group list. Step 4. Select a combination method The Table Browser provides two major types of comparisons: - feature-by-feature comparisons preserve the structure of the primary table. For example, if the primary table describes exon structure and the features are compared with a second table, the results will describe exon structure (unless you choose an output format in which the structure is lost). - base-by-base comparisons examine the primary table and the table underlying the secondary track one base at a time. The structure of the primary table is not preserved in this comparison. For example, even if the primary table describes exon structure, the intersection results will contain only position ranges; no information about exon/block structure, strand, or translation region will be retained. Click the circle in front of a combination method to select it. Only one method may be selected from the two sets of methods. For more information about the individual combination options, see the Intersection Options section. Step 5. (optional) Select the complement options Check the box in front of one or both tables to complement the feature data. The complement options allow you to invert the set of positions covered by one or both tables. For example, if you choose to complement the primary track, any position covered by that track's features will be considered not covered, and vice versa. This option provides more flexibility in comparing track positions. Step 6. Click the Submit button to apply the intersection Once an intersection has been created on a table, it will persist for the duration of the Table Browser session or until it has been cleared. Only one intersection may exist at a time.
To modify an existing intersection, click the Edit button on the intersection line. To remove an intersection, click the Clear button. Intersecting data from more than two tables The Table Browser intersection utility limits combinations to only two tables. An existing intersection may be expanded to include additional tables by using the Table Browser custom track utility. To create an intersection on multiple tables: Step 1. Set up an intersection between two tables See the Intersecting data from two tables section for more information. Step 2. Save the intersection data in a custom track See the Saving data as a custom track section for information on generating a custom track. Note: In the current implementation of the Table Browser, you must use the Get Custom Track button on the custom track page to add the custom track to the Table Browser track list. Step 3. Select the newly-generated custom track Select the Custom Tracks option in the group list, then select the newly created custom track from the track list. Step 4. Create an intersection with another track Follow the steps in the Intersecting data from two tables section to intersect the custom track with another track. Intersection options Feature-by-feature comparisons Some comparisons preserve the primary table's gene and alignment structure, if it exists. For example, if the refGene table (human RefSeq Genes track) is combined with another table using one of these comparisons, the resulting output data will describe exon structure (unless you choose an output format in which the structure is lost). Primary table features are kept or discarded based on the amount of positional overlap with the features in the table underlying the secondary track. The Table Browser offers the following options in this category: - Any overlap: A primary table record will appear in the output if any of its base positions are covered by any feature in the secondary table. 
- No overlap: A primary table record will appear in the output only if none of its base positions are covered by any feature in the secondary table. - Overlap greater than a specified threshold: A primary table record will appear in the output if the percentage of its base positions covered by secondary table features is greater than the user-specified threshold. - Overlap less than a specified threshold: A primary table record will appear in the output if the percentage of its base positions covered by secondary table features is less than the user-specified threshold. Note: If the primary table has an exon/block structure, only those bases located in exons and/or blocks will be counted. Base-by-base comparisons In these combination options, the positions of the primary and secondary table features are compared one base position at a time. When applying base-by-base comparisons, the structure of the primary table is not preserved. For example, if the refGene table (from the human RefSeq Genes track) is compared with a secondary table using these comparisons, the resulting output data will not describe exon structure. Instead, only position ranges will be returned; the exon/block structure, strand, and translation region information will be discarded. The Table Browser provides the following base-by-base combination options: - Base-by-base intersection (AND): A nucleotide position is included in the output if it is covered by at least one feature of both the primary table and the secondary table. - Base-by-base union (OR): A nucleotide position is included in the output if it is covered by at least one feature of either the primary table or the secondary table. Note: If the primary table has an exon/block structure, only base positions located in exons and/or blocks will be counted.
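The feature-by-feature and base-by-base options described above can be sketched in a few lines of Python. This is a toy model, not the Table Browser's implementation: features are (start, end) pairs on a single chromosome with an exclusive end, exon/block structure is ignored, features are assumed to be non-empty, and positions are held in Python sets, which is practical only for small examples. All function names, mode labels, and coordinates are made up for illustration.

```python
def covered(features):
    """Return the set of base positions covered by (start, end) features."""
    bases = set()
    for start, end in features:
        bases.update(range(start, end))
    return bases


def percent_overlap(feature, secondary):
    """Percentage of one feature's bases covered by the secondary features."""
    own = set(range(*feature))
    return 100.0 * len(own & covered(secondary)) / len(own)


def feature_by_feature(primary, secondary, mode, threshold=0.0):
    """Keep whole primary features according to the chosen overlap rule."""
    rules = {
        "any": lambda pct: pct > 0,           # Any overlap
        "none": lambda pct: pct == 0,         # No overlap
        "more": lambda pct: pct > threshold,  # Overlap greater than threshold
        "less": lambda pct: pct < threshold,  # Overlap less than threshold
    }
    keep = rules[mode]
    return [f for f in primary if keep(percent_overlap(f, secondary))]


def to_ranges(bases):
    """Collapse a set of positions into sorted (start, end) ranges."""
    ranges = []
    for base in sorted(bases):
        if ranges and ranges[-1][1] == base:
            ranges[-1] = (ranges[-1][0], base + 1)
        else:
            ranges.append((base, base + 1))
    return ranges


def base_by_base(primary, secondary, op):
    """Base-by-base intersection ('and') or union ('or') as position ranges."""
    p, s = covered(primary), covered(secondary)
    return to_ranges(p & s if op == "and" else p | s)


primary = [(0, 10), (20, 30), (40, 50)]
secondary = [(5, 22)]
kept = feature_by_feature(primary, secondary, "more", threshold=30)
shared = base_by_base(primary, secondary, "and")
```

In this example the three primary features are covered 50%, 20%, and 0% by the secondary feature, so kept contains only the first feature, returned whole. In contrast, shared contains just the overlapping position ranges, which illustrates why base-by-base output no longer describes the structure of the primary features.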
Base-by-base complement (NOT) Before the Table Browser applies a feature-by-feature or base-by-base comparison to the table data, the set of positions covered by one or both tables can be inverted (complemented). When the data set of a table is complemented, any position covered by the table's features in the original data will be considered not covered in the inverted data, and vice versa. This option gives the user more flexibility in comparing table positions. Correlating data from two tables The Table Browser Correlation function creates a scatter plot of the data points of two tables as well as provides individual histograms of the data points from both tables. Additionally, it will also show a plot of the Residuals vs. Fitted which can be used to detect non-linearity, unequal error variances and outliers. The correlation function uses Pearson's correlation, which is optimized to work with continuous data such as wiggle tracks. For tracks that do not have data values such as gene-structured tracks, the data value used in the calculation is 1.0 for bases covered by exons and 0.0 at all other positions in the region. Due to memory and processing limitations, the number of data points that can be plotted is limited to 300,000,000. The "Window data to" function allows you to smooth out your plot by taking the average of the number of data points specified (defaults to 1). The total number of bases analyzed is independent of the data window. There is currently no way to output the results of the Correlation function. Output formats The data resulting from a Table Browser query may be configured in a number of different ways: - The output can be displayed on the screen, saved to a file, or saved to an annotation track table that can be displayed in the Genome Browser or used in a subsequent Table Browser query. - The data can include all fields from the primary or selected table, or can be restricted to selected fields from the primary table and related tables. 
- The data can be organized in one of several formats: tab-separated, sequence (FASTA), Browser Extensible Data format (BED), Gene Transfer Format (GTF), or a statistical summary of the data in the query. The output options available for a specific query may vary depending on the table(s) selected. For example, non-positional table data cannot be organized in a position-based format, but instead may be displayed only in tab-separated format. The Table Browser will automatically update the options on the output format list to show only those available for the current query. Displaying all fields in a table To display all the fields of the records in the query output in tab-separated format, select the all fields from primary table option. Displaying selected fields from one or more tables To restrict the query output to a subset of the fields in a table, choose the selected fields from primary and related tables option. You will be prompted to pick the table fields to display. Click the box in front of the fields you would like to see in the query output (or click the Check All button to select all the fields), then click the Get Fields button. To include data fields from other tables linked to the selected table, choose the selected fields from primary and related tables option, then scroll down to the Linked Tables section of the page. The tables listed in this section are linked to the selected table by one or more common fields (typically a name, accession, or ID field). Click the boxes in front of the table(s) whose fields you wish to include in the query output, then click the Allow Selection From Checked Tables button. The fields of the selected tables will be displayed in the top portion of the page. Click the boxes in front of the fields that you wish to include in the query output, then click the Get Fields button underneath any of the field lists to generate tab-separated output that includes data from all the selected fields.
Note that the Get Fields and Cancel buttons apply globally to all the selected tables, but the Check All and Clear All buttons apply only to the fields listed directly above the buttons. Displaying sequence (FASTA) data (positional tables only) To display the genomic sequence underlying the query results, select the sequence option in the output format list. The Table Browser will present you with several options to configure the output display. When you have completed the configuration, click the Get Sequence button. When displaying sequence data for gene prediction tracks, you will also be offered the option to view the protein and mRNA sequence as extracted from the data source in addition to the genomic sequence. Displaying CDS FASTA alignments (genePred tables only) The CDS FASTA alignments are created from a Multiple Alignment File (MAF) in combination with a genePred table. The UCSC MAF format stores multiple alignments at the DNA level between entire genomes. You can use the Table Browser to return FASTA alignments of coding regions in nucleotide-space or translated into amino acid-space. However, it is worth noting that the initial MAF files are all created by aligning genomes at the DNA level. Genome-wide CDS FASTA alignments Note that when using the Table Browser to fetch CDS FASTA output, it is best to restrict your query to a reasonable-sized position range rather than requesting output from the entire genome. A genome-wide query will take a substantial amount of compute time, and it is likely that your Internet browser will time out and disconnect. If you would like to download genome-wide CDS FASTA output for any of several model organisms, you can do so from the download server. Creating CDS FASTA alignments using the Table Browser To display FASTA multiple alignments for the CDS regions of genes, select the CDS FASTA alignment from multiple alignment option in the output format list. 
In order to see this output format option, you must have a genePred table selected. If you limit your search to a certain position range within the genome (rather than searching the entire genome), the tool will return FASTA alignments for all genes that overlap the position for which you are searching. The Table Browser will present you with a configuration page. On this page, you can select options for your output. First, select your MAF table. This is the table from which the multiple alignments will be extracted for the CDS regions of your gene track. If you do not know the name of the MAF table that corresponds to the Conservation track, you can find it in the Genome Browser by following these instructions. Then select any of the following choices: - Separate into exons - The default behavior is for the coding exons of each gene to be concatenated into one sequence in the output FASTA multiple alignment. In this case each output row header has the format listed below under "Whole gene format". If the separate into exons option is chosen then each exon will be listed with a separate header in the format listed below under "Exon format". - Show nucleotides - The default behavior is for the nucleotides in the alignment to be translated into amino acids according to the strand and exon frames defined in the selected genePred table. If this option is chosen, then the nucleotides in the alignment will not be translated into amino acids. - Output lines with just dashes - The default behavior is for the alignment rows that contain only dashes to not be printed. If this option is chosen, then these dashes-only rows are printed. - Format output as table - If this option is chosen, the header and sequence for each organism will appear on the same line. - Truncate headers as __ characters (enter zero for no headers) - This option works in conjunction with the "Format output as table" option. 
If you want to see only a portion of the headers, choose this option, and enter the number of characters at which you would like the headers truncated. Finally, from the list of species, select those that you would like included in the FASTA multiple alignment output. Press the "get output" button to view the output. Explanation of CDS FASTA header format Whole gene format: geneName_assemblyName peptideLength location Exon format: geneName_assemblyName_exonNum_totalExons exonLength inFrame outFrame location Here are the descriptions for each field name: - geneName- the name field from the genePred table. - assemblyName- the UCSC assembly name for the species. - peptideLength- the length of the entire coding region. If the "Show nucleotides" option is chosen, this will be in nucleotides, otherwise it will be the number of amino acids in the peptide. - location- this is the chromosome position within the assembly that is aligned in the multiple alignment. The format of this string is chrom:start-end followed by the strand where the alignment occurs. If more than one region is aligned then all the regions are listed with a semi-colon (;) between each position. This address is in genome browser coordinates (i.e. the start address is one-based). - exonNum- the ordinal of the exon. Exons are counted starting at one and begin at the transcription start site and progress along the strand of transcription. - totalExons- the number of coding exons in the gene. - exonLength- the length of the current exon. If the "Show nucleotides" option is chosen, this will be the number of nucleotides in the exon, otherwise it will be the number of amino acids in the exon (with amino acids translated from split codons placed in the exon where two of the three nucleotides lie). - inFrame- the frame number of the first nucleotide in the exon. Frame numbers can be 0, 1, or 2 depending on what position that nucleotide takes in the codon which contains it. 
- outFrame- the frame number of the nucleotide after the last nucleotide in this exon. Frame numbers can be 0, 1, or 2 depending on what position that nucleotide takes in the codon which contains it. Explanation of CDS FASTA sequence format As noted above, the CDS FASTA output files can be in either DNA-space or protein-space. In some instances, there is a dash ("–") in the sequence portion of the CDS FASTA file. Dashes are used in several circumstances. They indicate missing sequence for the aligning genome, as well as deletions in the aligning genome or insertions in the base genome. Because the CDS FASTA alignments are based on one reference genome, any amino acids or nucleotides that are not in the reference genome are not displayed. Consequently the peptides shown for aligning genomes are not necessarily the peptide that the gene of the other organism would generate. Any sequence inserted in an aligning genome or deleted in the base genome will not be present in the alignment. We represent this condition with an orange bar in the Genome Browser display, but the CDS FASTA alignments silently ignore this issue. Nucleotide CDS FASTA sequence: Consider the example below that shows the FASTA sequence for four species aligned with the first exon of the human gene PLEKHO1 (UCSC Gene: uc001ett.1). Note that the rat (rn4) row is missing the first three nucleotides. This could be due to a lineage-specific insertion between the rat and human genomes, or a lineage-specific deletion between the human and rat genomes. Note also that the Zebrafish (danRer4) row contains only dashes. This could be due to excessive evolutionary distance between the zebrafish and human, missing data in the zebrafish, or independent indels in the region in both species. Sometimes it is helpful to view the Conservation track in the Genome Browser in this area to clarify the exact meaning of the dashes. 
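The exon-format header described above can be split into its fields with a short Python sketch. This is only an illustration under stated assumptions: the names ExonHeader and parse_exon_header are hypothetical, the identifier is split from the right so that a gene name containing underscores stays intact (this assumes the assembly name itself contains no underscore), and a missing location is treated as an empty region list.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumption: each region looks like chrom:start-end followed by a strand character.
_REGION = re.compile(r"^(.+):(\d+)-(\d+)([+-])$")


class ExonHeader(NamedTuple):
    gene_name: str
    assembly: str
    exon_num: int
    total_exons: int
    exon_length: int
    in_frame: int
    out_frame: int
    regions: List[Tuple[str, int, int, str]]  # (chrom, start, end, strand)


def parse_exon_header(line: str) -> ExonHeader:
    """Parse 'geneName_assemblyName_exonNum_totalExons exonLength inFrame outFrame location'."""
    fields = line.strip().lstrip(">").split()
    ident, exon_length, in_frame, out_frame = fields[:4]
    # Split from the right: the last three underscore-separated parts are fixed fields.
    gene_name, assembly, exon_num, total_exons = ident.rsplit("_", 3)
    location = fields[4] if len(fields) > 4 else ""
    regions = []
    for part in location.split(";"):
        if not part:
            continue  # no aligned region reported for this species
        match = _REGION.match(part)
        if match is None:
            raise ValueError(f"unrecognized location: {part!r}")
        chrom, start, end, strand = match.groups()
        regions.append((chrom, int(start), int(end), strand))
    return ExonHeader(gene_name, assembly, int(exon_num), int(total_exons),
                      int(exon_length), int(in_frame), int(out_frame), regions)


header = parse_exon_header(">uc001ett.1_hg18_1_6 30 0 0 chr1:148389072-148389101+")
print(header.gene_name, header.assembly, header.exon_num, header.total_exons)
print(header.regions)
```

The start coordinate is kept exactly as it appears in the header, which the field description above states is one-based.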
>uc001ett.1_hg18_1_6 30 0 0 chr1:148389072-148389101+
ATGATGAAGAAGAACAAcode
>uc001ett.1_panTro2_1_6 30 0 0 chr1:129156502-129156531+
ATGATGAAGAAGAACAAcode
>uc001ett.1_rn4_1_6 30 0 0 chr2:190795892-190795918-
---ATGAAGAAGAGCGGCTCCGGCAAGCGG
>uc001ett.1_danRer4_1_6 30 0 0
------------------------------
>uc001ett.1_oryLat2_1_6 30 0 0 chr11:3404940-3404969-
AGGATGAAGAAAAGCAACCAGAGCAGGCGG
Amino Acid CDS FASTA sequence:
- Codons that have a dash in any of the three nucleotides are represented by a dash in the amino acid.
- Codons with an N in any position are represented with an X.
- Stop codons are represented with a Z.
- All other amino acids follow the IUPAC amino acid codes.
- In exon format, when the codon triplet is split between two exons, the amino acid will be displayed as part of the exon containing two of the three nucleotides.
Saving query results in GTF or BED format (positional tables only)
To format the query results using GTF or BED conventions, select the corresponding option in the output format list. Note that when you select GTF, the Table Browser translates the output into this format. For tables that lack feature designations, all records are arbitrarily assigned the feature "exon" to conform to GTF specifications. If you select BED format, you will be presented with the option to include and configure a custom track header and options for organizing the data. When you have finished the configuration -- or to accept the default options -- click the Get BED button at the bottom of the window. To understand the name column in the BED format, see this FAQ.
Saving data to a file
By default, the Table Browser displays query results directly in your internet browser window. To redirect the data to a file, type a file name into the output file box before starting the query. The Table Browser will prompt you for the location of this file on your local disk while processing the query.
Saving data as a custom track (positional tables only) Query output may be saved in a format that can be displayed as a custom annotation track in the Genome Browser. Custom tracks created during a Table Browser session may also be used for subsequent queries and intersections in the same session. For more information on custom tracks, see the Genome Browser User's Guide. To save query data in custom track format, select the custom track option in the output format list. When the query is executed, the Table Browser will prompt you to customize the track header and configure the record layout of the data. The configuration is optional; the Table Browser automatically sets up a default track configuration. Click the Custom track link for more information on custom track syntax and format. When you have finished configuring the custom track -- or to accept the default configuration -- click one of the buttons at the bottom of the window to create the custom annotation track. - To display the query results as text on the screen, click the Get Custom Track File button. - To save the query results to a file on your local disk for future use, specify a file name in the output file box before executing the query, then click the Get Custom Track File button. - To load the query results into a table accessible from the Table Browser table list, click the Get Custom Track in Table Browser button. - To view the query results as a custom track in the Genome Browser, click the Get Custom Track in Genome Browser button. Your browser display will be redirected automatically to the Genome Browser, with your custom track positioned near the top of the annotation tracks window. - To access your custom track data in a subsequent query in the same Table Browser session, select the Custom Tracks option from the group list to display the custom tracks available.
Displaying query results as Genome Browser hyperlinks (positional tables only) To examine the records in the query output individually in the Genome Browser, select the hyperlinks to Genome Browser output option. The Table Browser will display a list of one or more hyperlinks corresponding to the individual records in the output data. Click a link to open up the Genome Browser display to the item and position shown on the hyperlink. Displaying a statistical summary of query data (positional tables only) To generate a statistical summary of the query output data, the region covered by the query, and the CPU time required to process the query, click the Summary/Statistics button. Video examples of Table Browser queries - Finding a list of genes in a genomic region - Obtaining coordinate sequences for a gene exon - Finding all the SNPs in a gene - Finding SNPs upstream of a gene Visit our Video Page. Visit our YouTube channel.
/goldenPath/help/mirror.html:Genome_Browser_Manual_Installation Installation of a UCSC Genome Browser on a local machine ("mirror") Contents Considerations before installing a Genome Browser Installing a Genome Browser locally with the GBiC installer Docker installation instructions Manual installation instructions Using UDR to speed up downloads The genome-mirror mailing list Considerations before installing a Genome Browser Like most web servers, running a Genome Browser installation at your institution, even for your own department, requires a Unix machine, disk space (6TB for hg19), and the resources to update the site and underlying OS regularly. You may want to consider these alternatives before embarking on a full UCSC Genome Browser installation directly on your server. For information about operating in the cloud, visit the Cloud Data and Software Resources help page. 1. Embed the Genome Browser graphic in your web page If you only want to include a genome browser view into your webpage for already existing genomes, you can use an