An Online Informatics Resource for Dictyostelium

dictyBase Help: Download BLAST Databases


Description

All dictyBase BLAST databases can be downloaded as FASTA files. The "chromosomal DNA" and "EST" datasets are updated only when new versions become available; all other files are updated weekly. To work with the most up-to-date information, download the files you need frequently.
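Because every download is a FASTA file, a few lines of code are enough to load one for local use. The following is a minimal sketch, not an official dictyBase tool: the function name `read_fasta` and the two example records are hypothetical, and it assumes the identifier is the first whitespace-delimited token on each ">" header line.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} mapping."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            # Assumption: the identifier is the first token after ">".
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical records, for illustration only.
example = """>gene_A hypothetical record
ATGAAA
TTTTAA
>gene_B
ATGCCC
""".splitlines()

records = read_fasta(example)
```

To read a downloaded file instead, pass an open file handle, for example `read_fasta(open(path))`, where `path` points at the file you saved.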


Types of Sequences

The majority of BLAST databases contain one of three different sequence types:

  • Coding Sequences (CDS): A DNA coding sequence is the stretch of nucleotides that encodes the amino acids of the predicted protein. It includes the start and stop codons, and thus begins with an "ATG" and ends with a stop codon. A missing start or stop codon indicates that only a partial coding sequence is available. Note that the DNA coding sequence does not correspond to an actual mRNA.

  • Genomic Sequences: These are the full-length genes including introns, plus up to 1 kb of sequence upstream of the predicted start codon and up to 1 kb of sequence downstream of the predicted stop codon. Note that when a partial gene is the only sequence available for a 'floating gene', or when a gene is located at the end of a contig, the genomic sequence is limited to the sequence that is available.

  • Protein Sequences: These are the protein translations of the DNA coding sequences (CDS).
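The relationships between the three sequence types can be sketched in code. This is an illustration under stated assumptions, not dictyBase's own implementation: the function names are hypothetical, translation assumes the standard genetic code, and coordinates are assumed to be 0-based and half-open.

```python
from typing import Tuple

# Standard genetic code (assumption), built in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_cds(cds: str) -> bool:
    """True if the CDS begins with ATG and ends with a stop codon."""
    seq = cds.upper()
    return seq.startswith("ATG") and len(seq) >= 6 and seq[-3:] in STOP_CODONS


def translate(cds: str) -> str:
    """Translate a CDS codon by codon, stopping at the first stop codon."""
    seq = cds.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def genomic_window(gene_start: int, gene_end: int,
                   contig_length: int, flank: int = 1000) -> Tuple[int, int]:
    """Extend a gene by up to `flank` bases on each side, clamped to the contig."""
    start = max(0, gene_start - flank)
    end = min(contig_length, gene_end + flank)
    return start, end
```

For example, `is_complete_cds("ATGAAATTT")` is false because the stop codon is missing, which marks the sequence as partial, and `genomic_window(300, 2500, 3000)` is clamped to the whole 3000-base contig because less than 1 kb is available on either side.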


Databases

For BLAST searches and downloads, dictyBase provides several different databases holding DNA or protein sequences.

Chromosomal DNA

This database contains the full-length chromosomes in dictyBase. In addition to chromosomes 1, 2, 3, 4, 5, 6, and M, it includes 'floating contigs', which are long stretches of DNA that have been sequenced but have not yet been fitted into an assembly.

Primary Features

Each gene in dictyBase may be associated with one or more sequences: curated gene models, Sequencing Center gene predictions, and GenBank records. The primary feature is the best available sequence for a given gene, chosen in the following priority order:
  1. Curated Model
  2. Sequencing Center Gene Predictions
  3. GenBank Record
For genes that have both a curated model and a gene prediction, only the curated model appears in the "primary features" dataset. Genes that do not yet have curated gene models are represented by their gene prediction from the Sequencing Center. A small number of genes have not been mapped onto the genome; these are represented by their GenBank records. Note that in cases of alternative splicing, all splice variants are held in the "primary features" dataset (each splice variant has a curated model).
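The priority order described above can be expressed as a short selection routine. This is a minimal sketch with hypothetical names (`PRIORITY`, `primary_features`, and the dictionary keys are illustrative, not dictyBase identifiers). It returns every sequence of the highest-priority type present, so that a gene with several curated splice variants keeps all of them.

```python
from typing import Dict, List

# Highest priority first, following the order given in the text.
PRIORITY = ["curated_model", "gene_prediction", "genbank_record"]


def primary_features(features_by_type: Dict[str, List[str]]) -> List[str]:
    """Return all sequences of the best available type for one gene."""
    for kind in PRIORITY:
        sequences = features_by_type.get(kind, [])
        if sequences:
            # Keep every entry of this type (e.g. all curated splice variants).
            return list(sequences)
    return []  # the gene has no associated sequence


# Hypothetical gene with two curated splice variants and one prediction.
gene = {
    "curated_model": ["variant_1", "variant_2"],
    "gene_prediction": ["prediction_1"],
}
chosen = primary_features(gene)
```

Here `chosen` holds both curated variants, and the gene prediction is left out because a curated model exists.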

Sequencing Center Gene Predictions

The entries in these databases were generated by the Genome Consortium at the Wellcome Trust Sanger Institute. In the case of the mitochondrial DNA, the gene models were obtained from GenBank.

Curated Models

The entries in these databases are manually verified by dictyBase curators. Gene predictions are upgraded to curated models based on evidence such as ESTs, GenBank records, or sequence similarity. In about 15% of genes the prediction needs adjustment, so the curated model differs from the gene prediction. This database contains a continuously growing subset of Dictyostelium genes, as curators add approximately 30 new curated models per week.

GenBank mRNA and Genomic Fragment Records

These databases contain sequences in dictyBase corresponding to individual GenBank mRNA records or GenBank genomic DNA records. This does not include EST records (see below) or genome records (mitochondrial DNA, all chromosome contigs).

EST Records

This database contains all EST sequences from the Japanese Sequencing Project as obtained from GenBank.

