How do I download the promoter sequences of a list of genes?

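There are two kinds of upstream sequence you can get: a defined number of nucleotides upstream of each gene, or the whole upstream intergenic region.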

To obtain a defined number of nucleotides.

  1. Select the dataset "Genes" (it should be selected by default).

  2. Click "NEXT".

  3. Here you can narrow down the selection. For instance, you could select all the rab genes by typing "rab%" in the "Gene" box, because their names share the same prefix. That is not always possible, however, so the most reliable approach is usually to go to the IDs section (4th box) and enter a list of dictyBase IDs.
    ***Make sure to tick the checkbox next to the list of dictyBase IDs.

  4. You should also select "Primary Features" under "Transcript subtypes" (3rd box on the page) to make sure you get only one sequence per gene in the output (see step 8). If you do not select it, you will get two sequences for any gene that has both a gene prediction and a curated model.

  5. Click "NEXT".

  6. You are now on the "Select the Attribute Page", which lets you choose what information to display. In the pull-down menu right below "Select the Attribute Page", choose "Sequences". The page will now show slightly different options.

  7. You can export many types of sequences. In this case, choose "Flank (Tran. Coding Region)", then click "Upstream Flank" below it and enter the number of bases you want to retrieve. The default is 100.

  8. For clarity, you will probably also want to select "dictyBase ID" and "Gene Name" under "Header information".

  9. Click on "EXPORT" to get a FASTA-formatted list of promoter sequences.
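Once the file is exported, the sequences can be read back into a script. The following is a minimal sketch, not part of dictyMart itself. It assumes that the header fields chosen in step 8 appear on the ">" line separated by "|" (check your own export, since the separator is an assumption), and the function name `parse_fasta` and the sample IDs are hypothetical.

```python
def parse_fasta(text, sep="|"):
    """Return a list of (header_fields, sequence) tuples from FASTA text.

    `sep` is the assumed separator between header fields; adjust it if
    your export uses a different one.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].split(sep)
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder data standing in for an exported file.
sample = ">GENE_ID_1|geneA\nACGT\nTTAA\n>GENE_ID_2|geneB\nGGCC\n"
for fields, seq in parse_fasta(sample):
    print(fields, len(seq))
```

For a real export, read the file contents first (for example with `open(path).read()`) and pass the text to `parse_fasta`.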

To obtain the intergenic region.

The intergenic region is the sequence between the ATG of a gene and the next gene upstream from it. If the upstream gene is encoded on the same strand, the intergenic region runs from the ATG of the gene of interest to the STOP codon of the upstream gene. If the two genes are encoded on opposite strands, the intergenic region runs from the ATG of the gene of interest to the START codon of the upstream gene.
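The definition above can be sketched in a few lines of code. This is only an illustration of the rule, not how dictyMart computes it. It assumes 1-based inclusive coordinates for the coding region of each gene (`start` is always the lower coordinate), genes that do not overlap, and a strand encoded as +1 or -1; the `Gene` type and the function name `upstream_intergenic` are hypothetical. The region is returned in chromosome order (low, high), which may differ from the Start/End convention used in the dictyMart export.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # lowest coordinate of the coding region (1-based, inclusive)
    end: int     # highest coordinate of the coding region
    strand: int  # +1 or -1


def upstream_intergenic(gene, genes):
    """Return (low, high, upstream_gene_name, boundary) or None.

    `boundary` is "STOP" when the upstream gene is on the same strand
    and "START" when it is on the opposite strand.
    """
    if gene.strand == 1:
        # ATG is at gene.start; upstream means lower coordinates.
        candidates = [g for g in genes if g.end < gene.start]
        if not candidates:
            return None
        neighbour = max(candidates, key=lambda g: g.end)
        low, high = neighbour.end + 1, gene.start - 1
    else:
        # ATG is at gene.end; upstream means higher coordinates.
        candidates = [g for g in genes if g.start > gene.end]
        if not candidates:
            return None
        neighbour = min(candidates, key=lambda g: g.start)
        low, high = gene.end + 1, neighbour.start - 1
    boundary = "STOP" if neighbour.strand == gene.strand else "START"
    return low, high, neighbour.name, boundary


# Made-up genes on one chromosome, for illustration only.
genes = [
    Gene("A", 100, 200, 1),
    Gene("B", 300, 400, 1),
    Gene("C", 500, 600, -1),
    Gene("D", 700, 800, -1),
    Gene("E", 900, 1000, 1),
]
for g in genes:
    print(g.name, upstream_intergenic(g, genes))
```

Here gene E is on the forward strand and its upstream neighbour D is on the reverse strand, so the region ends at the START codon of D, as described above.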

  1. Do steps 1 to 6 above.

  2. Under "SEQUENCES", choose "Upstream Intergenic DNA".

  3. Under "Header Information" ("Transcript Attributes"), you can choose to export information about the gene whose upstream sequence you are exporting, as well as information about the upstream gene:
    • dictyBase ID, Gene Name, Chromosome Name, Start (bp), End (bp), Strand refer to the gene you searched on.
    • Upstream Gene Name returns the name of the upstream gene.
    • Upstream Intergenic Start (bp) exports the coordinate of the start of the intergenic region (this is -1 relative to the ATG).
    • Upstream Intergenic End (bp) exports the coordinate of the end of the intergenic region.
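If both intergenic coordinates are exported, the length of the region can be derived from them, for example to filter out very short upstream regions. A minimal sketch, assuming both coordinates are inclusive base positions (this is an assumption, not something stated by dictyMart); the function name `intergenic_length` is hypothetical. Because the start is given relative to the ATG, it can be larger than the end for a gene on the forward strand, so the absolute difference is used.

```python
def intergenic_length(start_bp, end_bp):
    """Length in bases of a region given two inclusive coordinates.

    Works whichever of the two coordinates is larger.
    """
    return abs(end_bp - start_bp) + 1


# Made-up coordinates for illustration.
print(intergenic_length(299, 201))
print(intergenic_length(601, 699))
```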

dictyMart is an implementation of the BioMart project. BioMart is a query-oriented data management system developed jointly by the European Bioinformatics Institute (EBI) and Cold Spring Harbor Laboratory (CSHL). Visit the BioMart website for additional documentation.

