dictyBase Help: Glossary Terms

Accession number
This refers to the unique GenBank identifier a sequence has been assigned. This number can be used to search dictyBase for a specific sequence.
Alignment
A presentation of two compared sequences that shows the regions of greatest statistical similarity.
AmiGO
AmiGO is a browser for GO. With AmiGO the user can search for a GO term and view all gene products annotated to it, or search for a gene product and view all its associations. It is also possible to browse the ontologies to view relationships between terms as well as the number of gene products annotated to a given term.
Annotation
A statement generated from the reading of a paper abstract. An annotation reflects the results and techniques discussed in the abstract.
Anonymous FTP
A method of sharing files on the Internet. Software that provides FTP functionality is included in most networking software packages. Anonymous FTP simply means that a computer allows anyone using FTP software to access a special directory of files on its disk drive. The service is called Anonymous FTP because the user name is "anonymous." When asked for a password, simply enter your e-mail address.
Associate
In the Colleague class of information, "Associate" refers to coworkers or collaborators.
Author
An author of a paper or personal communication included in dictyBase. The user may use the "*" wildcard character (e.g., Fisher*) to achieve the best results.
Biological process
One of the three categories used by the Gene Ontology project, biological process describes broad biological goals, such as mitosis or purine metabolism.
Bit Score
The bit score is derived from the raw alignment score in which the statistical properties of the scoring system used have been taken into account. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.
BLAST
Basic Local Alignment Search Tool, a search algorithm developed by Altschul et al. (1990). It is a very fast search algorithm used by the blastn, blastp, and blastx programs to search DNA or protein databases. BLAST is best used for sequence similarity searching, rather than for motif searching.
blastn
A BLAST program that compares a nucleotide query sequence against a nucleotide sequence database. The user must enter a NUCLEOTIDE sequence and select a DNA database (dictyBase Coding, dictyBase Genomic, GenBank) to search.
blastp
A BLAST program that compares an amino acid query sequence against a protein sequence database. The user must submit an AMINO ACID sequence and select a PROTEIN database (dictyBase Protein, SwissProt) for the search.
blastx
A BLAST program that compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. The user must enter a NUCLEOTIDE sequence and select a PROTEIN database for the search.
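The six-frame conceptual translation mentioned above can be sketched as follows. This is a minimal illustration, not the BLAST implementation; it assumes the standard genetic code (NCBI translation table 1), and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to their conceptual translations."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

For example, `six_frame_translations("ATGGCCTAA")["+1"]` gives `"MA*"`, where `*` marks the stop codon.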
BLOSUM80
An alternative scoring matrix for BLAST searches.
BLOSUM45
An alternative scoring matrix for BLAST searches.
BLOSUM62
A scoring matrix that is used as the default in blastp, blastx, tblastx, and tblastn BLAST searches.
cds
In a GenBank DNA sequence entry, "cds" stands for coding sequence. A coding sequence is a subsequence of a DNA sequence that is surmised to encode a gene. A coding sequence begins with an "ATG" and ends with a stop codon. In the case of spliced genes, all exons and introns should be within the same cds.
Cellular Component
One of the three categories used by the Gene Ontology project, cellular component encompasses subcellular structures, locations, and macromolecular complexes. Examples include nucleus, membrane, and ribosome.
Clustal W
Clustal W is an alignment program for DNA and proteins with improved sensitivity for the alignment of divergent protein sequences. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-80. [Clustal W]
Codon Adaptation Index (CAI)
Codon adaptation index is a measurement of the relative adaptiveness of the codon usage of a gene towards the codon usage of highly expressed genes. The relative adaptiveness (w) of each codon is the ratio of the usage of that codon to the usage of the most abundant codon for the same amino acid. The CAI is defined as the geometric mean of these relative adaptiveness values. Non-synonymous codons and termination codons (dependent on genetic code) are excluded.
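The calculation described above can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses the standard genetic code, the reference set of highly expressed genes is supplied by the caller, and codons never seen in the reference set are simply skipped (published implementations handle zero counts differently). All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_seqs):
    """Compute w for each synonymous codon from a reference gene set."""
    counts = Counter(
        c for s in reference_seqs for c in split_codons(s) if c in CODON_TABLE
    )
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    w = {}
    for aa, family in families.items():
        # Exclude termination codons and non-synonymous codons (Met, Trp).
        if aa == "*" or len(family) == 1:
            continue
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for codon in family:
            w[codon] = counts[codon] / top
    return w


def cai(seq, w):
    """Geometric mean of w over the codons of seq.

    Assumption: codons with no w value, or with w == 0, are skipped.
    """
    values = [w[c] for c in split_codons(seq) if w.get(c, 0) > 0]
    if not values:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

With a toy reference set in which CTG is used three times and CTA once, w(CTG) is 1 and w(CTA) is 1/3, so a gene consisting of one CTG and one CTA has a CAI of the square root of 1/3.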
Codon Bias Index (CBI)
Codon bias index is another measure of directional codon bias; it measures the extent to which a gene uses a subset of optimal codons. CBI is similar to Fop, with expected usage used as a scaling factor. In a gene with extreme codon bias, CBI will equal 1.0; in a gene with random codon usage, CBI will equal 0.0. Note that it is possible for the number of optimal codons to be less than expected by random chance. This results in a negative value for CBI.
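A commonly cited form of the index is CBI = (N_opt - N_rand) / (N_tot - N_rand), where N_opt is the number of optimal codons observed, N_tot the number of codons counted, and N_rand the number of optimal codons expected under random usage. The sketch below assumes that form; the set of optimal codons and the codon families are supplied by the caller, and only amino acids that have at least one optimal codon are counted (an assumption of this sketch).

```python
def codon_bias_index(codon_counts, optimal, families):
    """Compute CBI = (N_opt - N_rand) / (N_tot - N_rand).

    codon_counts: codon -> number of occurrences in the gene
    optimal:      set of codons regarded as optimal (organism specific)
    families:     amino acid -> list of its synonymous codons
    """
    n_opt = n_tot = n_rand = 0.0
    for family in families.values():
        family_optimal = [c for c in family if c in optimal]
        if not family_optimal:
            continue  # assumption: ignore amino acids without an optimal codon
        family_total = sum(codon_counts.get(c, 0) for c in family)
        n_tot += family_total
        n_opt += sum(codon_counts.get(c, 0) for c in family_optimal)
        # Expected optimal codons if synonymous codons were used at random.
        n_rand += family_total * len(family_optimal) / len(family)
    # Raises ZeroDivisionError when no informative codons are present.
    return (n_opt - n_rand) / (n_tot - n_rand)
```

Using only two-codon families (Lys and Glu) with AAG and GAG treated as optimal for illustration, a gene that uses only optimal codons scores 1.0, uniform usage scores 0.0, and a gene that avoids the optimal codons scores below zero.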
Colleagues
Colleagues is a searchable list of Dictyostelium researchers with their address (Internet and Postal) and phone numbers. Colleague information may also include research interests, web pages, and links to other Colleague entries for lab members, lab heads, or collaborators. [Colleague Help]
Contig
A stretch of genomic DNA assembled from raw sequence data. The contig lengths vary and may span many genes or only part of a gene. When enough overlapping contigs become available they are assembled into whole chromosome sequences.
Curator
A keeper of the Dictyostelium Genome Database information, responsible for collecting and compiling data about Dictyostelium genetic loci and DNA sequences and providing online assistance to users of the database. The dictyBase Staff page lists all current Dictyostelium curators.
Curated Model
A gene model that has been entered by a dictyBase curator after reviewing all available evidence such as ESTs, GenBank records, or sequence similarity.
DDBJ
DNA DataBase of Japan. DDBJ is a repository of DNA sequences. DDBJ is produced in collaboration with GenBank and EMBL.
Description
A brief description of the role that the gene plays in the cell, or a general description of the gene product.
dictyBase
An online informatics resource for Dictyostelium. The database includes a variety of genomic and biological information. dictyBase is funded by the National Institutes of Health and is located in the Center for Genetic Medicine at Northwestern University. The dictyBase homepage is at http://dictybase.org/.
dictyBase ID
A unique identifying number within dictyBase which is specific for a single feature.
DictyDB
DictyDB, an object oriented database for storing genomic data for Dictyostelium discoideum, was developed at UCSD by Doug Smith. The software (ACEDB) was originally created by Richard Durbin and Jean Thierry-Mieg for the Caenorhabditis elegans genome project, and has been used to set up many other genome and general biological information databases. The data from GenBank and DictyDB were used in the initial population of dictyBase.
DUST
A program for filtering low complexity regions from nucleic acid sequences. DUST filtering is performed by default in blastn searches.
EC_number
The number assigned by the Enzyme Commission for the particular enzyme coded for by the gene. Next to this information are external links to the gene-specific information in the Enzyme and Kyoto information databases.
EMBL
European Molecular Biology Laboratory. The EMBL Nucleotide Sequence database is a comprehensive database of DNA and RNA sequences. The database is produced in collaboration with GenBank and the DNA Database of Japan (DDBJ).
Entrez
The Entrez Search System was developed by NCBI. Entrez allows you to retrieve molecular biology data and bibliographic citations from integrated nucleotide (GenBank, DDBJ, EMBL), protein (Swiss-Prot, PIR, PRF, PDB), and bibliographic (PubMed) databases. Within dictyBase database pages, external links are provided to one or more of these databases.
EST
'Expressed Sequence Tags' provided by the Japanese cDNA Project and published in GenBank. All ESTs in dictyBase have a locus page.
E-Value
In a BLAST search, the E-Value (Expectation Value) is the number of different alignments with scores equivalent to or better than the observed alignment score that are expected to occur in a database search by chance. The lower the E-Value, the more significant the score.
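The relationship between raw score, bit score, and E-Value can be sketched with the standard Karlin-Altschul formulas: S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'), where m and n are the (effective) query and database lengths. The parameter values in the usage note are illustrative only; real values of lambda and K depend on the scoring system in use.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score using the scoring-system parameters."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Number of alignments with at least this bit score expected by chance."""
    return query_len * db_len * 2.0 ** (-bits)
```

For instance, `expect_value(bit_score(80, 0.267, 0.041), 250, 5_000_000)` estimates the E-Value of a raw score of 80 for a 250-residue query against a database of five million residues, assuming those values of lambda and K. Because the bit score already absorbs lambda and K, the second function needs only the search-space size.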
Evidence Code
Every GO annotation must indicate the type of evidence that supports it; the evidence codes correspond to broad categories of experimental or other support. The evidence code indicates how annotation to a particular term is supported.
Expect Threshold
The Expect threshold ("E") is a BLAST parameter that reflects the number of matches expected to be found by chance. If the E-Value of a match is greater than the Expect threshold, the match will not be reported. The E threshold default is set to 10. Decreasing the E threshold will increase the stringency of the search: fewer matches will be reported. On the other hand, increasing the E threshold will decrease the stringency of the search and result in more matches being reported.
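The effect of the threshold on reporting can be illustrated with a small filter. The hit names and E-Values below are made up for the example, and the function name is hypothetical.

```python
def report_hits(hits, expect_threshold=10.0):
    """Keep only hits whose E-Value does not exceed the Expect threshold.

    hits: iterable of (name, e_value) pairs.
    """
    return [(name, e) for name, e in hits if e <= expect_threshold]


# Hypothetical hits for illustration only.
example_hits = [("hitA", 1e-50), ("hitB", 0.5), ("hitC", 25.0)]
default_report = report_hits(example_hits)          # default threshold of 10
stringent_report = report_hits(example_hits, 1e-3)  # fewer matches reported
```

With the default threshold, `hitA` and `hitB` are reported; lowering the threshold to 0.001 leaves only `hitA`.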
FASTA
A program that can search both protein and DNA sequence databases (Pearson and Lipman, 1988). FASTA uses a fast search to initially identify sequences with a high degree of similarity to the query sequence and then conducts a second comparison on the selected sequences. FASTA is slower than BLAST, but it can be more sensitive and sometimes yields different results.
FASTA File
A FASTA file is a simple format primarily used to store genetic sequence information. FASTA files are easily created in a text editor. Each record consists of a header line beginning with a '>', holding a name or identifier and any additional information about the sequence. The following lines contain the DNA or protein sequence.
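A minimal reader for the format described above might look like this. It is a sketch that takes the file contents as a string and returns (header, sequence) pairs; the function name is hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Sequence lines that follow a header are joined into one string, so line wrapping inside the file does not affect the result.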
Feature
A feature is defined as any gene or other genetic element that resides on a chromosomal sequence. One or more features can be associated with a gene. Features include mRNAs, tRNAs, ESTs, and ORFs.
Filter Options
Filtering masks portions of a query sequence that have low compositional complexity (such as short internal repeats or poly-A sequences) to reduce the frequency of statistically significant but biologically uninteresting BLAST results.
Frequency of Optimal Codons (Fop)
This index is the ratio of optimal codons to synonymous codons (genetic code dependent). Fop values for the original index are always between 0 (where no optimal codons are used) and 1 (where only optimal codons are used). When calculating the modified Fop index, negative values are adjusted to zero.
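The original index can be sketched as a simple ratio. In this sketch the caller supplies the set of optimal codons and the set of synonymous codons to count (both depend on the organism and genetic code); the modified index is not shown, and the function name is hypothetical.

```python
def frequency_of_optimal_codons(codons, optimal, synonymous):
    """Return Fop: optimal codons divided by all synonymous codons counted.

    codons:     list of codons from the gene, in frame
    optimal:    set of codons regarded as optimal
    synonymous: set of codons that have at least one synonymous alternative
    """
    considered = [c for c in codons if c in synonymous]
    if not considered:
        raise ValueError("no synonymous codons to count")
    return sum(1 for c in considered if c in optimal) / len(considered)
```

Codons outside the `synonymous` set, such as ATG or a stop codon, do not contribute to either the numerator or the denominator.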
Genome Browser
The generic genome browser developed by GMOD (Generic Model Organism database) is employed by dictyBase to display gene maps, browse the chromosomes, align genes or gene models with ESTs or contigs, etc. [Genome browser Help]
GCG
The Genetics Computer Group is a private company involved in the development of sequence analysis software.
GenBank
GenBank is the DNA sequence database sponsored by the US National Institutes of Health. GenBank is produced in collaboration with EMBL and DDBJ.
Gene Name
See Standard Name.
Gene Page
The gene page is the "heart" of dictyBase: it contains information about the queried gene, with both internal and external links. All information about a given gene is contained under the standard name. If a given locus has been referred to by a synonym, these names are included in the gene page.
Gene Ontology (GO)
The Gene Ontology (GO) project was established to provide a common language to describe aspects of a gene product's biology. The use of a consistent vocabulary allows genes from different species to be compared based on their GO annotations. For each of three categories of biological information--molecular function, biological process, and cellular component--a set of terms has been selected and organized. Each set of terms uses a controlled vocabulary, and parent-child relationships between terms are defined. This combination of a controlled vocabulary with defined relationships between items is referred to as an ontology. Within an ontology, a child may be a "part of" or an example ("instance") of its parent. There are three independently organized controlled vocabularies, or gene ontologies, one for molecular function, one for biological process, and one for cellular component. Many-to-many parent-child relationships are allowed in the ontologies. A gene may be annotated to any level in an ontology, and to more than one item within an ontology. The browser for GO is AmiGO.
Gene Prediction
A gene prediction is an automatically predicted gene model. The gene predictions in dictyBase come from the Sequencing Center Consortium.
Gene Product
The name of the protein or RNA product (and its function, if relevant) that is coded for by the gene.
Gene Summary Paragraphs
This is a summary of published biological information for a gene and its product which is designed to familiarize both Dictyostelium and non-Dictyostelium researchers with the general facts and important subtleties regarding a locus. dictyBase curators compose Gene Summary Paragraphs using natural language and a controlled vocabulary based on the Gene Ontology (GO). Gene Summary Paragraphs contain links to references and GO Annotations.
GO
See Gene Ontology.
High Scoring Segment Pairs (HSPs)
In a BLAST search, an HSP is two sequence fragments (one from the query sequence and the other from a database sequence) that show a locally maximal alignment for which the alignment exceeds a pre-defined cutoff score.
Hydropathicity of Protein (GRAVY score)
This index is the general average hydropathicity (GRAVY) score for the hypothetical translated gene product. It is calculated as the sum of the hydropathic indices of all amino acids divided by the number of residues, i.e. their arithmetic mean (Kyte and Doolittle 1982).
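As a sketch of the calculation, the following uses the Kyte and Doolittle hydropathy values for the 20 standard amino acids. Residues outside that set are ignored, which is an assumption of this example; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean hydropathy of a protein sequence (one-letter codes)."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

Positive scores indicate a more hydrophobic protein overall, negative scores a more hydrophilic one.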
Keyword
A keyword is a word identified as particularly informative about an object. In a sequence, a keyword often relates to the identity of a gene or the function of the gene product. References often have a list of keywords that are Medline MeSH terms. Keywords are good to use in text searches.
Kyoto
An external link (found, if available on the locus page) to the Kyoto Encyclopedia of Genes and Genomes. The link goes directly to the information for that specific enzyme.
Literature Topics
Literature Topics are a guide to the literature for a given locus and are derived from journal abstracts. dictyBase performs a search through all PubMed literature (dating back to 1966) for all papers mentioning that locus and any aliases. dictyBase curators read the abstracts of those papers and assign the papers to one or more Topics that describe the kind of biological information contained in the abstracts. The Literature Topics are thus designed to help the user easily find the papers relevant to a given locus. Please note, however, that since only abstracts are read, the Literature Topics are not a complete description of the information contained in the papers. [Literature Topics Help]
Low Complexity Region
Regions of biased composition including homopolymeric runs, short-period repeats, and more subtle overrepresentation of some residues. The SEG program is used to mask or filter LCRs in amino acid queries. The DUST program is used to mask or filter LCRs in nucleic acid queries.
LTR
Long Terminal Repeat
Medline
Medline is the National Library of Medicine's database of biomedical papers; it contains all citation information for each paper, as well as abstracts for most of the papers.
Molecular Function
One of the three categories used by the Gene Ontology project, molecular function describes the tasks performed by individual gene products; examples are transcription factor and DNA binding.
motif
A meaningful pattern of nucleotides or amino acids that is shared by two or more molecules.
NCBI
The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM) in the National Institutes of Health (NIH). Its mission is to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. NCBI developed and maintains the Entrez Search System and PubMed database.
ORF
An ORF (Open Reading Frame) corresponds to a stretch of DNA that could potentially be translated into a polypeptide; i.e., it begins with an ATG "start" codon and terminates with one of the 3 "stop" codons. In dictyBase, currently all feature types are ORFs. There will be different features in the future.
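The definition above (an ATG followed in frame by a stop codon) can be sketched as a simple scan. This example searches only the three forward frames of the given strand and assumes the standard stop codons; the function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end, sequence) for each ATG...stop ORF on this strand.

    start and end are 0-based, end exclusive; the stop codon is included.
    min_codons is the minimum number of codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs
```

To cover both strands, the same function can be applied to the reverse complement of the sequence.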
Orthologs
Sequences from different species that perform the same biological function and are likely to have evolved from a common ancestral sequence. See Paralogs.
P(N)
In the results of a BLAST search, the lowest P-value given to any set of HSPs found in a database is listed in the "P(N)" column.
PAM30
Sequence alignment matrix that allows 30 accepted point mutations per 100 amino acids. A higher PAM is more suitable for comparing distantly related sequences, while a lower PAM is suitable for comparing closely related sequences (Schwartz and Dayhoff, 1978).
PAM70
Sequence alignment matrix that allows 70 accepted point mutations per 100 amino acids. PAM250 is suitable for comparing distantly related sequences, while a lower PAM is suitable for comparing more closely related sequences (Schwartz and Dayhoff, 1978).
PAM250-Gonnet
Sequence alignment matrix that allows 250 accepted point mutations per 100 amino acids using scoring tables recalculated since the creation of PAM250 (Gonnet et al., 1992). PAM250-Gonnet is better than PAM250 for comparing distantly related sequences.
Paralogs
Sequences that perform different biological functions in the same species that likely arose by duplication and divergence from a common ancestral sequence. See orthologs.
PDB
The Protein Data Bank (PDB) is an archive of experimentally determined three-dimensional structures of biological macromolecules, based at the Brookhaven National Laboratory.
Phenotype
In the locus page, "phenotype" refers to the observable traits of strains that carry a mutation at that locus.
PIR
PIR is a protein database. The PIR database has two sites: PIR-US, based in the United States, and PIR-JP, based in Japan.
Primary Feature
A primary feature is the best available gene sequence at any given time. When a gene has not been curated, the primary feature is the gene prediction from the Sequencing Center. If a curator makes a curated model for a gene, the curated model becomes the primary feature. Thus, a protein coding gene has several primary features: coding sequence, genomic sequence, and protein sequence.
Protein Info
This is a general category within the Locus page that contains information pertaining to the protein produced by the gene.
PubMed
PubMed is a database of bibliographic information developed by NCBI.
Query Sequence
A sequence, either amino acid or nucleotide, chosen by the user to use in a BLAST search. A query sequence can be typed or pasted into the query window on the search form. BLAST searches require a minimum query sequence length of 15 nucleotides or amino acids.
RAW Format
A format in which the nucleotide sequence appears without headers or comments. RAW format must be used when performing a D. discoideum search in BLAST or FASTA.
Reference
Within the dictyBase, a "reference" is most often a published article in a scientific journal or book; however some references are unpublished results, GenBank records, or personal communications to dictyBase. A comprehensive list of references may be obtained for a given locus within its literature topics section.
Related Sequences
A feature of Entrez that finds related nucleotide (GenBank) or protein (GenPept) sequences using similarity searches.
Research Interest
In the Colleague class of information, Research_interest refers to the broad areas of study the colleague is pursuing. Examples might be: protein translocation, DNA replication, or cytoskeleton.
Reserved Gene Name
Gene names that are soon to be published can be reserved by sending an e-mail to dictyBase.
SEG
A program for filtering low complexity regions in amino acid sequences. Residues that have been masked are represented as "X" in an alignment. SEG filtering is performed by default in blastp, blastx, tblastx, and tblastn searches.
Standard Name
Following the guidelines for the Dictyostelium Genetic Nomenclature, standard gene names are recommended to follow the Demerec Nomenclature (four letters: three lower case, one upper case, e.g., dagA, myoB). If a Demerec name is not suitable, modifications are acceptable (e.g., act15). All information in the database concerning this gene will be listed within the standard name's locus window. Any other names that have been used for this gene are listed as aliases within the standard name locus page.
Stock Center
The Dictyostelium Stock Center stores all available D. discoideum mutants, which can be ordered through dictyBase. The Stock Center is located in the Dept. of Anatomy & Cell Biology at Columbia University.
Synonym
An alternative to the standard name that has been agreed on by curators and researchers. The synonym field is searched in every dictyBase search.
tblastn
A BLAST program that compares a protein query sequence against a nucleotide sequence dataset dynamically translated in all six reading frames (both strands). The user must enter an AMINO ACID sequence and select one of the NUCLEOTIDE datasets (i.e., genoSc or GenBank) for the search.
tblastx
A BLAST program that compares the six-frame translations of a nucleotide sequence to the six-frame translations of a nucleotide sequence dataset. The user must enter a NUCLEOTIDE sequence and select one of the NUCLEOTIDE datasets (i.e., genoSc or GenBank) for the search.
Topic
Biological information ascertained by dictyBase curators from abstracts for a given gene name is categorized under pre-determined "topic" tags. These topics comprise the literature topics guide to literature in dictyBase.
UniProt
UniProt (Universal Protein Resource) is the most comprehensive catalog of information on proteins. It is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR.
Wildcard Character
dictyBase uses an asterisk "*" as a wildcard symbol. In a search, the wildcard character shows where any text can be tolerated. For example, searching for the locus "cdc*" will produce all cdc genes. Searching for the Author "Johns*" will produce all authors whose last name begins with those letters. Since the database requires exact matches to its format for searches to be productive, wise use of the "*" wildcard character is needed for many types of searches.
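The matching behavior of the "*" wildcard can be illustrated with Python's standard `fnmatch` module. This only sketches the semantics; it is not how dictyBase implements its searches, the names in the usage note are examples only, and case-insensitive matching is an assumption of this sketch. Note that `fnmatch` also gives special meaning to `?` and `[...]`.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern, names):
    """Return the names that match a pattern in which '*' matches any text.

    Matching is case-insensitive here (an assumption of this sketch).
    """
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]
```

For example, `wildcard_search("Johns*", ["Johnson", "Johnston", "Jones"])` returns `["Johnson", "Johnston"]`, because the pattern must match the whole name and the text before the asterisk must match exactly.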
Word Size
The Word Size (W) is a BLAST parameter that determines the minimum length of a match. The query sequence is split up into every possible 'word' of a selected size. BLAST first searches for a perfect match of at least the word length. Once a match is found then it tries to extend the HSP.
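The word-splitting and exact-match seeding step can be sketched as follows. This covers only the search for perfect word matches, not the extension into an HSP, and the function names are hypothetical.

```python
def query_words(query, word_size):
    """Split a sequence into every overlapping word of the given size."""
    return [query[i:i + word_size] for i in range(len(query) - word_size + 1)]


def exact_word_hits(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    index = {}
    for pos, word in enumerate(query_words(subject, word_size)):
        index.setdefault(word, []).append(pos)
    return [
        (q_pos, s_pos)
        for q_pos, word in enumerate(query_words(query, word_size))
        for s_pos in index.get(word, [])
    ]
```

A query of length L yields L - W + 1 words, so a larger word size produces fewer, more specific seeds, while a smaller word size finds more seeds at the cost of more work.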


Supported by NIH (NIGMS and NHGRI)