Transcription Machinery Domains Definitions



Specific DNA-Binding Transcription Factors

BZIP IPR004827: The basic-leucine zipper (bZIP) transcription factors [1, 2] of eukaryotes are proteins that contain a basic region mediating sequence-specific DNA-binding, followed by a leucine zipper region (see [INTERPRO:IPR002158]) required for dimerization.

GATA IPR000679: A number of transcription factors (including erythroid-specific transcription factor and nitrogen regulatory proteins) specifically bind the DNA sequence (A/T)GATA(A/G) [1] in the regulatory regions of genes. They are consequently termed GATA-binding transcription factors. The interactions occur via highly conserved zinc finger domains in which the zinc ion is coordinated by 4 cysteine residues [2, 3].
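The consensus (A/T)GATA(A/G) can be written as a regular expression. The following is a minimal sketch for locating candidate sites in a DNA string; the function name `find_gata_sites` is hypothetical, only the forward strand is scanned, and a match to the consensus does not by itself show that a GATA factor binds there.

```python
import re

# (A/T)GATA(A/G) as a regex. The lookahead lets overlapping sites be reported.
# Assumption: forward strand only; reverse-complement sites are not searched.
GATA_SITE = re.compile(r"(?=([AT]GATA[AG]))")

def find_gata_sites(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for each consensus match."""
    return [(m.start(), m.group(1)) for m in GATA_SITE.finditer(dna.upper())]

if __name__ == "__main__":
    print(find_gata_sites("ccAGATAAgg"))
```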

CBF/NF-Y IPR003956: The CCAAT-binding factor (CBF) is a mammalian transcription factor that binds to a CCAAT motif in the promoters of a wide variety of genes, including type I collagen and albumin. The factor is a heteromeric complex of A and B subunits, both of which are required for DNA-binding [1, 2]. The subunits can interact in the absence of DNA-binding, conserved regions in each being important in mediating this interaction.
IPR001289: The CCAAT-binding factor (CBFB/NF-YA) is a mammalian transcription factor that binds to a CCAAT motif in the promoters of a wide variety of genes, including type I collagen and albumin [1]. The factor is a heteromeric complex of A and B subunits, both of which are required for DNA-binding [2].

E2F IPR003316: The mammalian transcription factor E2F plays an important role in regulating the expression of genes that are required for passage through the cell cycle. Multiple E2F family members have been identified that bind to DNA as heterodimers, interacting with proteins known as DP - the dimerisation partners [1].

HOX IPR001356: The homeobox domain was first identified in a number of Drosophila homeotic and segmentation proteins, but is now known to be well-conserved in many other animals, including vertebrates [1, 2, 3]. Hox genes encode homeodomain-containing transcriptional regulators that operate differential genetic programs along the anterior-posterior axis of animal bodies [4]. The domain binds DNA through a helix-turn-helix (HTH) structure.

HSF IPR000232: Heat shock factor (HSF) is a transcriptional activator of heat shock genes [1]: it binds specifically to heat shock promoter elements, which are palindromic sequences rich with repetitive purine and pyrimidine motifs [1]. Under normal conditions, HSF is a homo-trimeric cytoplasmic protein, but heat shock activation results in relocalisation to the nucleus [2]. Each HSF monomer contains one C-terminal and three N-terminal leucine zipper repeats [3].

HTH IPR001387: This is a large family of DNA-binding helix-turn-helix proteins that includes a bacterial plasmid copy control protein, bacterial methylases, various bacteriophage transcription control proteins and a vegetative-specific protein from Dictyostelium discoideum.

MADS IPR002100: The function of serum response factor (SRF), a MADS-box protein, is essential for transcriptional regulation of numerous growth-factor-inducible genes, such as the c-fos oncogene and muscle-specific actin genes. A core domain of around 90 amino acids is sufficient for the activities of DNA-binding, dimerisation and interaction with accessory factors.

MYB IPR001005: The retroviral oncogene v-myb, and its cellular counterpart c-myb, encode nuclear DNA-binding proteins. These belong to the SANT domain family and specifically recognize the sequence YAAC(G/T)G [1, 2]. In myb, one of the most conserved regions, consisting of three tandem repeats, has been shown to be involved in DNA-binding [3].
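The recognition sequence YAAC(G/T)G mixes an IUPAC degenerate code (Y, a pyrimidine: C or T) with an explicit alternative (G/T). As an illustration, such a consensus can be translated into a regular expression. This is a sketch only; the helper `consensus_to_regex` and the small IUPAC table are assumptions made for this example and cover just a subset of the codes.

```python
import re

# Subset of IUPAC nucleotide codes: Y = pyrimidine, R = purine, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}

def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as 'YAAC(G/T)G' into a regex string."""
    parts = []
    i = 0
    while i < len(consensus):
        if consensus[i] == "(":
            # '(G/T)' means either base is accepted at this position.
            close = consensus.index(")", i)
            parts.append("[" + consensus[i + 1:close].replace("/", "") + "]")
            i = close + 1
        else:
            parts.append(IUPAC[consensus[i]])
            i += 1
    return "".join(parts)

MYB_SITE = re.compile(consensus_to_regex("YAAC(G/T)G"))

if __name__ == "__main__":
    print(MYB_SITE.pattern)
```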

NmrA IPR008030: NmrA is a negative transcriptional regulator involved in the post-translational modification of the transcription factor AreA. NmrA is part of a system controlling nitrogen metabolite repression in fungi [1].

p53-like IPR008967: This domain is found in a number of transcription factors, including p53, NFATC, TonEBP, STAT-1, and NFkappaB, where it is responsible for DNA-binding.

PAH IPR003822: The four paired amphipathic helix motifs have been identified in the myc family of helix-loop-helix DNA-binding proteins and in the TPR family of regulatory proteins.

PC4 IPR003173: p15 has a bipartite structure composed of an amino-terminal regulatory domain and a carboxy-terminal cryptic DNA-binding domain [1]. The DNA-binding activity of the carboxy-terminal domain is disguised by the amino-terminal p15 domain. Activity is controlled by protein kinases that target the regulatory domain.

SART-1 IPR005011: This family of proteins appears to contain a leucine zipper [1] and may therefore be a family of transcription factors.

WRKY IPR003657: The WRKY domain is a 60 amino acid region that is defined by the conserved amino acid sequence WRKYGQK at its N-terminal end, together with a novel zinc-finger-like motif. The WRKY domain binds specifically to the DNA sequence motif (T)(T)TGAC(C/T), which is known as the W box. The invariant TGAC core of the W box is essential for function and WRKY binding [1].
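The W box motif and its invariant core can be checked separately. The sketch below assumes that the bracketed (T)(T) positions are optional, so zero to two leading T residues are accepted; that reading, and the function name `describe_w_box`, are assumptions made for illustration.

```python
import re

# Assumption: (T)(T) is read as 0-2 optional leading T's before the TGAC core.
W_BOX = re.compile(r"T{0,2}TGAC[CT]")
CORE = "TGAC"

def describe_w_box(dna: str) -> dict:
    """Report whether the invariant core occurs and the first full W box match."""
    dna = dna.upper()
    match = W_BOX.search(dna)
    return {"has_core": CORE in dna, "w_box": match.group(0) if match else None}

if __name__ == "__main__":
    print(describe_w_box("AATTTGACCAA"))
```

A sequence can carry the TGAC core without matching the full motif, which is why the two results are reported separately.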

Zn cluster IPR001138: The N-terminal region of a number of fungal transcriptional regulatory proteins contains a Cys-rich motif that is involved in zinc-dependent binding of DNA. The region forms a binuclear Zn cluster, in which two Zn atoms are bound by six Cys residues [1, 2]. A wide range of proteins are known to contain this domain. These include the proteins involved in arginine, proline, pyrimidine, quinate, maltose and galactose metabolism; amide and GABA catabolism; leucine biosynthesis and others.

NF-X1-type Zn finger IPR000967: This domain is presumed to be a zinc-binding domain. The following pattern describes the zinc finger: C-X(1-6)-H-X-C-X3-C(H/C)-X(3-4)-(H/C)-X(1-10)-C, where X can be any amino acid, and numbers in brackets indicate the number of residues. The two positions marked (H/C) can be either His or Cys. This domain is found in the human transcriptional repressor NF-X1, a repressor of HLA-DRA transcription; the Drosophila shuttle craft protein, which plays an essential role during the late stages of embryonic neurogenesis; and a yeast hypothetical protein, YNL023C.
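The pattern above maps fairly directly onto a regular expression over one-letter amino acid codes. This is a minimal sketch under one stated reading: "C(H/C)" is taken to mean a Cys immediately followed by His or Cys. The test sequence is synthetic, built only to satisfy the pattern, and is not taken from any real protein.

```python
import re

# C-X(1-6)-H-X-C-X3-C(H/C)-X(3-4)-(H/C)-X(1-10)-C, with X = any residue.
# Assumption: 'C(H/C)' is a Cys directly followed by His or Cys.
NFX1_FINGER = re.compile(r"C.{1,6}H.C.{3}C[HC].{3,4}[HC].{1,10}C")

def has_nfx1_finger(protein: str) -> bool:
    """True if the sequence contains at least one match to the pattern."""
    return NFX1_FINGER.search(protein.upper()) is not None

if __name__ == "__main__":
    synthetic = "CAHACAAACHAAAHAC"  # made-up sequence, shortest allowed spacings
    print(has_nfx1_finger(synthetic))
```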

Atypical Zn finger: CRTF and GbfA are presumed to bind zinc based on the observation that their activity is zinc-dependent. However, their sequences do not contain known zinc-binding motifs, and they are therefore classed as atypical Zn fingers.


Co-Repressors and Co-Activators


General Transcription Machinery

Chromatin Modifying Factors

SNF2-related IPR000330: This domain is found in proteins involved in a variety of processes including transcription regulation (e.g., SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI), as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1) [1, 2]. SNF2 functions as the ATPase component of the SNF2/SWI multisubunit complex, which utilises energy derived from ATP hydrolysis to disrupt histone-DNA interactions, resulting in the increased accessibility of DNA to transcription factors. Proteins that contain this domain appear to be distantly related to the DEAX box helicases (IPR001410); however, no helicase activity has ever been demonstrated for these proteins.

Chromatin Structure and Function

Bromo Domain: Bromodomains are found in a variety of mammalian, invertebrate and yeast DNA-binding proteins [1]. Bromodomains can interact with acetylated lysine [2]. In some proteins, the classical bromodomain has diverged to such an extent that parts of the region are either missing or contain an insertion (e.g., mammalian protein HRX, Caenorhabditis elegans hypothetical protein ZK783.4, yeast protein YTA7). The bromodomain may occur as a single copy, or in duplicate. The precise function of the domain is unclear, but it may be involved in protein-protein interactions and may play a role in assembly or activity of multi-component complexes involved in transcriptional activation [3].
1. Haynes,S.R., Dollard,C., Winston,F., Beck,S., Trowsdale,J., Dawid,I.B., The bromodomain: a conserved sequence found in human, Drosophila and yeast proteins. (1992) Nucleic Acids Res. 20: 2603-2603 [PUBMED:1350857] [PUB00004405]
2. Jeanmougin,F., Wurtz,J.-M., Le Douarin,B., Chambon,P., Losson,R., The bromodomain revisited. (1997) Trends Biochem. Sci. 22: 151-153 [PUBMED:9175470] [PUB00005462]
3. Tamkun,J.W., The role of Brahma and related proteins in transcription and development. (1995) Curr. Opin. Genet. Dev. 5: 473-477 [PUBMED:7580139] [PUB00001065]

Chromo Domain IPR000953: The CHROMO (CHRomatin Organization MOdifier) domain [1, 2, 3, 4] is a conserved region of around 60 amino acids, originally identified in Drosophila modifiers of variegation. These are proteins that alter the structure of chromatin to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed. In one of these proteins, Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain a chromo domain appear to fall into 3 classes. The first class includes proteins having an N-terminal chromo domain followed by a region termed the chromo shadow domain [3], e.g., Drosophila and human heterochromatin protein Su(var)205 (HP1), and mammalian modifier 1 and modifier 2. The second class includes proteins with a single chromo domain, e.g., Drosophila protein Polycomb (Pc); mammalian modifier 3; human Mi-2 autoantigen; and several yeast and Caenorhabditis elegans hypothetical proteins. In the third class, paired tandem chromo domains are found, e.g., in mammalian DNA-binding/helicase proteins CHD-1 to CHD-4 and yeast protein CHD1.
The chromo domain motif is found in proteins from fungi, protists, plants, fish, insects, amphibians, birds, and mammals. The chromo domain peptide fold may have its origins as a chromosomal protein in a common ancestor of archaea and eukaryota, making it a particularly ancient protein structural module. Chromo domains have been found in single or multiple copies in proteins with diverse structures and activities, most or all of which are connected with chromosome structure/function. Eissenberg J, Gene. 2001 Sep 5;275(1):19-29 [PMID: 11574148]

SET Domain IPR001214: Proteins bearing the widely distributed SET domain (~130 amino acids) have been shown to contribute to epigenetic mechanisms of gene regulation by methylation of lysine residues in histones and other proteins. SET domain genes are widely represented in eukaryotic genomes, and the proteins were initially distributed into four families, SU(VAR)3-9, E(Z), ASH1 and TRITHORAX, based on the homology of their SET domains. Additional proteins have now been identified which do not fit into this classification [1].
The SET domain generally appears as one part of a larger multidomain protein. Three structures of very different proteins with distinct domain compositions were recently described: Neurospora DIM-5, a member of the Su(var) family of histone lysine methyltransferases (HKMTs), which methylate histone H3 on lysine 9; human SET7 (also called SET9), which methylates H3 on lysine 4; and garden pea Rubisco LSMT, an enzyme that does not modify histones, but instead methylates lysine 14 in the flexible tail of the large subunit of the enzyme Rubisco. The SET domain itself turned out to have an uncommon structure. Although electron density maps revealed the location of the AdoMet or AdoHcy cofactor in all three studies, the SET domain bears no similarity at all to the canonical AdoMet-dependent methyltransferase fold. A tyrosine that is strictly conserved in the C-terminal motif of the SET domain could be involved in abstracting a proton from the protonated amino group of the substrate lysine, promoting its nucleophilic attack on the sulphonium methyl group of the AdoMet cofactor. In contrast to the AdoMet-dependent protein methyltransferases of the classical type, which tend to bind their polypeptide substrates on top of the cofactor, in the Rubisco LSMT structure the AdoMet seems to bind in a separate cleft, suggesting how a polypeptide substrate could be subjected to multiple rounds of methylation without having to be released from the enzyme. SET7/9, by contrast, is able to add only a single methyl group to its substrate. It has been demonstrated that association of SET domain and myotubularin-related proteins modulates growth control [2]. The SET domain-containing Drosophila protein, enhancer of zeste, has a function in segment determination, and the mammalian homologue may be involved in the regulation of gene transcription and chromatin structure.
It seems likely that the varied domains that occur together with the SET domain will be involved in recognizing protein substrates and "reading" histone tails in order to dictate which (if any) of their multiple lysine residues should get methylated [3].

SAP IPR003034: The SAP (after SAF-A/B, Acinus and PIAS) motif is a putative DNA-binding domain found in diverse nuclear proteins involved in chromosomal organization [1].
1. Aravind,L., Koonin,E.V., SAP - a putative DNA-binding motif involved in chromosomal organization. (2000) Trends Biochem. Sci. 25: 112-114 [PUBMED:10694879]


DNA-Binding Proteins of Unknown Function

ARID Domain

DDT Domain IPR004022: This domain is predicted to be a DNA-binding domain. The DDT domain is named after DNA binding homeobox and Different Transcription factors. It is found in fetal Alzheimer antigen and several hypothetical and uncharacterised proteins.

Double-stranded DNA-binding domain IPR002836: This protein family is found in archaea and eukaryota. The human TFAR19 gene encodes a protein that shares significant homology with the corresponding proteins of species ranging from yeast to mice. TFAR19 exhibits a ubiquitous expression pattern, and its expression is upregulated in tumor cells undergoing apoptosis. TFAR19 may play a general role in the apoptotic process [1]. Also included in this family is a DNA-binding protein from the archaeon Methanobacterium thermoautotrophicum.

HMG1/2 IPR000910: High mobility group (HMG) proteins are a family of relatively low molecular weight non-histone components in chromatin. HMG1 (also called HMG-T in fish) and HMG2 [1] are two highly related proteins that bind single-stranded DNA preferentially and unwind double-stranded DNA. HMG1/2 are proteins of about 200 amino acid residues with a highly acidic C-terminal section, which is composed of an uninterrupted stretch of 20 to 30 aspartic and glutamic acid residues; the rest of the protein sequence is very basic. The profile in this entry describing the HMG-domains is much more general than the signature. In addition to the HMG1 and HMG2 proteins, HMG-domains occur in single or multiple copies in the following protein classes: the SOX family of transcription factors; SRY sex determining region Y protein and related proteins; LEF1 lymphoid enhancer binding factor 1; SSRP recombination signal recognition protein; MTF1 mitochondrial transcription factor 1; UBF1/2 nucleolar transcription factors; Abf2 yeast ARS-binding factor; and Saccharomyces cerevisiae transcription factors Ixr1, Rox1, Nhp6a, Nhp6b and Spp41.

Zn finger, C2H2 type: Zinc finger domains [1, 2] are nucleic acid-binding protein structures first identified in the Xenopus laevis transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino acid residues, including 2 conserved Cys and 2 conserved His residues in a C-2-C-12-H-3-H type motif. The 12 residues separating the second Cys and the first His are mainly polar and basic, implicating this region in particular in nucleic acid binding. The zinc finger motif is an unusually small, self-folding domain in which Zn is a crucial component of its tertiary structure.
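The C-2-C-12-H-3-H spacing can be sketched as a regular expression for counting candidate fingers in a protein sequence. This is an illustration only: it uses the strict spacing given above, whereas spacing in real C2H2 fingers can vary somewhat, and the example sequence is synthetic rather than taken from a real protein. The function name `count_c2h2_fingers` is hypothetical.

```python
import re

# C-2-C-12-H-3-H: Cys, 2 residues, Cys, 12 residues, His, 3 residues, His.
# Assumption: strict spacing exactly as stated in the definition.
C2H2_MOTIF = re.compile(r"C.{2}C.{12}H.{3}H")

def count_c2h2_fingers(protein: str) -> int:
    """Count non-overlapping matches to the C-2-C-12-H-3-H motif."""
    return len(C2H2_MOTIF.findall(protein.upper()))

if __name__ == "__main__":
    finger = "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H"  # synthetic finger
    print(count_c2h2_fingers("MK" + finger + "TGEKP" + finger))
```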


TRANSCRIPTION FACTOR RESOURCES



Supported by NIH (NIGMS and NHGRI)