MATRIX
Contents

The MATRIX table contains nucleotide distribution matrices of aligned binding sequences (see Statistics). These sequences may have been obtained by in vitro selection studies or may be compiled binding sites of genes; the source is indicated in each entry. Each matrix identifier begins with a prefix that indicates one of six groups of biological species (V$, vertebrates; I$, insects; P$, plants; F$, fungi; N$, nematodes; B$, bacteria), followed by an acronym for the factor the matrix refers to and a consecutive number that discriminates between different matrices for the same factor. Thus, V$OCT1_02 indicates the second matrix for the vertebrate Oct-1 factor. For matrices that have been generated from TRANSFAC® SITE entries, the identifier ends not with a consecutive number but with an abbreviation of the lowest quality of the sites used to construct the matrix. For example, V$CREB_Q2 is a matrix constructed from CREB binding sites of quality 2 or better. This Q-value should not be confused with the high/low quality criterion of the matrix, which is given in Match™/Match™Profiler. Finally, a matrix with an identifier like V$AP1_C has been derived from a "consensus description" constructed with the aid of ConsIndex (Frech et al., Nucleic Acids Res. 21:1655-1664, 1993).
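
The identifier scheme described above can be illustrated with a small Python sketch. It is not part of TRANSFAC; the function name parse_matrix_id, the regular expression, and the returned dictionary keys are hypothetical, and the sketch assumes that the discriminating extension is the text after the last underscore and contains no underscore itself.

```python
import re

# Species-group prefixes as listed in the text above.
SPECIES_GROUPS = {
    "V": "vertebrates",
    "I": "insects",
    "P": "plants",
    "F": "fungi",
    "N": "nematodes",
    "B": "bacteria",
}

# Assumption: {group}${factor}_{extension}, extension = text after the last "_".
_ID_PATTERN = re.compile(r"^([VIPFNB])\$(.+)_([^_]+)$")


def parse_matrix_id(identifier):
    """Split a matrix identifier into species group, factor and extension."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised matrix identifier: {identifier!r}")
    group, factor, extension = match.groups()
    if extension.isdigit():
        kind = "consecutive number"
    elif re.fullmatch(r"Q\d+", extension):
        kind = "site quality threshold"
    elif extension == "C":
        kind = "consensus description"
    else:
        kind = "other"  # anything the text above does not describe
    return {
        "group": SPECIES_GROUPS[group],
        "factor": factor,
        "extension": extension,
        "extension_kind": kind,
    }


if __name__ == "__main__":
    for example in ("V$OCT1_02", "V$CREB_Q2", "V$AP1_C"):
        print(example, parse_matrix_id(example))
```

The three identifiers used in the demonstration are the examples given in the text; any other extension form is reported as "other" rather than guessed at.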

The matrix area gives the nucleotide frequencies observed in aligned binding sites of the corresponding transcription factor (or, more generally, in aligned sites of the described function); an additional column shows the IUPAC consensus string derived from the matrix according to the following rules (adapted from Cavener, Nucleic Acids Res. 15:1353-1361, 1987):

Rule 1: A single nucleotide (A,C,G,T) is shown if its frequency is at least 50% and at least twice as high as the second most frequent nucleotide.
Rule 2: A double-degenerate code indicates that the corresponding two nucleotides occur in more than 75% of the underlying sequences and rule 1 does not apply: (W = A or T), (S = C or G), (R = A or G), (Y = C or T), (K = G or T), (M = A or C).
Rule 3: Triple-degenerate codes are used only at positions where one of the nucleotides does not occur at all in the sequence set and neither of the preceding rules applies: (B = C, G or T), (D = A, G or T), (H = A, C or T), (V = A, C or G).
Rule 4: All other frequency distributions are represented by the letter "N" (= A, C, G or T).
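
The four rules can be applied to a single matrix row as in the following Python sketch. It is one reading of the rules, not the code used to build the database: the function name iupac_consensus and the dictionary-based input are hypothetical, and Rule 2 is interpreted here as referring to the two most frequent nucleotides of the position.

```python
# Two-letter codes from Rule 2, keyed by the pair of nucleotides they cover.
DOUBLE_CODES = {
    frozenset("AT"): "W",
    frozenset("CG"): "S",
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}

# Three-letter codes from Rule 3, keyed by the nucleotide that is absent.
TRIPLE_CODES = {"A": "B", "C": "D", "G": "H", "T": "V"}


def iupac_consensus(counts):
    """Return the IUPAC letter for one matrix position.

    counts maps each of "A", "C", "G", "T" to its observed frequency.
    """
    total = sum(counts[n] for n in "ACGT")
    if total <= 0:
        return "N"
    ranked = sorted("ACGT", key=lambda n: counts[n], reverse=True)
    first, second = counts[ranked[0]], counts[ranked[1]]

    # Rule 1: at least 50% and at least twice the runner-up.
    if first >= 0.5 * total and first >= 2 * second:
        return ranked[0]
    # Rule 2 (assumed reading): the top two together exceed 75%.
    if first + second > 0.75 * total:
        return DOUBLE_CODES[frozenset(ranked[:2])]
    # Rule 3: exactly one nucleotide never observed at this position.
    absent = [n for n in "ACGT" if counts[n] == 0]
    if len(absent) == 1:
        return TRIPLE_CODES[absent[0]]
    # Rule 4: everything else.
    return "N"


if __name__ == "__main__":
    rows = [
        {"A": 10, "C": 0, "G": 0, "T": 0},
        {"A": 4, "C": 0, "G": 4, "T": 2},
        {"A": 4, "C": 3, "G": 3, "T": 0},
        {"A": 3, "C": 3, "G": 2, "T": 2},
    ]
    print("".join(iupac_consensus(row) for row in rows))
```

The rules are tested in order, so a position that satisfies Rule 1 is never given a degenerate code even if a nucleotide is absent there.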


Fields

It should be noted that in individual entries some fields may be empty. In this case, these fields are not displayed.
AC Accession number   "M" + 5-digit number
AS Accession numbers, secondary   when two or more entries are merged, the additional accession numbers, separated by commas, are stored in this field
ID Identifier   {species group}${factor}_{discriminating extension}
DT Created   date of entry creation; entry author
   Updated   date of last entry update; updater
NA * Name   designation of the binding transcription factor (or in some cases of the element, e.g. TATA)
DE Factor description   short description of the factor function
BF * Binding Factors   list of linked entries of the Factor table (factor accession number; factor name; biological species)
P0 Binding Matrix   nucleotide frequency matrix with matrix head (A C G T) and derived IUPAC consensus in the last column; underneath, the matrix is visualized as a sequence logo (adapted from Schneider and Stephens, Nucleic Acids Res. 18:6097-6100, 1990)
BA Basis   statistical basis of the matrix
BS Binding sites   list of aligned sequence segments used for matrix generation (if available), each followed by a link to the binding site in TRANSFAC (site accession number) or TRANSCompel from which the segment was derived and a description of how it was derived (start, length, gaps and orientation of the depicted sequence segment, flanking gaps included, relative to the sequence in the site entry)
CC Comments   description of the underlying sequence set, of the experimental approach applied to obtain this set etc.
RN Reference number   [consecutive entry reference number]; reference accession number
RX   PUBMED; link to PubMed entry
RA Reference authors   authors (NOTE: accents are omitted, German umlauts are transcribed as follows: ä -> ae, ö -> oe, ü -> ue; German "s-z" (ß) -> ss)
RT Reference title   reference title (NOTE: Greek letters are expanded to alpha, beta, gamma etc.)
RL Reference source   journal volume:pages (year)

* These fields are commonly searched
