CHARGE DISTRIBUTIONAL ANALYSIS

The distribution of charges in the protein sequence is evaluated in terms of clusters, high scoring segments, and runs and periodic patterns. Clusters indicate regions of typically 30 to 60 residues exhibiting a relatively high charge concentration. For high scoring charge segments, positive scores are assigned to charge residues of the appropriate type and negative scores to all other residues. A significant cumulative positive score again indicates a region of high charge concentration. The cluster method and the scoring method will generally pick out the same segments (with the scoring method often delimiting the segment to a narrower range), conferring robustness to the results. Short segments of high charge concentration are displayed as runs (with errors). Periodic patterns focus on those with charges every second or third position, with possible relevance to amphipathic secondary structures; other periodic patterns are displayed in the general periodicity analysis section of the output.


        1  00+00+0++0 0000000000 0000000000 0000-00+00 000+-00000 0-000000-0 
       61  0000000000 0+0000-000 +00-0-0-0- 0000-0++00 0000000000 00000-0000 
      121  0-0000+000 +0-00+000- 0000-00-00 0000-0000+ 000+000-00 000000000- 
      181  -00+000+00 0000+0-000 000+00-000 -00-000+00 00000-000- --0-00-00+ 
      241  0+0-00-0+0 000+000++0 00000-0000 -+-00000-0 --00+0+00- 0000+0000+ 
      301  000000+000 000000-000 +000000000 0-0000000+ -000-00000 -000000000 
      361  000-0--000 00000000+0 00000000-0 0000--0000 0000000++0 0+000+-0+0 
      421  000000++00 0000-00000 00000+0000 0000000000 00000000-- 00000-0000 
      481  0000000000 00000000-0 0000000000 0000000000 000++0-00- 00000000+0 
      541  -000000000 0-+-0---00 00000+000+ -0+0+00-+- 0+00000000 000--00-+0 
      601  00000+0000 000-000000 00000000+0 +000+0-000 00000+0-00 000000-000 
      661  000000000- -0-000+000 00000-+000 0-000-0+00 000000-000 0000000000 
      721  0+0000000+ 000-0+0000 000-0+0000 0+-0000000 00+0--0000 0000-00000 
      781  000000-000 00+0000000 000000000- 000-000000 00-0+-00+0 0000000000 
      841  000+-00000 00000000+- ++000000+0 00+000-00+ 00+0000000 00000--000 
      901  0000000000 00000+00-0 -000000000 0-000000+0 -0000000-0 0000000000 
      961  0000000000 -0-000-000 0-0000+000 0000000000 -000000000 0000000000 
     1021  00-0000000 --0+000000 00+0000000 0000000000 0000000-00 0000000000 
     1081  0000000000 00+0000000 0000000000 000-000000 0-00000000 0000000000 
     1141  0000000000 0000000000 -000000000 0000000000 00000+000+ 0000000000 
     1201  -+000+0000 -+0000000- 00000-0000 00000-0-00 0000000000 00-0000000 
     1261  -000000+
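The charge map above can be reproduced from the amino acid sequence. The following is a minimal sketch, assuming the residue classes listed in the scoring schemes of section B (K and R positive, E and D negative, everything else neutral). The function names `charge_string` and `format_blocks` are illustrative and not part of SAPS, and the exact column widths of the SAPS display are not reproduced.

```python
POSITIVE = set("KR")  # residues scored as positive charges in section B
NEGATIVE = set("ED")  # residues scored as negative charges in section B


def charge_string(sequence: str) -> str:
    """Map each residue to '+', '-' or '0' (hypothetical helper)."""
    symbols = []
    for residue in sequence.upper():
        if residue in POSITIVE:
            symbols.append("+")
        elif residue in NEGATIVE:
            symbols.append("-")
        else:
            symbols.append("0")
    return "".join(symbols)


def format_blocks(charges: str, block: int = 10, per_line: int = 6) -> list[str]:
    """Lay the charge string out in blocks of 10, six blocks per line,
    each line prefixed with the 1-based position of its first residue."""
    lines = []
    step = block * per_line
    for offset in range(0, len(charges), step):
        chunk = charges[offset:offset + step]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        lines.append(f"{offset + 1:>9}  " + " ".join(blocks))
    return lines


# Residues 552 to 582 of this sequence, as listed under the mixed charge cluster.
segment = "ERDGEEEAAAQYGSKLNGREYKVKVLDKDGK"
print(charge_string(segment))
```

Applied to residues 552 to 582, the mapping yields the same charge string that the report prints beneath the mixed charge cluster.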

A. CHARGE CLUSTERS.

Positive, negative, and mixed charge clusters are distinguished.  In  each
case, cmin indicates the minimum number of charges required for a signifi-
cant charge cluster corresponding to the given window size; e.g.,  cmin  =
9/30 or 12/45 or 15/60 means that significance requires at least 9 charges
in a segment of 30 (or fewer) residues, or 12  charges  in  a  segment  of
length  45,  or 15 charges in a segment of length 60. In the case of posi-
tive and negative charge clusters, these counts refer to net charge, i.e.,
charges  of  the  opposite  sign  within the window are counted as -1. The
sizes of the clusters are optimized for display to indicate the segment of
highest  charge  concentration,  but  a  minimum  size  of  20 residues is
required.  A mixed charge cluster that begins and ends within 15  residues
of the endpoints of a pure charge cluster is not displayed (since its sig-
nificance rests mostly on the charged residues  comprising  the  displayed
pure charge cluster), unless the -v (verbose output) flag is set, in which
case both the pure and the mixed charge  cluster  are  displayed.  On  the
other  hand,  pure charge clusters that are embedded in mixed charge clus-
ters are displayed separately (indicated by a * preceding  the  specifica-
tion of location).
     For each cluster are given its location in the sequence  (From,  to),
the  quartile  of  the  location  (1st,  2nd,  3rd,  or 4th quarter of the
sequence), length, count, and t-value (standard deviations above the mean;
to  accommodate  the  multiple  tests  performed, the t-value significance
threshold is set to 4.0 for sequences up  to  750  residues,  to  4.5  for
sequences  of  length 750-1500 residues, and to 5.0 for longer sequences);
also indicated are residues comprising at least 10% of the cluster.
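The cmin criterion described above can be illustrated with a simple sliding-window count. This is a sketch only: it checks one fixed window size, it does not optimize the displayed cluster boundaries or compute t-values, and the names `charge_count` and `windows_meeting_cmin` are hypothetical. As stated in the text, pure clusters use net charge, while mixed clusters count charges of either sign.

```python
def charge_count(window: str, kind: str) -> int:
    """Count charges in a window of '+', '-', '0' symbols.

    For 'positive' and 'negative' the count is a net charge: each charge
    of the opposite sign contributes -1. For 'mixed' both signs count.
    """
    plus = window.count("+")
    minus = window.count("-")
    if kind == "mixed":
        return plus + minus
    if kind == "positive":
        return plus - minus
    if kind == "negative":
        return minus - plus
    raise ValueError(f"unknown cluster kind: {kind}")


def windows_meeting_cmin(charges: str, size: int, cmin: int, kind: str) -> list[int]:
    """Return 1-based start positions of windows holding at least cmin charges."""
    hits = []
    last_start = max(len(charges) - size, 0)
    for start in range(last_start + 1):
        window = charges[start:start + size]
        if charge_count(window, kind) >= cmin:
            hits.append(start + 1)
    return hits


# Charge string of residues 552 to 582, checked against the mixed criterion 14/30.
segment = "-+-0---0000000+000+-0+0+00-+-0+"
print(windows_meeting_cmin(segment, size=30, cmin=14, kind="mixed"))
```

Both 30-residue windows inside this 31-residue segment contain 14 charges, so each meets the mixed threshold of 14/30, whereas the net negative charge of the segment is far below the 10/30 required for a negative cluster.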


Positive charge clusters (cmin = 8/30 or 11/45 or 13/60): none


Negative charge clusters (cmin = 10/30 or 13/45 or 16/60): none


Mixed charge clusters (cmin = 14/30 or 18/45 or 23/60):

1) From  552 to  582:   ERDGEEEAAAQYGSKLNGREYKVKVLDKDGK
                        -+-0---0000000+000+-0+0+00-+-0+
   quartile: 2; size: 31, +count:  7, -count:  8, 0count: 16; t-value:  4.70 *
   G:  4 (12.9%);  K:  5 (16.1%);  E:  5 (16.1%);
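The reported t-value can be checked by hand. The sketch below assumes a binomial model in which each residue is charged independently with the overall charged-residue frequency (0.168 for KEDR, taken from the mixed scoring scheme in section B). That model is an assumption, not something the report states, but it yields a value close to the 4.70 shown for this cluster. The threshold function follows the lengths quoted in section A; how the boundary values 750 and 1500 are treated is also an assumption.

```python
import math


def cluster_t_value(count: int, length: int, frequency: float) -> float:
    """Standard deviations above the mean under a binomial model (assumed)."""
    mean = length * frequency
    sd = math.sqrt(length * frequency * (1.0 - frequency))
    return (count - mean) / sd


def t_threshold(sequence_length: int) -> float:
    """Significance threshold on the t-value, by sequence length."""
    if sequence_length <= 750:
        return 4.0
    if sequence_length <= 1500:
        return 4.5
    return 5.0


charges = "-+-0---0000000+000+-0+0+00-+-0+"   # residues 552 to 582
plus, minus = charges.count("+"), charges.count("-")
t = cluster_t_value(plus + minus, len(charges), 0.168)
print(plus, minus, round(t, 2), t >= t_threshold(1268))
```

With 15 charges in 31 residues the value exceeds the 4.5 threshold that applies to a sequence of 1268 residues, which is consistent with the cluster being displayed.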


B. HIGH SCORING (UN)CHARGED SEGMENTS.

For each scoring scheme (scores assigned to residues as  displayed),  SAPS
displays  segments of the sequence with aggregate score exceeding the par-
ticular threshold values M_0.01 (1% significance level, segments  labeled
with  **),  M_0.05 (5% significance level, segments labeled *), or other-
wise as indicated. A minimal segment length is set as shown.  The expected
score/letter should be sufficiently large negative, and the average infor-
mation per letter should be sufficiently large positive in order  for  the
scoring statistics to apply properly (the program prints out when the con-
ditions are not met and skips evaluations).
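The idea of an aggregate segment score can be sketched with a standard maximum-sum scan (Kadane's algorithm). This is an illustration under stated assumptions rather than the SAPS implementation: it returns only the single best segment, it does not apply the M_0.01 or M_0.05 thresholds or the minimal segment length, and `best_segment` is a hypothetical name. The example scores are those of the positive charge scheme shown in this section.

```python
# Positive charge scheme as displayed in the report; all residues not
# listed here fall into the -1.00 class.
POSITIVE_SCHEME = {"K": 2.0, "R": 2.0, "E": -2.0, "D": -2.0,
                   "B": 0.0, "Z": 0.0, "X": 0.0}


def best_segment(sequence: str, scores: dict[str, float], default: float = -1.0):
    """Return (score, start, end) of the highest scoring segment, 1-based inclusive."""
    best = (float("-inf"), 0, 0)
    running, start = 0.0, 0
    for i, residue in enumerate(sequence):
        if running <= 0.0:
            # A non-positive prefix can never help, so restart here.
            running, start = 0.0, i
        running += scores.get(residue, default)
        if running > best[0]:
            best = (running, start + 1, i + 1)
    return best


# Toy sequence, invented for illustration only.
print(best_segment("AAKKRKAAEA", POSITIVE_SCHEME))
```

In the toy sequence the four consecutive K/R residues at positions 3 to 6 give the best aggregate score of 8. A real analysis would then compare such scores with the thresholds printed for each scheme.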


______________________________________ High scoring positive charge segments:

score=  2.00  frequency= 0.072  ( KR )
score=  0.00  frequency= 0.000  ( BZX )
score= -1.00  frequency= 0.832  ( LAGSVTIPNFQYHMCW )
score= -2.00  frequency= 0.096  ( ED )

Expected score/letter:  -0.881;    Average information/letter:   1.973
Minimal length of displayed segments set to:  20

M_0.01= 9.33 (cv= 6.16, lambda= 1.15953, k= 0.39565, x= 3.17;
               90% confidence interval for segment length:   8 +-   6)

M_0.05= 7.92 (x= 1.76)

# of segments (>=20 residues) exceeding M_0.05: none


______________________________________ High scoring negative charge segments:

score=  2.00  frequency= 0.096  ( ED )
score=  0.00  frequency= 0.000  ( BZX )
score= -1.00  frequency= 0.832  ( LAGSVTIPNFQYHMCW )
score= -2.00  frequency= 0.072  ( KR )

Expected score/letter:  -0.783;    Average information/letter:   1.431
Minimal length of displayed segments set to:  20

M_0.01= 10.95 (cv= 7.33, lambda= 0.97470, k= 0.34200, x= 3.62;
               90% confidence interval for segment length:  11 +-   9)

M_0.05= 9.28 (x= 1.95)

# of segments (>=20 residues) exceeding M_0.05: none


___________________________________ High scoring mixed charge segments:

score=  1.00  frequency= 0.168  ( KEDR )
score=  0.00  frequency= 0.000  ( BZX )
score= -1.00  frequency= 0.832  ( LAGSVTIPNFQYHMCW )

Expected score/letter:  -0.664;    Average information/letter:   1.533
Minimal length of displayed segments set to:  20

M_0.01= 6.94 (cv= 4.47, lambda= 1.60000, k= 0.52997, x= 2.48;
               90% confidence interval for segment length:  10 +-   7)

M_0.05= 5.93 (x= 1.46)

# of segments (>=20 residues) exceeding M_0.05: none


________________________________ High scoring uncharged segments:

score=  1.00  frequency= 0.832  ( LAGSVTIPNFQYHMCW )
score=  0.00  frequency= 0.000  ( BZX )
score= -8.00  frequency= 0.168  ( KEDR )

Expected score/letter:  -0.512
Average information/letter:   0.065 < .10; too small !
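The expected score/letter printed for each scheme is the frequency-weighted mean of the scores, which can be recomputed from the displayed values. The second function below rests on an assumption: that the printed lambda is the Karlin-Altschul parameter, i.e. the positive root of sum_i p_i * exp(lambda * s_i) = 1. Under that assumption the printed lambdas are recovered to within the rounding of the three-digit frequencies. The names `expected_score` and `karlin_lambda` are illustrative.

```python
import math

# (score, frequency) pairs copied from the schemes displayed in this section.
POSITIVE = [(2.0, 0.072), (0.0, 0.000), (-1.0, 0.832), (-2.0, 0.096)]
MIXED = [(1.0, 0.168), (0.0, 0.000), (-1.0, 0.832)]
UNCHARGED = [(1.0, 0.832), (0.0, 0.000), (-8.0, 0.168)]


def expected_score(scheme):
    """Frequency-weighted mean score per letter."""
    return sum(score * freq for score, freq in scheme)


def karlin_lambda(scheme, tol=1e-9):
    """Positive root of sum(freq * exp(lam * score)) = 1, found by bisection.

    Assumes a negative expected score and at least one positive score,
    so that exactly one positive root exists.
    """
    def f(lam):
        return sum(freq * math.exp(lam * score) for score, freq in scheme) - 1.0

    lo, hi = 1e-3, 1.0
    while f(hi) <= 0.0:      # grow the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


for name, scheme in [("positive", POSITIVE), ("mixed", MIXED), ("uncharged", UNCHARGED)]:
    print(name, round(expected_score(scheme), 3))
print("lambda (mixed):", round(karlin_lambda(MIXED), 3))
```

For the mixed scheme, with scores of +1 and -1 only, the root has the closed form ln(0.832/0.168), which is approximately 1.600 and agrees with the printed lambda.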


C. CHARGE RUNS AND PATTERNS.

The table below shows the charge runs and patterns searched for (*  stands
for  +  or  -)  and  the required minimum number of matches to the pattern
allowing for at most 0 (lmin0), 1 (lmin1),  or  2  (lmin2)  mismatches  or
insertions/deletions (1% significance level). Occurrences are arranged in
the order in which they appear in the sequence. For each  run  or  pattern
are  displayed  its  length  (number  of matches) and a triplet giving the
number of mismatches, insertions and deletions. 0-runs are further charac-
terized  by  their  composition (residues comprising more than 10% of the
run).
     Run count statistics are compiled for runs of lengths at least 2/3 of
the minimal significant length (lmin0); given are the number and locations
of such runs.


pattern  (+)| (-)| (*)| (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)|
lmin0     4 |  5 |  6 | 54 |  9  | 10  | 13  | 11  | 13  | 16  |  5  |  5  |
lmin1     6 |  6 |  8 | 65 | 11  | 12  | 15  | 14  | 15  | 19  |  6  |  7  |
lmin2     7 |  8 |  9 | 73 | 13  | 14  | 17  | 16  | 17  | 21  |  7  |  8  |

(Significance level: 0.010000; Minimal displayed length:  6)

There are no charge runs or patterns exceeding the given minimal lengths.

Run count statistics:

 +  runs >=   3:   0
 -  runs >=   3:   2, at  230;  556;
 *  runs >=   4:   1, at  859;
 0  runs >=  36:   1, at 1123;
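The run count statistics can be checked against the charge map at the top of this section. The sketch below locates exact runs (no mismatches, insertions, or deletions) with a regular expression. `find_runs` is a hypothetical helper, and the minimum lengths are simply those printed in the report rather than derived from lmin0.

```python
import re


def find_runs(charges: str, symbol: str, min_len: int, first_position: int = 1):
    """Return (start, length) for each exact run of at least min_len symbols.

    symbol is '+', '-', '0', or '*' (either charge, as in the table above).
    first_position is the sequence position of charges[0].
    """
    char_class = "[+-]" if symbol == "*" else re.escape(symbol)
    pattern = re.compile(f"(?:{char_class}){{{min_len},}}")
    return [(m.start() + first_position, len(m.group()))
            for m in pattern.finditer(charges)]


# Positions 221 to 240 and 851 to 870, copied from the charge map.
print(find_runs("00000-000---0-00-00+", "-", 3, first_position=221))
print(find_runs("00000000+-++000000+0", "*", 4, first_position=851))
```

On these two excerpts the scan finds a negative run starting at position 230 and a mixed charge run starting at position 859, in agreement with the locations listed above.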