# CHARGE DISTRIBUTIONAL ANALYSIS

### From OpenWetWare

The distribution of charges in the protein sequence is evaluated in terms of clusters, high scoring segments, runs, and periodic patterns. Clusters indicate regions of typically 30 to 60 residues exhibiting a relatively high charge concentration. For high scoring charge segments, positive scores are assigned to charged residues of the appropriate type and negative scores to all other residues. A significant cumulative positive score again indicates a region of high charge concentration. The cluster method and the scoring method will generally pick out the same segments (with the scoring method often delimiting the segment to a narrower range), which lends robustness to the results. Short segments of high charge concentration are displayed as runs (with errors allowed). Periodic patterns focus on those with charges every second or third position, with possible relevance to amphipathic secondary structures; other periodic patterns are displayed in the general periodicity analysis section of the output.

   1 00+00+0++0 0000000000 0000000000 0000-00+00 000+-00000 0-000000-0
  61 0000000000 0+0000-000 +00-0-0-0- 0000-0++00 0000000000 00000-0000
 121 0-0000+000 +0-00+000- 0000-00-00 0000-0000+ 000+000-00 000000000-
 181 -00+000+00 0000+0-000 000+00-000 -00-000+00 00000-000- --0-00-00+
 241 0+0-00-0+0 000+000++0 00000-0000 -+-00000-0 --00+0+00- 0000+0000+
 301 000000+000 000000-000 +000000000 0-0000000+ -000-00000 -000000000
 361 000-0--000 00000000+0 00000000-0 0000--0000 0000000++0 0+000+-0+0
 421 000000++00 0000-00000 00000+0000 0000000000 00000000-- 00000-0000
 481 0000000000 00000000-0 0000000000 0000000000 000++0-00- 00000000+0
 541 -000000000 0-+-0---00 00000+000+ -0+0+00-+- 0+00000000 000--00-+0
 601 00000+0000 000-000000 00000000+0 +000+0-000 00000+0-00 000000-000
 661 000000000- -0-000+000 00000-+000 0-000-0+00 000000-000 0000000000
 721 0+0000000+ 000-0+0000 000-0+0000 0+-0000000 00+0--0000 0000-00000
 781 000000-000 00+0000000 000000000- 000-000000 00-0+-00+0 0000000000
 841 000+-00000 00000000+- ++000000+0 00+000-00+ 00+0000000 00000--000
 901 0000000000 00000+00-0 -000000000 0-000000+0 -0000000-0 0000000000
 961 0000000000 -0-000-000 0-0000+000 0000000000 -000000000 0000000000
1021 00-0000000 --0+000000 00+0000000 0000000000 0000000-00 0000000000
1081 0000000000 00+0000000 0000000000 000-000000 0-00000000 0000000000
1141 0000000000 0000000000 -000000000 0000000000 00000+000+ 0000000000
1201 -+000+0000 -+0000000- 00000-0000 00000-0-00 0000000000 00-0000000
1261 -000000+
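The charge display can be reproduced from a raw sequence with a short sketch. It assumes that K and R are scored as positive, D and E as negative, and every other residue as uncharged, which is the grouping used in the score tables of this output. The function names `charge_string` and `format_charge_display` are illustrative and are not part of SAPS.

```python
# Assumption: residue classes follow the score tables in this output
# (KR positive, ED negative, everything else uncharged).
POSITIVE = set("KR")
NEGATIVE = set("DE")


def charge_string(sequence):
    """Map each residue to '+', '-', or '0'."""
    symbols = []
    for residue in sequence.upper():
        if residue in POSITIVE:
            symbols.append("+")
        elif residue in NEGATIVE:
            symbols.append("-")
        else:
            symbols.append("0")
    return "".join(symbols)


def format_charge_display(charges, per_line=60, group=10):
    """Lay out a charge string in rows, each prefixed by its 1-based start position."""
    lines = []
    for start in range(0, len(charges), per_line):
        row = charges[start:start + per_line]
        groups = [row[i:i + group] for i in range(0, len(row), group)]
        lines.append(f"{start + 1:>4} " + " ".join(groups))
    return "\n".join(lines)


if __name__ == "__main__":
    # Residues 552 to 582 as listed in the mixed charge cluster of this output.
    print(format_charge_display(charge_string("ERDGEEEAAAQYGSKLNGREYKVKVLDKDGK")))
```

Applied to the cluster segment ERDGEEEAAAQYGSKLNGREYKVKVLDKDGK, `charge_string` yields the same symbol string that the output prints beneath that segment.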

A. CHARGE CLUSTERS.

Positive, negative, and mixed charge clusters are distinguished. In each case, cmin indicates the minimum number of charges required for a significant charge cluster corresponding to the given window size; e.g., cmin = 9/30 or 12/45 or 15/60 means that significance requires at least 9 charges in a segment of 30 (or fewer) residues, or 12 charges in a segment of length 45, or 15 charges in a segment of length 60. In the case of positive and negative charge clusters, these counts refer to net charge, i.e., charges of the opposite sign within the window are counted as -1. The sizes of the clusters are optimized for display to indicate the segment of highest charge concentration, but a minimum size of 20 residues is required. A mixed charge cluster that begins and ends within 15 residues of the endpoints of a pure charge cluster is not displayed (since its significance rests mostly on the charged residues comprising the displayed pure charge cluster), unless the -v (verbose output) flag is set, in which case both the pure and the mixed charge cluster are displayed. On the other hand, pure charge clusters that are embedded in mixed charge clusters are displayed separately (indicated by a * preceding the specification of location). For each cluster, the output gives its location in the sequence (from, to), the quartile of the location (1st, 2nd, 3rd, or 4th quarter of the sequence), length, count, and t-value (standard deviations above the mean; to accommodate the multiple tests performed, the t-value significance threshold is set to 4.0 for sequences up to 750 residues, to 4.5 for sequences of length 750-1500 residues, and to 5.0 for longer sequences); also indicated are residues comprising at least 10% of the cluster.
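The counting rules above can be sketched as follows. This is a minimal illustration, not the SAPS implementation: the function names are hypothetical, the handling of the boundary lengths 750 and 1500 is an assumption (the text leaves it ambiguous), and the window scan is a plain exhaustive search.

```python
def t_value_threshold(seq_length):
    """t-value cutoff by sequence length, as described in the text.

    Assumption: lengths of exactly 750 and 1500 fall in the lower bracket.
    """
    if seq_length <= 750:
        return 4.0
    if seq_length <= 1500:
        return 4.5
    return 5.0


def window_count(charges, kind):
    """Charge count for one window of '+', '-', '0' symbols.

    kind '+' or '-': net count (opposite-sign charges count as -1).
    kind '*': mixed count (every charged residue counts as +1).
    """
    plus = charges.count("+")
    minus = charges.count("-")
    if kind == "*":
        return plus + minus
    return plus - minus if kind == "+" else minus - plus


def best_window(charges, kind, window):
    """Return (highest count, 1-based start) over all windows of the given size."""
    best = None
    for start in range(0, max(1, len(charges) - window + 1)):
        count = window_count(charges[start:start + window], kind)
        if best is None or count > best[0]:
            best = (count, start + 1)
    return best


if __name__ == "__main__":
    cluster = "-+-0---0000000+000+-0+0+00-+-0+"  # residues 552 to 582 in this output
    print(t_value_threshold(1268), best_window(cluster, "*", 30))
```

Comparing the count returned by `best_window` with the relevant cmin entry (for example 14 for a 30-residue mixed window in this output) indicates whether a window meets the count criterion.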

Positive charge clusters (cmin = 8/30 or 11/45 or 13/60): none

Negative charge clusters (cmin = 10/30 or 13/45 or 16/60): none

Mixed charge clusters (cmin = 14/30 or 18/45 or 23/60):

1) From 552 to 582:
ERDGEEEAAAQYGSKLNGREYKVKVLDKDGK
-+-0---0000000+000+-0+0+00-+-0+
quartile: 2; size: 31, +count: 7, -count: 8, 0count: 16; t-value: 4.70 *
G: 4 (12.9%); K: 5 (16.1%); E: 5 (16.1%);
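The summary figures reported for this cluster (size, charge counts, and residues making up at least 10% of the segment) can be recomputed from the segment itself. The sketch below assumes the same KR/ED charge classes as the score tables; `cluster_summary` is a hypothetical helper name.

```python
from collections import Counter


def cluster_summary(segment, min_fraction=0.10):
    """Recompute size, charge counts, and frequent residues for a segment."""
    counts = Counter(segment.upper())
    size = len(segment)
    plus = counts["K"] + counts["R"]    # assumed positive residues
    minus = counts["D"] + counts["E"]   # assumed negative residues
    frequent = {
        residue: (n, round(100.0 * n / size, 1))
        for residue, n in counts.items()
        if n / size >= min_fraction
    }
    return {
        "size": size,
        "plus": plus,
        "minus": minus,
        "zero": size - plus - minus,
        "frequent": frequent,
    }


if __name__ == "__main__":
    print(cluster_summary("ERDGEEEAAAQYGSKLNGREYKVKVLDKDGK"))
```

For the segment above, the recomputed counts agree with the reported values: size 31, 7 positive, 8 negative, 16 uncharged, and G, K, and E as the only residues at or above 10%.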

B. HIGH SCORING (UN)CHARGED SEGMENTS.

For each scoring scheme (scores assigned to residues as displayed), SAPS displays segments of the sequence with aggregate score exceeding the particular threshold values M_0.01 (1% significance level, segments labeled with **), M_0.05 (5% significance level, segments labeled *), or otherwise as indicated. A minimal segment length is set as shown. The expected score/letter should be sufficiently negative, and the average information per letter sufficiently positive, for the scoring statistics to apply properly (the program reports when these conditions are not met and skips the evaluation).
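The two quantities named in the conditions above can be recomputed from a score table. The sketch below assumes the standard Karlin-Altschul definitions: lambda is the positive root of sum_i p_i * exp(lambda * s_i) = 1, and the average information per letter is the relative entropy lambda * sum_i s_i * p_i * exp(lambda * s_i), expressed in bits. That SAPS uses exactly these definitions is an assumption, although the values printed in this output agree with them to within rounding of the frequencies. The function names are illustrative.

```python
import math

# (score, frequency) pairs copied from the score tables in this output.
POSITIVE_SCHEME = [(2, 0.072), (0, 0.000), (-1, 0.832), (-2, 0.096)]
MIXED_SCHEME = [(1, 0.168), (0, 0.000), (-1, 0.832)]


def expected_score(scheme):
    """Expected score per letter: sum of score * frequency."""
    return sum(score * freq for score, freq in scheme)


def karlin_lambda(scheme, tol=1e-10):
    """Positive root of sum p_i * exp(lambda * s_i) = 1, found by bisection."""
    if expected_score(scheme) >= 0:
        raise ValueError("expected score must be negative")
    if not any(score > 0 and freq > 0 for score, freq in scheme):
        raise ValueError("at least one positive score must occur")

    def g(lam):
        return sum(freq * math.exp(lam * score) for score, freq in scheme) - 1.0

    lo, hi = 1e-6, 1.0          # g(lo) < 0 because the expected score is negative
    while g(hi) <= 0:           # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def information_bits(scheme):
    """Relative entropy per letter in bits (assumed meaning of 'average information')."""
    lam = karlin_lambda(scheme)
    nats = lam * sum(s * p * math.exp(lam * s) for s, p in scheme)
    return nats / math.log(2)


if __name__ == "__main__":
    for name, scheme in (("positive", POSITIVE_SCHEME), ("mixed", MIXED_SCHEME)):
        print(name, expected_score(scheme), karlin_lambda(scheme), information_bits(scheme))
```

For a two-score scheme such as the mixed one, the root has the closed form ln(0.832 / 0.168), which gives a convenient check on the bisection.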

______________________________________
High scoring positive charge segments:

score=  2.00  frequency= 0.072  ( KR )
score=  0.00  frequency= 0.000  ( BZX )
score= -1.00  frequency= 0.832  ( LAGSVTIPNFQYHMCW )
score= -2.00  frequency= 0.096  ( ED )

Expected score/letter: -0.881; Average information/letter: 1.973
Minimal length of displayed segments set to: 20

M_0.01= 9.33 (cv= 6.16, lambda= 1.15953, k= 0.39565, x= 3.17;
90% confidence interval for segment length: 8 +- 6)

M_0.05= 7.92 (x= 1.76)

Number of segments (>=20 residues) exceeding M_0.05: none
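The thresholds printed for this scheme are consistent with the usual extreme-value approximation P(max score > M) = 1 - exp(-k * N * exp(-lambda * M)), which gives M_alpha = ln(k * N / (-ln(1 - alpha))) / lambda. The sketch below is a check under that assumption, not a description of the SAPS source. The sequence length N = 1268 is inferred from the charge display (the last row starts at 1261 and holds 8 symbols), and reading cv as ln(N) / lambda and x as M - cv is likewise an inference from the printed numbers.

```python
import math


def score_threshold(alpha, lam, k, n):
    """Score M with tail probability alpha under the extreme-value approximation."""
    return math.log(k * n / -math.log(1.0 - alpha)) / lam


def centering_value(lam, n):
    """Assumed meaning of 'cv' in the output: ln(N) / lambda."""
    return math.log(n) / lam


if __name__ == "__main__":
    n = 1268                        # inferred sequence length (assumption)
    lam, k = 1.15953, 0.39565       # values printed for positive charge segments
    cv = centering_value(lam, n)
    for alpha in (0.01, 0.05):
        m = score_threshold(alpha, lam, k, n)
        print(f"M_{alpha}= {m:.2f} (cv= {cv:.2f}, x= {m - cv:.2f})")
```

The same two functions applied to the lambda and k values printed for the negative and mixed schemes give thresholds that match those blocks as well.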

______________________________________
High scoring negative charge segments:

score=  2.00  frequency= 0.096  ( ED )
score=  0.00  frequency= 0.000  ( BZX )
score= -1.00  frequency= 0.832  ( LAGSVTIPNFQYHMCW )
score= -2.00  frequency= 0.072  ( KR )

Expected score/letter: -0.783; Average information/letter: 1.431
Minimal length of displayed segments set to: 20

M_0.01= 10.95 (cv= 7.33, lambda= 0.97470, k= 0.34200, x= 3.62;
90% confidence interval for segment length: 11 +- 9)

M_0.05= 9.28 (x= 1.95)

Number of segments (>=20 residues) exceeding M_0.05: none

___________________________________
High scoring mixed charge segments:

score=  1.00  frequency= 0.168  ( KEDR )
score=  0.00  frequency= 0.000  ( BZX )
score= -1.00  frequency= 0.832  ( LAGSVTIPNFQYHMCW )

Expected score/letter: -0.664; Average information/letter: 1.533
Minimal length of displayed segments set to: 20

M_0.01= 6.94 (cv= 4.47, lambda= 1.60000, k= 0.52997, x= 2.48;
90% confidence interval for segment length: 10 +- 7)

M_0.05= 5.93 (x= 1.46)

Number of segments (>=20 residues) exceeding M_0.05: none

________________________________
High scoring uncharged segments:

score=  1.00  frequency= 0.832  ( LAGSVTIPNFQYHMCW )
score=  0.00  frequency= 0.000  ( BZX )
score= -8.00  frequency= 0.168  ( KEDR )

Expected score/letter: -0.512
Average information/letter: 0.065 < .10; too small !

C. CHARGE RUNS AND PATTERNS.

The table below shows the charge runs and patterns searched for (* stands for + or -) and the required minimum number of matches to the pattern allowing for at most 0 (lmin0), 1 (lmin1), or 2 (lmin2) mismatches or insertions/deletions (1% significance level). Occurrences are arranged in the order in which they appear in the sequence. For each run or pattern, the output displays its length (number of matches) and a triplet giving the number of mismatches, insertions, and deletions. 0-runs are further characterized by their composition (residues comprising more than 10% of the run). Run count statistics are compiled for runs of length at least 2/3 of the minimal significant length (lmin0); the number and locations of such runs are given.

pattern   (+)|  (-)|  (*)|  (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)| (H.)|(H..)|
lmin0      4 |   5 |   6 |  54 |   9 |  10 |  13 |  11 |  13 |  16 |   5 |   5 |
lmin1      6 |   6 |   8 |  65 |  11 |  12 |  15 |  14 |  15 |  19 |   6 |   7 |
lmin2      7 |   8 |   9 |  73 |  13 |  14 |  17 |  16 |  17 |  21 |   7 |   8 |

(Significance level: 0.010000; Minimal displayed length: 6)

There are no charge runs or patterns exceeding the given minimal lengths.

Run count statistics:

+ runs >= 3: 0
- runs >= 3: 2, at 230; 556;
* runs >= 4: 1, at 859;
0 runs >= 36: 1, at 1123;
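The run counts can be checked against the charge display with a short scan for maximal runs. This is an illustrative sketch: `find_runs` is a hypothetical name, it looks only for exact runs (no mismatches or insertions/deletions), and it treats `*` as any charged symbol, as the pattern legend describes.

```python
from itertools import groupby


def find_runs(charges, kind, min_length, start=1):
    """Return start positions of maximal runs of the given kind.

    kind is '+', '-', '0', or '*' (either charge sign).
    start is the sequence position of charges[0].
    """
    def matches(symbol):
        return symbol in "+-" if kind == "*" else symbol == kind

    positions = []
    pos = start
    for is_match, group in groupby(charges, key=matches):
        length = len(list(group))
        if is_match and length >= min_length:
            positions.append(pos)
        pos += length
    return positions


if __name__ == "__main__":
    # Positions 221 to 240 and 851 to 870, copied from the charge display.
    print(find_runs("00000-000-" "--0-00-00+", "-", 3, start=221))
    print(find_runs("00000000+-" "++000000+0", "*", 4, start=859 - 8))
```

Running the scan over the two excerpts locates the negative run at 230 and the mixed-sign charge run at 859 that the statistics report.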