| π Statistics: Digit frequencies |
| by JVSchmidt |
| General | |
|
A frequency analysis (FA) simply counts how many substrings of each possible pattern
appear in the complete digit sequence. For example, to analyze the substrings of length L=2 we read
the digit sequence in consecutive, non-overlapping pairs and put every pair into its "home" box: 14 - 15 - 92 - 65 - 35 - 89 - 79 - 32 - 38 - ... In the end we know how many times "14", "15", "92" etc. were found. These counts are the basis for calculating the Chi2 value of the measured sequence, which is used to judge whether π behaves randomly or not. If π is RANDOM, each number "XY" should appear with equal probability. | |
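The counting step and the Chi2 value described above can be sketched in a few lines of Python. This is only an illustration, not the program used for the results on this page. The function names `substring_counts` and `chi_square` are made up for this example, and the test statistic is assumed to be the usual Pearson Chi2 against equal expected frequencies.

```python
from collections import Counter

def substring_counts(digits: str, length: int) -> Counter:
    """Count non-overlapping substrings of the given length, read from the start."""
    usable = len(digits) - len(digits) % length  # drop an incomplete trailing group
    return Counter(digits[i:i + length] for i in range(0, usable, length))

def chi_square(counts: Counter, length: int) -> float:
    """Pearson Chi2 against equal expected frequencies for all 10**length patterns."""
    classes = 10 ** length
    total = sum(counts.values())
    expected = total / classes
    # Patterns that were found at least once.
    found_part = sum((n - expected) ** 2 / expected for n in counts.values())
    # Patterns that never occurred contribute (0 - expected)**2 / expected each.
    missing_part = (classes - len(counts)) * expected
    return found_part + missing_part

sample = "141592653589793238"  # the first 18 decimals of pi, as in the example
pairs = substring_counts(sample, 2)
print(pairs)
print(chi_square(pairs, 2))
```

With only nine pairs the Chi2 value is of course meaningless as a test. The sample just shows how the digits are grouped and counted.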
| Results Overview | |
| Digits analyzed: 4.2 × 10^9. Analysis started at digit: 1. Elapsed computer time for one class: 5 min to 7 min 30 sec | |
| Length of substrings L | Number of different substrings s (= 10^L) | Expected frequency per class | Chi2 | z-value (approx. standard normal distribution) |
| 1 | 10 | 420,000,000 | 6.59 | -0.5680 |
| 2 | 100 | 21,000,000 | 116.47 | 1.2415 |
| 3 | 1,000 | 1,400,000 | 1,043.42 | 0.9938 |
| 4 | 10,000 | 105,000 | 10,124.29 | 0.8860 |
| 5 | 100,000 | 8,400 | 100,049.85 | 0.1137 |
| 6 | 1,000,000 | 700 | 1,003,038.06 | 2.1489 |
| 7 | 10,000,000 | 60 | 10,002,938.10 | 0.6572 |
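The page does not say how the z-values were derived. They are, however, consistent with the usual normal approximation of the Chi2 distribution, which for s classes has mean s - 1 and variance 2(s - 1). The following sketch recomputes the last column from the Chi2 column under that assumption. The name `z_value` is chosen only for this example.

```python
import math

def z_value(chi2: float, classes: int) -> float:
    """Normal approximation: a Chi2 variable with df degrees of freedom
    has mean df and variance 2 * df (assumed here, not stated in the source)."""
    df = classes - 1
    return (chi2 - df) / math.sqrt(2 * df)

# (L, Chi2) pairs taken from the table above.
table = [
    (1, 6.59),
    (2, 116.47),
    (3, 1043.42),
    (4, 10124.29),
    (5, 100049.85),
    (6, 1003038.06),
    (7, 10002938.10),
]

for length, chi2 in table:
    print(length, round(z_value(chi2, 10 ** length), 4))
```

A z-value between roughly -2 and +2 is what one would expect for a random sequence. Only L=6 lies slightly outside this range.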
| Why not test longer sequences directly? |
|
There is a serious problem when testing the frequencies of longer and longer chains: we run out of data very quickly. When testing chains with L=8 on a database of 4.2 billion digits we get a poor expected average of only 5.25 per class, which is close to the lower limit (about 5) commonly recommended for the Chi2 test. This average can be calculated as A = N / (L × 10^L), where L is the length of the tested sequences and N is the number of digits available for analysis. Even with the data from Yasumasa Kanada's latest calculation record of October 2002, with about 1.24 × 10^12 digits, we can only test sequences up to L=10. |
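The arithmetic behind these limits is easy to check. The sketch below evaluates A = N / (L × 10^L) and searches for the largest L whose expected average stays at or above a minimum. The threshold of 5 is the common rule of thumb and an assumption of this example, and both function names are made up for illustration.

```python
def expected_frequency(n_digits: int, length: int) -> float:
    """Expected count per class, A = N / (L * 10**L)."""
    return n_digits / (length * 10 ** length)

def max_testable_length(n_digits: int, min_expected: float = 5.0) -> int:
    """Largest L for which A is still at least min_expected (assumed threshold)."""
    length = 0
    while expected_frequency(n_digits, length + 1) >= min_expected:
        length += 1
    return length

print(expected_frequency(4_200_000_000, 8))       # database used on this page
print(max_testable_length(4_200_000_000))
print(max_testable_length(1_240_000_000_000))     # about 1.24 * 10**12 digits
```

For L=11 the larger data set would give an average of only about 1.1 per class, far too low for a Chi2 test.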
| More details of digit frequency analysis | |
| For single digit frequencies | |
| For double digit frequencies | |