π Statistics: Basic Considerations by JVSchmidt

Normal or Random?

The question of whether a number is random or normal is tied to the representation of this number in a particular base b. A number is called normal in base b if all digits occur equally often and, more than this, every possible digit subsequence of length k occurs asymptotically equally often, with a limiting frequency of $b^{-k}$. If a number is normal in every base b, it is simply called normal. But randomness is more than this. There are numbers that are normal but can be predicted because of their construction law. WAGON gives the example 0.12345678910111213..., which is normal in base 10 although its digits simply follow from writing the positive integers one after another.

IID Sequences

The first direction we can take is to analyse the decimal representation of π. The aim is to check whether the k-sequences have an asymptotic probability of $10^{-k}$ for every k > 0. This can be examined empirically by simple digit counting, although counting a finite number of digits cannot prove it. A harder demand is what JADITZ calls the iid hypothesis for the digits of π: the k-sequences are independent and identically distributed uniform draws. Instinctively I would say that the randomness of a number follows from its digits being iid. Taking iid as the working hypothesis, there should be no correlations between the k-sequences. On this basis I have developed some tests for long sequences (k > 20).

Chi² Test

One of the most common methods to check the agreement between empirical data and an expected distribution is the Chi² test. This test indicates whether the data fit the model. The test value is calculated as

$$\chi^2 = \sum_{i=1}^{k} \frac{(X_i - E_i)^2}{E_i}$$

where the sum runs over all values i = 1, 2, ..., k and
k = number of possible different observations (elementary events or classes; not the sequence length used above)
$X_i$ = observed number of occurrences of event i
$E_i$ = (theoretically) expected number of occurrences of event i.

The division into classes must be chosen so that $E_i \ge 5$ for every class. Very rare events are merged into one single class. Depending on the value of $\chi^2$, one accepts or rejects the hypothesis that the theoretical model fits the data well.
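
The test value described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the tests reported here: the function names `block_counts` and `chi_square` are made up for this example, the blocks are assumed to be taken without overlap, and the model is assumed to be the uniform one (every k-sequence equally likely). The short digit string in the usage part holds just the first 50 decimals of π, which is barely enough for k = 1.

```python
from collections import Counter
from itertools import product


def block_counts(digits: str, k: int) -> Counter:
    """Count non-overlapping blocks of length k (an assumption of this sketch)."""
    usable = len(digits) - len(digits) % k  # drop an incomplete last block
    return Counter(digits[i:i + k] for i in range(0, usable, k))


def chi_square(digits: str, k: int = 1, base: int = 10) -> float:
    """Chi-square test value for k-digit blocks against the uniform model."""
    if not 2 <= base <= 10:
        raise ValueError("this sketch only handles bases 2 to 10")
    counts = block_counts(digits, k)
    n_blocks = len(digits) // k
    expected = n_blocks / base ** k  # E_i is the same for every class
    if expected < 5:
        raise ValueError("need E_i >= 5 in every class; use more digits")
    alphabet = "0123456789"[:base]
    # Sum over every possible block, including blocks that never occurred.
    return sum(
        (counts["".join(block)] - expected) ** 2 / expected
        for block in product(alphabet, repeat=k)
    )


if __name__ == "__main__":
    # First 50 decimals of pi; far too few digits for any real conclusion.
    PI_DIGITS = "14159265358979323846264338327950288419716939937510"
    print(chi_square(PI_DIGITS, k=1))
```

Non-overlapping blocks are chosen here so that the counted blocks do not share digits; overlapping windows would give more counts but would make neighbouring observations dependent.
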
Tables of critical values, depending on the level of significance and the number of degrees of freedom f of the model, can be found in any textbook on probability and statistics. We used a confidence level of 99% (that is, a significance level of 1%) and marked results that reject the null hypothesis H0 (model fits data) in red. None of our tests has hidden parameters, so: f = number of classes - 1
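
As a rough illustration of the decision step, the following Python sketch compares a test value with a critical value for f = number of classes - 1. Instead of a printed table it uses the Wilson-Hilferty approximation to the chi-square quantile, which is a substitute chosen for this example and not the method used for the results reported here; the names `critical_value` and `fits_model` are likewise hypothetical. The approximation is close for moderate and large f but less accurate for very small f, where a table should be preferred.

```python
from statistics import NormalDist


def critical_value(f: int, confidence: float = 0.99) -> float:
    """Approximate chi-square quantile (Wilson-Hilferty), not an exact table value."""
    if f < 1:
        raise ValueError("degrees of freedom must be at least 1")
    z = NormalDist().inv_cdf(confidence)  # standard normal quantile
    a = 2.0 / (9.0 * f)
    return f * (1.0 - a + z * a ** 0.5) ** 3


def fits_model(chi2: float, n_classes: int, confidence: float = 0.99) -> bool:
    """True if H0 (model fits data) is not rejected at the given confidence level."""
    f = n_classes - 1  # no hidden parameters are estimated from the data
    return chi2 <= critical_value(f, confidence)


if __name__ == "__main__":
    # Ten digit classes give f = 9; tables list about 21.67 at the 99% level.
    print(critical_value(9))
    print(fits_model(6.0, n_classes=10))
```
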