Random numbers have been shown to be valuable in sampling, simulations, modeling, data encryption, gambling and even musical composition
The mathematician Robert R. Coveyou said: “The generation of random numbers is too important to be left to chance.” A random number is one selected from a set of equally probable values, and each number in a random sequence must be statistically independent of the others.
There are two major methods of random number generation, each with its own strengths and applications: Pseudo-Random Number Generators (PRNGs) and True Random Number Generators (TRNGs). These can be compared by three characteristics: efficiency, determinism and periodicity. Efficiency means that many numbers can be produced quickly. Determinism means that the sequence can be reproduced, provided that the starting point is known. Periodicity means that the sequence eventually repeats itself. In general, PRNGs are efficient, deterministic and periodic, whereas TRNGs are comparatively inefficient, nondeterministic and aperiodic. The methods are compared in Figure 1.
These characteristics make PRNGs more suitable for sampling, simulations, modeling and musical composition, whereas TRNGs are more suitable for data encryption and gambling.
Statistical test suites are available to evaluate randomness. Three of the more common ones are Diehard, Crypt-XS and NIST. The NIST tests are built on hypothesis testing: each test asks whether a specific sequence of zeros and ones is random or not. The battery of 15 tests evaluates frequencies, cumulative sums, runs, ranks and periodicity. After the tests have been applied, the results can be compared with their theoretical distribution by checking how well the distribution of the p-values fits a uniform distribution. One evaluation method is to compare the mean and variance of the p-values to those of a uniform distribution. Another is to compute a chi-square statistic based on the frequency counts of the p-values among bins.
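The two evaluation methods just described can be sketched in a few lines of Python. This is a minimal illustration, not the NIST implementation: the function name, the choice of 10 equal-width bins and the evenly spread p-values are assumptions made for the example. A uniform distribution on [0, 1] has a mean of 0.5 and a variance of 1/12, and a chi-square statistic near zero indicates that the p-values are spread evenly among the bins.

```python
from statistics import mean, pvariance


def chi_square_uniformity(p_values, bins=10):
    """Chi-square statistic for p-values counted in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in p_values:
        index = min(int(p * bins), bins - 1)  # p == 1.0 falls in the last bin
        counts[index] += 1
    expected = len(p_values) / bins  # expected count per bin if uniform
    return sum((count - expected) ** 2 / expected for count in counts)


# Hypothetical p-values, evenly spread over [0, 1] for illustration only.
p_values = [(i + 0.5) / 100 for i in range(100)]

print("mean:", mean(p_values), "(uniform: 0.5)")
print("variance:", pvariance(p_values), "(uniform: 1/12)")
print("chi-square:", chi_square_uniformity(p_values))
```

With 10 bins there are 9 degrees of freedom, so a statistic above roughly 16.92 would reject uniformity at the 5 % significance level.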
Pseudo-random numbers are often generated with a computational method such as the equation in Figure 2.
The first P1 is the seed (x0) and determines the sequence of numbers generated, whereas P2, N and the subsequent P1 values determine the characteristics of the generator. The “mod N” signifies that the preceding portion of the equation is divided by N and the remainder is taken as the first random number. That number becomes the P1 value for the second iteration of the equation, which produces the second random number, and so on.
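The iteration described above can be sketched in Python. Because Figure 2 is not reproduced here, the sketch assumes the common linear congruential form, next = (multiplier × current + increment) mod modulus. How the article's P1 and P2 map onto these roles is an assumption, and the parameter names and the small constants are chosen only for illustration.

```python
def lcg(seed, multiplier, increment, modulus):
    """Yield successive values of an assumed linear congruential generator."""
    x = seed
    while True:
        # Each output is fed back in as the input to the next iteration.
        x = (multiplier * x + increment) % modulus
        yield x


def take(generator, count):
    """Collect the next `count` values from a generator."""
    return [next(generator) for _ in range(count)]


# Deliberately tiny modulus so that the period is easy to see.
small = take(lcg(seed=7, multiplier=5, increment=3, modulus=16), 32)
print(small[:16])
print(small[16:])  # the same 16 values again: the sequence is periodic

# The same seed always reproduces the same sequence (determinism).
print(take(lcg(7, 5, 3, 16), 4) == take(lcg(7, 5, 3, 16), 4))
```

The example shows all three PRNG characteristics at once: the values are cheap to compute, the seed fixes the whole sequence, and the sequence repeats after at most N values. Practical generators use a far larger modulus so that the period is long.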
Other computational methods generate pseudo-random numbers from probability distributions. For example, SAS can draw values from the standard normal distribution, starting from a seed. The program and output are shown in Figure 3.
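A comparable seeded draw from the standard normal distribution can be sketched in Python. This is not the SAS program from Figure 3. It uses Python's standard `random` module, and the function name and seed value are assumptions made for the example.

```python
import random


def standard_normal_sample(seed, count):
    """Draw `count` values from a standard normal distribution (mean 0, sd 1)."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the output
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


first = standard_normal_sample(seed=12345, count=5)
second = standard_normal_sample(seed=12345, count=5)
print(first)
print(first == second)  # identical, because the seed is the same
```

Recording the seed alongside the results is what makes a simulation reproducible later.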
For sampling from multivariate distributions, functions such as randnormal, randmvt and randmultinomial can be used to generate samples from multivariate normal, multivariate Student’s t and multinomial distributions, respectively.
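As a rough Python counterpart to the multivariate normal and multinomial cases, the sketch below uses only the standard library. It is restricted to two correlated standard normal variables, and the function names, correlation of 0.8 and category probabilities are all assumptions made for the example. It does not reproduce the behavior of the SAS functions named above.

```python
import math
import random


def bivariate_normal(rng, rho, count):
    """Pairs of standard normal values whose correlation is rho."""
    pairs = []
    for _ in range(count):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Mixing two independent draws induces the requested correlation.
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pairs


def multinomial(rng, trials, probabilities):
    """Counts per category after `trials` independent categorical draws."""
    counts = [0] * len(probabilities)
    categories = range(len(probabilities))
    for category in rng.choices(categories, weights=probabilities, k=trials):
        counts[category] += 1
    return counts


def correlation(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(2024)
pairs = bivariate_normal(rng, rho=0.8, count=20000)
counts = multinomial(rng, trials=1000, probabilities=[0.2, 0.3, 0.5])
print("sample correlation:", round(correlation(pairs), 3))
print("multinomial counts:", counts)
```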
Random sampling from a finite data set is used to determine conformance to specifications. A program and its output using PROC SURVEYSELECT in SAS are shown in Figure 4 for the random selection of five units from a set of 10.
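The same kind of selection, five units drawn without replacement from 10, can be sketched in Python. This is not the SAS program from Figure 4. The unit labels 1 to 10, the function name and the seed are assumptions made for the example.

```python
import random


def select_sample(units, sample_size, seed):
    """Simple random sample without replacement, reproducible via the seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(units, sample_size))


units = list(range(1, 11))  # hypothetical unit labels 1..10
chosen = select_sample(units, sample_size=5, seed=2024)
print(chosen)
```

Each unit has the same chance of being selected, and no unit can be selected twice.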
Simulations can be used in software, such as JMP, to evaluate or model the outputs of a process as a function of randomness in the factors and noise in the model. Once a model of the output (y) as a function of the inputs (xs) has been created using Fit Model, and the Prediction Profiler and Simulator have been selected, the factor levels can be set. Each factor can be fixed at a specific value, given a random value from a specified distribution and parameters, given a value based on an expression that allows users to create their own distributions, or drawn from a multivariate normal distribution when the factors are correlated. If only the response from the model needs to be evaluated, no noise needs to be added to the response. The other options for the response are to add normal random noise or multivariate random noise. A setup to generate 500 runs for a response y with two factors, x1 and x2, with random noise from a normal distribution for each factor and random noise in the response, is shown in Figure 5.
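The idea behind such a simulation can be sketched in Python. This is not JMP's Simulator and not the model behind Figure 5. The linear model, its coefficients, the factor distributions and the noise level are all hypothetical values chosen only to show the mechanics of 500 runs with random factors and random noise in the response.

```python
import random
from statistics import mean, stdev


def simulate(runs, seed):
    """Simulate a hypothetical response y = 3 + 2*x1 + 4*x2 + noise."""
    rng = random.Random(seed)
    responses = []
    for _ in range(runs):
        x1 = rng.gauss(10.0, 1.0)    # factor 1: normal, mean 10, sd 1 (assumed)
        x2 = rng.gauss(5.0, 0.5)     # factor 2: normal, mean 5, sd 0.5 (assumed)
        noise = rng.gauss(0.0, 2.0)  # normal random noise in the response (assumed)
        responses.append(3.0 + 2.0 * x1 + 4.0 * x2 + noise)
    return responses


responses = simulate(runs=500, seed=42)
print("runs:", len(responses))
print("mean:", round(mean(responses), 1))
print("standard deviation:", round(stdev(responses), 1))
```

For this hypothetical model the theoretical mean is 43 and the theoretical standard deviation is about 3.5, so the simulated summary statistics should land close to those values.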
Using the simulated table output, a normal distribution can be fit to the response using the Distribution platform. Specifications can be entered and the capability calculated to determine a defect rate, as shown in Figure 6.
The expected process mean is 421.5 with a standard deviation of 73.8. For a lower specification of 200 and an upper specification of 600, a capability Cpk of 0.81 is calculated with a defect rate of 0.9%. This is only an initial estimate and needs to be confirmed with additional process data.
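These figures can be checked with a short Python calculation. The sketch assumes the usual definition of Cpk (the distance from the mean to the nearer specification limit, divided by three standard deviations) and a fitted normal distribution. The function name is an assumption made for the example.

```python
from statistics import NormalDist


def capability(mean, sd, lower_spec, upper_spec):
    """Return (Cpk, defect rate) for a normally distributed response."""
    cpk = min(upper_spec - mean, mean - lower_spec) / (3.0 * sd)
    fitted = NormalDist(mean, sd)
    # Fraction expected below the lower limit plus fraction above the upper limit.
    defect_rate = fitted.cdf(lower_spec) + (1.0 - fitted.cdf(upper_spec))
    return cpk, defect_rate


cpk, defect_rate = capability(mean=421.5, sd=73.8, lower_spec=200.0, upper_spec=600.0)
print(f"Cpk = {cpk:.2f}, defect rate = {defect_rate:.1%}")
```

The upper specification is the limiting side here: it is about 2.4 standard deviations from the mean, whereas the lower specification is about 3.0 standard deviations away, so most of the predicted defects fall above 600.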
True random numbers are often generated by measuring a random physical occurrence, such as radioactive decay or atmospheric noise, although dice and coin flipping are still used. Lavarand used a technique of running a hash function against images of a number of lava lamps.
Random numbers have been shown to be valuable in sampling, simulations, modeling, data encryption, gambling and even musical composition. Either computational or physical methods are used depending on the application. A battery of statistical tests for randomness is recommended for evaluation.
Note: SAS version 9.3 and JMP version 11.2.0 were used to generate data in figures.
Mark Anawis is a Principal Scientist and ASQ Six Sigma Black Belt at Abbott. He may be reached at editor@ScientificComputing.com.