Pseudo Random and Random Numbers Vivek Bhatnagar and Chaitanya Cheruvu

Contents
1. Introduction to Pseudorandom Numbers
2. Theory Behind Pseudorandom Numbers
3. Some Pseudorandom Number Generators
4. Attacks on Pseudorandom Generators
5. Tests for Pseudorandom Numbers
6. True Random Generators
7. Conclusions
Introduction
• Truly random: defined as exhibiting "true"
randomness, such as the time between "tics" from a
Geiger counter exposed to a radioactive element.
• Pseudorandom: defined as having the appearance of
randomness, but nevertheless exhibiting a specific,
repeatable pattern.
• Numbers calculated by a computer through a deterministic
process cannot, by definition, be truly random.
Introduction
• Given knowledge of the algorithm used to create the
numbers and its internal state (e.g. the seed), you can predict all
the numbers returned by subsequent calls to the algorithm,
whereas with genuinely random numbers, knowledge of
one number or an arbitrarily long sequence of numbers is
of no use whatsoever in predicting the next number to be
generated.
• Computer-generated "random" numbers are more properly
referred to as pseudorandom numbers, and sequences of such
numbers as pseudorandom sequences.
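The predictability described above is easy to observe. The sketch below uses Python's standard `random.Random` class (a Mersenne Twister, which these slides do not discuss) purely as an illustration: two generators given the same seed, and therefore the same internal state, return identical sequences. The helper name `draw` is hypothetical.

```python
import random


def draw(seed: int, n: int = 5) -> list[float]:
    """Return n pseudorandom floats from a fresh generator seeded with `seed`."""
    rng = random.Random(seed)   # the seed fixes the generator's entire internal state
    return [rng.random() for _ in range(n)]


first = draw(2024)
second = draw(2024)
print(first == second)  # True: same seed, same state, same sequence
```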
Introduction
• Usage
– Almost all network security protocols rely on
the randomness of certain parameters
• Nonce - used to avoid replay
• session key
• Unique parameters in digital signatures
– Monte Carlo simulations
• A mathematical technique for numerically solving problems such as
integrals and differential equations by randomly generating scenarios
and collecting statistics over them.
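As a minimal Monte Carlo sketch (an illustrative example, not taken from the slides), the code below estimates π by randomly generating points in the unit square and collecting the fraction that land inside the quarter circle. `estimate_pi` is a hypothetical helper, and Python's `random` module stands in for whatever generator an application would use.

```python
import random


def estimate_pi(samples: int, seed: int = 0) -> float:
    """Estimate pi as 4 * (fraction of random points in the unit square
    that fall inside the quarter circle x^2 + y^2 <= 1)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


print(estimate_pi(100_000))  # close to 3.14; the error shrinks roughly as 1/sqrt(samples)
```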
Introduction
• (Desirable) Properties of Pseudorandom Numbers
– Uncorrelated Sequences - The sequences of random
numbers should be serially uncorrelated
– Long Period - The generator should be of long period
(ideally, the generator should not repeat; practically, the
repetition should occur only after the generation of a
very large set of random numbers).
– Uniformity - The sequence of random numbers should
be uniform and unbiased. That is, equal fractions of
random numbers should fall into equal "areas" in
space. E.g. if random numbers on [0,1) are to be
generated, it would be poor practice were more than
half to fall into [0, 0.1), presuming the sample size is
sufficiently large.
– Efficiency - The generator should be efficient, with low
overhead, e.g. for massively parallel computations.
The Random Number Cycle
• Almost all random number generators have as their basis a
sequence of pseudorandom integers.
• The integers or "fixed point" numbers are manipulated
arithmetically to yield floating point or "real" numbers.
• The nature of the cycle:
– the sequence has a finite number of integers
– the sequence gets traversed in a particular order
– the sequence repeats if the period of the generator is exceeded
– the integers need not be distinct; that is, they may repeat.
Introduction
• Testing Pseudorandom generators
– Clever algorithms have been developed that generate
sequences of numbers which pass the statistical tests
commonly used to distinguish random sequences from those
containing some pattern or internal order.
– Tests check the different properties discussed above.
– Tests include mean and variance checks: for uniformly
distributed pseudorandom numbers on [0,1), the mean should
be close to 0.5 and the variance close to 1/12 ≈ 0.0833.
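A quick version of the mean and variance check, as a sketch: it draws uniform samples from Python's standard `random` module (an assumption; any generator under test could be substituted) and compares the sample mean and population variance with 0.5 and 1/12.

```python
import random
import statistics

rng = random.Random(12345)
samples = [rng.random() for _ in range(100_000)]

mean = statistics.fmean(samples)                 # ideal value: 0.5
var = statistics.pvariance(samples, mu=mean)     # ideal value: 1/12 ~ 0.0833
print(f"mean={mean:.4f} (ideal 0.5), variance={var:.4f} (ideal {1/12:.4f})")
```

Passing this check is necessary but far from sufficient: a sequence such as 0, 0.5, 0.25, 0.75, ... can have the right mean and variance while being completely predictable.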
Theory of Pseudorandom Numbers
Theory
• Computational indistinguishability - Consider an ensemble
(i.e. a sequence of random variables, one for each input length).
If we can generate a second ensemble which cannot be distinguished
from the first ensemble in polynomial time by any efficient procedure,
the second ensemble is, for all practical purposes,
equivalent to the first ensemble.
• Ensembles that are computationally indistinguishable from
a uniform ensemble are called pseudorandom.
• General definition of a pseudorandom generator
– A deterministic polynomial-time algorithm G which satisfies the
following two conditions:
• Expansion: for every s ∈ {0,1}*, |G(s)| > |s|
• Pseudorandomness: the ensemble {G(s)}, with the seed s chosen uniformly at random, is pseudorandom
Theory
• Significance of Pseudorandom Generators
– Efficient amplifiers/expanders of randomness.
– Using very little randomness (a randomly chosen seed) they
produce very long sequences which look random to any efficient
observer.
– Pseudorandom generators allow one to produce high-quality random
sequences at low cost, making them very useful in cryptography.
– They produce unpredictable sequences, i.e. no efficient algorithm
can guess the next bit, given a prefix of the sequence, with probability
non-negligibly better than 1/2.
Theory
• One-way functions and pseudorandom
generators
– The existence of one-way functions is a necessary and sufficient
condition for the existence of pseudorandom generators.
– Some (conjectured) one-way functions
• RSA function
• Discrete logarithm (the function x ↦ G^x mod P)
Theory
• Some practical constructions based on collections of
permutations
– The intractability of the Discrete Logarithm Problem: based on the
fact that, given a prime P, a primitive element G, and an element Y of
the group, it is hard to predict whether there exists 0 < x < P/2
such that Y = G^x mod P.
– The difficulty of inverting RSA: based on the fact that the least
significant bit constitutes a hard-core predicate for the RSA collection.
– The intractability of factoring Blum integers: based on the fact
that the least significant bit constitutes a hard-core predicate for the Rabin
collection.
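The Rabin-based construction can be sketched as the Blum-Blum-Shub generator (named here as a standard example; the slides do not name it): repeatedly square modulo a Blum integer N = p·q, with p ≡ q ≡ 3 (mod 4), and output the least significant bit of each state. The primes below are toy values chosen for readability and provide no security; `blum_blum_shub` is a hypothetical helper.

```python
import math


def blum_blum_shub(seed: int, p: int, q: int, nbits: int) -> list[int]:
    """Toy Blum-Blum-Shub generator: square modulo a Blum integer, emit the LSB.

    p and q must be primes congruent to 3 mod 4 (primality is not checked here).
    The tiny values used below offer no security whatsoever.
    """
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must both be congruent to 3 mod 4")
    n = p * q                        # a Blum integer
    if math.gcd(seed, n) != 1:
        raise ValueError("seed must be coprime to n")
    x = seed * seed % n              # start from a quadratic residue
    bits = []
    for _ in range(nbits):
        x = x * x % n                # Rabin function: squaring modulo n
        bits.append(x & 1)           # least significant bit = hard-core predicate
    return bits


print(blum_blum_shub(seed=3, p=11, q=23, nbits=16))
```

A real instance would use primes hundreds of digits long, so that factoring N (and hence predicting the bits) is infeasible.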
Linear Congruential Generators
• We begin by discussing the linear congruential generator (LCG), the one most
commonly used for generating random integers.
• We generate the next random integer $X_{n+1}$ using the previous random integer $X_n$,
the integer constants $a$ (multiplier) and $c$ (increment), and the integer modulus $m$:
$X_{n+1} = (a X_n + c) \bmod m$
• To get started, the algorithm requires an initial "seed" $X_0$, which must be
provided by some means.
• We refer to the sequence generated as $X_0, X_1, X_2, \ldots$
• The appearance of randomness is provided by performing modulo
arithmetic, or remaindering.
• Note that the next result, $X_{n+1}$, depends only upon the previous integer $X_n$.
This is a characteristic of linear congruential generators which
minimizes storage requirements but, at the same time, imposes
restrictions on the period.
Linear Congruential Generators
• With $X_n$ determined, we generate a corresponding real
number as follows:
$R_n = X_n / m$
• Because we divide by $m$, the values $R_n$ are distributed
on [0, 1).
• We desire uniformity, where any particular $R_n$ is just as
likely to appear as any other $R_n$, and the average of the
$R_n$ is very close to 0.5.
Linear Congruential Generators
• Example 1: LCG(a, c, m, $X_0$) = LCG(5, 1, 16, 1)
– Let us consider a simple example with a = 5, c = 1, m = 16, and
$X_0 = 1$. The sequence of pseudorandom integers generated by this
algorithm is:
1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5, 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, ...
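Example 1 can be reproduced with a few lines of Python. This is a minimal sketch of the recurrence above; the generator function `lcg` is a hypothetical name, not code from the original slides.

```python
from itertools import islice
from typing import Iterator


def lcg(a: int, c: int, m: int, seed: int) -> Iterator[int]:
    """Yield X_0, X_1, X_2, ... with X_{n+1} = (a*X_n + c) mod m and X_0 = seed."""
    x = seed
    while True:
        yield x
        x = (a * x + c) % m


print(list(islice(lcg(5, 1, 16, 1), 18)))
# [1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5, 10, 3, 0, 1, 6]
```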
Linear Congruential Generators
• We observe :
– The period P (the number of integers before the sequence repeats)
is 16, exactly equal to the modulus m. Thus, for m = 16, this
sequence has the longest possible period, and it is uniform (it
completely fills the space of integers from 0 to 15).
– The sequence exhibits, throughout its period, a pattern of alternating
odd and even integers.
– It is readily apparent that the sequence is serially correlated. Because of
this lack of randomness in the low-order bits, the integers should not be used
as random digits. The real numbers generated from the integer sequence are,
however, generally sufficiently random in their higher-order (most significant)
bits to be used in many applications.
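The weakness of the low-order bits can be quantified. The sketch below (helper names hypothetical) takes one full period of Example 1 and computes the period of each individual bit: for this power-of-two modulus, bit k repeats every 2^(k+1) terms, so only the most significant bit uses the full period of 16.

```python
def lcg_sequence(a: int, c: int, m: int, seed: int, n: int) -> list[int]:
    """First n terms of X_{k+1} = (a*X_k + c) mod m, starting from X_0 = seed."""
    xs, x = [], seed
    for _ in range(n):
        xs.append(x)
        x = (a * x + c) % m
    return xs


def cyclic_period(bits: list[int]) -> int:
    """Smallest shift p such that the cyclic sequence `bits` repeats every p terms."""
    n = len(bits)
    for p in range(1, n + 1):
        if all(bits[i] == bits[(i + p) % n] for i in range(n)):
            return p
    return n


cycle = lcg_sequence(5, 1, 16, 1, 16)      # one full period of Example 1
for k in range(4):                         # bit 0 is the least significant bit
    bit_k = [(x >> k) & 1 for x in cycle]
    print(f"bit {k}: period {cyclic_period(bit_k)}")
# bit 0: period 2, bit 1: period 4, bit 2: period 8, bit 3: period 16
```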
Linear Congruential Generators
• Next, we infer the following. Because each random integer results
from the previous integer alone, selecting any initial seed from 0 to 15
would just cyclically shift the above sequence.
• Thus, all that a different choice of the initial seed does is shift the
starting point in the sequence already determined by a, c and m.
• Finally, we note that the average of the real numbers over one period is
0.4688 and the variance is 0.0830. The departure of these values from the
ideal ones of 1/2 and 1/12 ≈ 0.0833 is due to the short period of this
sequence and the rather coarse resolution of the generated real numbers.
An average and variance close to the theoretical values are necessary
but not sufficient conditions for a good random number
generator.
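The figures quoted above can be checked exactly. The sketch below (helper name hypothetical) uses exact fractions to compute the mean and the population variance of $R_n = X_n/16$ over one full period of Example 1, and confirms that starting from another seed (13 here) only rotates the cycle.

```python
from fractions import Fraction


def lcg_sequence(a: int, c: int, m: int, seed: int, n: int) -> list[int]:
    """First n terms of X_{k+1} = (a*X_k + c) mod m, starting from X_0 = seed."""
    xs, x = [], seed
    for _ in range(n):
        xs.append(x)
        x = (a * x + c) % m
    return xs


a, c, m = 5, 1, 16
cycle = lcg_sequence(a, c, m, 1, m)                      # one full period from X_0 = 1
reals = [Fraction(x, m) for x in cycle]                  # R_n = X_n / m, kept exact
mean = sum(reals) / len(reals)
var = sum((r - mean) ** 2 for r in reals) / len(reals)   # population variance
print(float(mean), float(var))                           # 0.46875 0.0830078125

# A different seed only rotates the same cycle.
shifted = lcg_sequence(a, c, m, 13, m)
start = cycle.index(13)
print(shifted == cycle[start:] + cycle[:start])          # True
```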
Linear Congruential Generators
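• Example 2: LCG(5, 0, 16, 1)
– Next, we take the case of c = 0. This is termed a
multiplicative congruential random number generator:
$X_{n+1} = a X_n \bmod m$
With a = 5, m = 16 and $X_0 = 1$, the sequence is
1, 5, 9, 13, 1, 5, 9, 13, ... (period 4).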
Linear Congruential Generators
• Observations
– The low-order bits are not random.
– The sequence is correlated, as every integer differs by 4 (modulo 16)
from its predecessor.
– At coarse granularity, the sequence is uniform. For example, if we
divide [0,1) equally into quarter segments, then exactly one
random number falls into each segment: [0, 0.25), [0.25, 0.5), [0.5,
0.75) and [0.75, 1). However, at finer granularity, this uniformity
breaks down: if we divide the domain into 8 equal segments, for example,
half of them receive no numbers at all.
– There are two separate issues to consider here.
• The finite precision of all computers, which limits (through round-off
error) the precision with which integers can be represented and with
which the floating-point division is carried out.
• The interaction of the sequence of random numbers produced by our
generator with our application. This is particularly troublesome when
an application requires n-tuples of random numbers instead of just
one random number at a time.
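Example 2 and its coarse-versus-fine uniformity can be checked directly. In this sketch (helper names hypothetical), `occupancy` counts how many of the four values in one period fall into each of 4 and then 8 equal segments of [0, 1).

```python
def lcg_sequence(a, c, m, seed, n):
    """First n terms of X_{k+1} = (a*X_k + c) mod m, starting from X_0 = seed."""
    xs, x = [], seed
    for _ in range(n):
        xs.append(x)
        x = (a * x + c) % m
    return xs


def occupancy(values, bins):
    """Count how many values in [0, 1) fall into each of `bins` equal segments."""
    counts = [0] * bins
    for v in values:
        counts[int(v * bins)] += 1
    return counts


xs = lcg_sequence(5, 0, 16, 1, 8)
print(xs)                          # [1, 5, 9, 13, 1, 5, 9, 13]
reals = [x / 16 for x in xs[:4]]   # one period: 0.0625, 0.3125, 0.5625, 0.8125
print(occupancy(reals, 4))         # [1, 1, 1, 1]
print(occupancy(reals, 8))         # [1, 0, 1, 0, 1, 0, 1, 0]
```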
Linear Congruential Generators
• Initial Seed
– When debugging, it is important that the implementation can
reproduce the same stream of random numbers on successive runs.
– The initial seed should be set to a "random" odd value; for a
multiplicative generator with a power-of-two modulus, an even seed
shortens the period.
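The effect of an even seed is easy to see with the multiplicative generator of Example 2 (a = 5, m = 16). The sketch below (helper name hypothetical) measures the cycle length reached from several seeds.

```python
def cycle_length(a: int, m: int, seed: int) -> int:
    """Length of the cycle of X_{n+1} = a*X_n mod m that contains `seed`.

    Assumes gcd(a, m) == 1, so the map is a permutation and the seed recurs.
    """
    x, steps = (a * seed) % m, 1
    while x != seed:
        x = (a * x) % m
        steps += 1
    return steps


for seed in (1, 3, 2, 6, 4):
    print(seed, cycle_length(5, 16, seed))
# odd seeds 1 and 3 give period 4; seeds 2 and 6 give 2; seed 4 gives 1
```

Every even seed keeps all later terms even, so such a sequence can never reach the odd values and its period is cut accordingly.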
Linear Congruential Generators
• Characteristics of good LCG pseudorandom number
generators
– A large value of a is desirable to provide sufficient randomness.
– A large value of m is also desired, so that the period is kept long.
• Summary of the salient features and the recommendations
– Multiplicative congruential generators are adequate to good for
many applications. They are not acceptable... for high-dimensional
work.
– They can be very good if speed is a major consideration. Prime
moduli are best. However, moduli of the form $2^k$ are faster on
binary computers.
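To make the prime-versus-power-of-two trade-off concrete, the sketch below uses two well-known published parameter sets that do not come from the slides: the Park-Miller "minimal standard" generator (a = 16807, prime m = 2^31 - 1) and the 32-bit constants a = 1664525, c = 1013904223 quoted in Numerical Recipes, where reduction modulo 2^32 is just a bit mask. The check value 1043618065 after 10,000 steps from seed 1 is the one specified for `std::minstd_rand0` in the C++ standard.

```python
MINSTD_A = 16807                    # Park-Miller multiplier (7**5)
MINSTD_M = 2**31 - 1                # prime modulus (a Mersenne prime)

NR_A, NR_C = 1664525, 1013904223    # Numerical Recipes 32-bit LCG constants
MASK32 = (1 << 32) - 1              # x mod 2**32 == x & MASK32


def minstd_next(x: int) -> int:
    """Multiplicative LCG with a prime modulus: needs a true remainder."""
    return (MINSTD_A * x) % MINSTD_M


def pow2_next(x: int) -> int:
    """Mixed LCG with modulus 2**32: the remainder is a cheap bit mask."""
    return (NR_A * x + NR_C) & MASK32


x = 1
for _ in range(10_000):
    x = minstd_next(x)
print(x)  # 1043618065
```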
R250
• Uses a shift register sequence.
• Has several advantages over a linear congruential generator:
– Long period ($2^{250} - 1$)
– Period does not depend upon the number of bits used in the random
number
– Generally much faster than an LCG implementation
– Generator is built from a one-bit random generator based on the
following equation, where each $c_i \in \{0, 1\}$:
$b_n = c_1 b_{n-1} \oplus c_2 b_{n-2} \oplus \cdots \oplus c_p b_{n-p}$
– The maximum period is $2^p - 1$. We will use the value p = 250.
R250
• Choosing most of the $c_i$ terms to be 0, we get the equation
$b_n = b_{n-q} \oplus b_{n-p}$
• If we choose q = 103, then each new bit is obtained by adding (modulo 2)
the previously calculated bits 103 and 250 positions back:
$b_n = b_{n-103} \oplus b_{n-250}$
• To generate a 16- or 32-bit random number, the above 1-bit addition is
performed for each bit in the desired random number.
• Since exclusive-or is the same as bitwise addition modulo 2, all the bit
operations can be done in parallel with a single word-wide XOR. This gives
the speed advantage.
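A compact sketch of an R250-style generator follows, working on whole 32-bit words so that one XOR updates all bit positions in parallel. Assumptions, not from the slides: the 250-word buffer is seeded from Python's `random` module, and, following the idea used in common R250 implementations, 32 of the words are forced into a triangular bit pattern so that the 32 bit columns are linearly independent over GF(2). The class name `R250` is hypothetical.

```python
import random


class R250:
    """Sketch of a 32-bit R250 generator: x_n = x_{n-103} XOR x_{n-250}."""

    P, Q = 250, 103

    def __init__(self, seed: int) -> None:
        rng = random.Random(seed)               # assumption: any decent seeding source
        self.buf = [rng.getrandbits(32) for _ in range(self.P)]
        # The i-th chosen word gets bit 31-i set and all higher bits cleared, so the
        # 32 bit columns of the buffer are linearly independent over GF(2).
        mask, msb, k = 0xFFFFFFFF, 0x80000000, 3
        for _ in range(32):
            self.buf[k] = (self.buf[k] & mask) | msb
            mask >>= 1
            msb >>= 1
            k += 7
        self.index = 0                          # buf[index] holds x_{n-250}

    def next(self) -> int:
        # In the circular buffer, x_{n-Q} sits P - Q slots after the oldest entry.
        j = (self.index + self.P - self.Q) % self.P
        new = self.buf[self.index] ^ self.buf[j]   # addition mod 2 on all 32 bits at once
        self.buf[self.index] = new                 # overwrite the oldest entry
        self.index = (self.index + 1) % self.P
        return new


gen = R250(seed=1)
print([gen.next() for _ in range(5)])
```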
Shuffling Numbers
• Sometimes it is desirable to randomize a small set of
numbers so that a non-repeating sequence is obtained.
– Games
– Oceanographic RAFOS float
• It is important not to repeat numbers. Taking the modulus
of a generator like R250 will not work, as the numbers could
repeat.
• One way to do this is to put the values to be shuffled
into an array and to use a random number generator to
generate indices into the array to actually shuffle the
numbers. The array is then accessed sequentially.
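One standard way to generate those indices is the Fisher-Yates (Knuth) shuffle, named here as an example; the slides do not specify an algorithm, and Python's own `random.shuffle` implements the same idea. The helper `shuffle_in_place` is hypothetical.

```python
import random


def shuffle_in_place(values: list, rng: random.Random) -> list:
    """Fisher-Yates shuffle: each permutation is equally likely if rng is uniform."""
    for i in range(len(values) - 1, 0, -1):
        j = rng.randrange(i + 1)                    # random index in [0, i]
        values[i], values[j] = values[j], values[i]
    return values


order = shuffle_in_place(list(range(1, 11)), random.Random(7))
print(order)  # 1..10, each exactly once, in shuffled order
```

Because elements are only swapped, never drawn with replacement, no value can repeat, which is exactly what taking a generator's output modulo the set size fails to guarantee.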
Quasi Random Numbers
• For some applications, pseudorandom numbers are
a little too random.
• Some portions of the domain are relatively undersampled
and other portions are oversampled.
• Quasi-random number generators maintain a
uniform density of coverage over the entire
domain by giving up serial independence of
subsequently generated values.
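The van der Corput sequence (a standard low-discrepancy sequence, named here as an example; the slides do not specify one) shows the idea: the n-th value is obtained by mirroring the base-b digits of n about the radix point, so successive points keep subdividing the gaps left by earlier ones. `van_der_corput` is a hypothetical helper.

```python
def van_der_corput(n: int, base: int = 2) -> float:
    """n-th element of the van der Corput sequence: reverse n's base-b digits
    about the radix point (e.g. 6 = 110 in binary -> 0.011 in binary = 0.375)."""
    q, denom = 0.0, 1
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q


points = [van_der_corput(i) for i in range(8)]
print(points)  # [0.0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```

The first 8 points land in 8 different eighths of [0, 1), unlike a sample from a pseudorandom generator, but each point is obviously determined by its predecessors.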
Cryptanalytic Attacks on Random Number
Generators
• Examples of random parameters in cryptography:
– Session keys
– Numbers to be hashed with passwords
– Parameters in digital signatures
– Nonces
• Most of the above are approximated using PRNGs.
• For true randomness:
– Noise in electrical circuits
– Radioactive decay, etc.
Classes of Attacks on PRNGs:
• Direct Cryptanalytic Attack:
– When the attacker can directly distinguish between PRNG numbers and
random numbers (cryptanalyze the PRNG).
• Input Based Attack:
– When the attacker is able to use knowledge and control of PRNG inputs to
cryptanalyze the PRNG.
• State Compromise Extension Attacks:
– When the attacker can guess some information due to an earlier breach of
security. The advantage of a previous attack is extended.
Direct Cryptanalytic Attacks:
• When the attacker can directly cryptanalyze the PRNG.
• Applicable to most PRNGs
• Not applicable when the attacker is not able to directly see the output
of the PRNG.
– E.g. a PRNG used to generate triple-DES keys: here the output of the
PRNG is never directly seen by an attacker.
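When the output is visible and the PRNG is weak, a direct attack can be trivial. As a toy sketch (an LCG with a known prime modulus and made-up secret parameters, not an example from the slides), three consecutive outputs suffice to solve for the secret multiplier and increment, since $X_2 - X_1 = a(X_1 - X_0) \bmod m$. `recover_params` is hypothetical; `pow(x, -1, m)` requires Python 3.8+.

```python
M = 2**31 - 1  # modulus, assumed known to the attacker (prime)


def recover_params(x0: int, x1: int, x2: int, m: int = M) -> tuple[int, int]:
    """Solve X_{n+1} = (a*X_n + c) mod m for a and c from three consecutive outputs."""
    a = (x2 - x1) * pow(x1 - x0, -1, m) % m   # requires x1 != x0 (mod m)
    c = (x1 - a * x0) % m
    return a, c


# Victim generator with illustrative secret parameters.
secret_a, secret_c = 48271, 12345
outputs = [20240601]
for _ in range(5):
    outputs.append((secret_a * outputs[-1] + secret_c) % M)

a, c = recover_params(*outputs[:3])
print((a * outputs[2] + c) % M == outputs[3])  # True: the next output is predicted
```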
Input Based Attacks:
• When an attacker uses knowledge or control of the inputs to
cryptanalyze the PRNG output.
• Types:
– Known input
• The inputs to the PRNG, which are designed to be difficult for an attacker
to guess, turn out to be easily deducible. E.g. disk latency time: when the
user is accessing a network disk, the attacker can observe the latency time.
– Chosen input
• Practical against smartcards, applications that feed incoming messages
(username/password etc) to the PRNG as entropy samples.
– Replayed Input
• Similar to chosen input, except it requires less sophistication on the part of the
attacker.
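A common known-input failure is seeding from a clock the attacker can estimate. In this toy sketch (hypothetical helper names, a made-up timestamp, and Python's non-cryptographic `random` module as the victim's PRNG), the attacker simply tries every seed in an hour-long window.

```python
import random


def make_token(seed: int) -> int:
    """Victim: derive a 'secret' 64-bit token from a PRNG seeded with `seed`."""
    return random.Random(seed).getrandbits(64)


def recover_seed(token: int, latest: int, window: int = 3600):
    """Attacker: try every seed in the guessable range [latest - window, latest]."""
    for guess in range(latest - window, latest + 1):
        if make_token(guess) == token:
            return guess
    return None


victim_seed = 1_700_000_123          # hypothetical clock reading used as the seed
token = make_token(victim_seed)
print(recover_seed(token, latest=1_700_000_200))  # recovers the victim's seed
```

Only a few thousand candidates exist, so the 64-bit length of the token gives no protection.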
State Compromise Extension Attacks:
• Attempts to extend the advantages of a temporary security breach
• These breaches can be:
– Inadvertent leak
– Previous cryptographic success
• This attack is successful when:
– The attacker learns the internal state S of the PRNG and is either:
– able to recover unknown PRNG outputs from before S was compromised,
OR
– able to recover outputs from after the PRNG has collected a sequence of
inputs that the attacker cannot otherwise guess.
• These attacks usually succeed when the system is started in a guessable
state (due to lack of entropy).
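A state compromise is easy to simulate with Python's `random` module, whose `getstate`/`setstate` methods expose the full internal state. The "leak" here is simulated; the point is that this generator never reseeds itself, so one leak is a permanent compromise of all later outputs.

```python
import random

victim = random.Random()            # seeded from OS entropy: the seed is not guessable
victim.random()                     # some outputs have already been used
leaked = victim.getstate()          # simulated leak of the internal state S

attacker = random.Random()
attacker.setstate(leaked)           # the attacker now holds an exact copy of S

victim_future = [victim.getrandbits(32) for _ in range(5)]
attacker_future = [attacker.getrandbits(32) for _ in range(5)]
print(victim_future == attacker_future)  # True: every later output is predictable
```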
State Compromise Extension Attacks (cont):
• These attacks are classified as:
– Backtracking attacks
• Uses the compromise of PRNG state S to learn about all previous PRNG
outputs.
– Permanent compromise attack
• Once S has been compromised, all future and past outputs of the PRNG are
vulnerable.
– Iterative guessing attacks
• Uses the knowledge of state S that was compromised at time t and the
intervening PRNG outputs to guess the state S’ at time t+Δ.
– Meet-in-the-middle attacks
• Combination of iterative guessing and backtracking.
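A backtracking attack is equally simple against an invertible update rule. In this toy sketch (an LCG with illustrative parameters known to the attacker, not a generator from the slides), the step $X_{n+1} = (a X_n + c) \bmod m$ is undone with $X_n = a^{-1}(X_{n+1} - c) \bmod m$, so one compromised state reveals every earlier output.

```python
M = 2**31 - 1
A, C = 48271, 12345                 # illustrative parameters, known to the attacker
A_INV = pow(A, -1, M)               # modular inverse of A (Python 3.8+)


def step_forward(x: int) -> int:
    return (A * x + C) % M


def step_back(x: int) -> int:
    """Invert one LCG step: X_n = A^{-1} * (X_{n+1} - C) mod M."""
    return (A_INV * (x - C)) % M


history = [987654321]
for _ in range(10):
    history.append(step_forward(history[-1]))

compromised = history[-1]           # the attacker captures state S
recovered = [compromised]
for _ in range(10):
    recovered.append(step_back(recovered[-1]))
print(recovered[::-1] == history)   # True: all previous outputs recovered
```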
Some Examples:
• ANSI X9.17 PRNG:
– Vulnerable to input-based attacks and state compromise extension attacks.
• DSA PRNG:
– Vulnerable only to state compromise extension attacks.
• RSAREF PRNG:
– Vulnerable to input-based attacks and state compromise extension attacks.
Tests for Randomness in Random Numbers:
• Quantitative tests:
– χ² tests:
– Lagged Correlation:
• Qualitative tests:
– Scatter Plots
• Plot pairs of random numbers.
• Clumps of numbers, gaps and patterns are easily visible.
– Random Walk
χ² tests:
• Measure how well the presumed distribution (usually uniform) is
represented.
• Algorithm for the test:
– Divide the whole interval within which the random numbers fall into a
finite number m of bins (class intervals) of the same size.
– Count the number of random numbers within each interval (the observed
counts $O_i$) and calculate the "expected" number of observations $E_i$
[(number of random numbers used) / (number of class intervals) for
uniform intervals].
– Calculate: $\chi^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}$
– The value of χ² determines whether the numbers generated follow the chosen
distribution: compare it with tabulated critical values of χ² for m - 1
degrees of freedom.
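The algorithm above can be sketched as follows for a uniform distribution on [0, 1). `chi_square_uniform` is a hypothetical helper, Python's `random` module stands in for the generator under test, and 16.92 is the commonly tabulated 5% critical value for 9 degrees of freedom.

```python
import random


def chi_square_uniform(samples: list[float], bins: int) -> float:
    """Chi-square statistic for the hypothesis that samples in [0, 1) are uniform."""
    observed = [0] * bins
    for x in samples:
        observed[min(int(x * bins), bins - 1)] += 1   # equal-size class intervals
    expected = len(samples) / bins                     # E_i, the same for every bin
    return sum((o - expected) ** 2 / expected for o in observed)


rng = random.Random(1)
stat = chi_square_uniform([rng.random() for _ in range(10_000)], bins=10)
# 10 bins -> 9 degrees of freedom; the tabulated 5% critical value is about 16.92.
print(f"chi2 = {stat:.2f}; reject uniformity at the 5% level: {stat > 16.92}")
```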
Lagged Correlation:
• This test reveals the relationship between the numbers at one time and
at another (autocorrelation).
• Reveals trends and periodicity of numbers.
• Properties of an ideal random number generator:
– Autocorrelation value = 1 for lag τ = 0
– Autocorrelation value ≈ 0 for any other value of τ
• If the autocorrelation values drop to 0 only slowly as τ increases, then the
random numbers generated are not independent of each other.
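A minimal lagged-correlation sketch follows. The normalisation (sum of lagged products of deviations divided by the sum of squared deviations) is one common choice and an assumption here; `autocorrelation` is a hypothetical helper.

```python
import random


def autocorrelation(xs: list[float], lag: int) -> float:
    """Sample autocorrelation of xs at the given lag (lag 0 gives exactly 1)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / denom


rng = random.Random(3)
good = [rng.random() for _ in range(10_000)]
print([round(autocorrelation(good, k), 3) for k in range(4)])  # 1.0, then values near 0

bad = [0.0, 1.0] * 50               # a strongly patterned sequence
print(round(autocorrelation(bad, 1), 2), round(autocorrelation(bad, 2), 2))  # -0.99 0.98
```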
Scatter Plots:
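A text-mode scatter plot of consecutive pairs $(X_n, X_{n+1})$ from Example 1 illustrates the idea (a sketch; a real test would plot many more points from a realistic generator). Only 16 of the 256 possible cells are ever hit, and the points fall on a few parallel lines, the lattice structure typical of LCGs.

```python
def lcg_sequence(a, c, m, seed, n):
    """First n terms of X_{k+1} = (a*X_k + c) mod m, starting from X_0 = seed."""
    xs, x = [], seed
    for _ in range(n):
        xs.append(x)
        x = (a * x + c) % m
    return xs


xs = lcg_sequence(5, 1, 16, 1, 17)
pairs = list(zip(xs, xs[1:]))          # (X_n, X_{n+1}) over one full period
grid = [["." for _ in range(16)] for _ in range(16)]
for x, y in pairs:
    grid[15 - y][x] = "*"              # top text row = largest X_{n+1}
print("\n".join("".join(row) for row in grid))
print(len(set(pairs)), "of 256 cells occupied")  # 16 of 256 cells occupied
```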
Random Walk
• Algorithm:
– Divide the range of the random number generator into equal intervals
(4 intervals for a random walk in two dimensions).
– Generate a number; if the number falls in the:
• first interval, increment X
• second interval, increment Y
• third interval, decrement X
• fourth interval, decrement Y
– Generate t steps for each of n random walks.
– Calculate the mean squared distance reached.
– Plot this distance against time (the number of steps t).
– The plot of mean squared distance against t should be roughly linear;
otherwise, the random numbers are not correctly distributed.
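The random walk test can be sketched as below, with Python's `random` module as the generator under test (an assumption). For an unbiased 2-D lattice walk the expected squared distance after t steps is exactly t, which is why the plot should be a straight line; `mean_squared_distance` is a hypothetical helper.

```python
import random


def mean_squared_distance(rng: random.Random, t: int, walks: int) -> float:
    """Average squared distance from the origin after t steps, over `walks` 2-D walks."""
    total = 0
    for _ in range(walks):
        x = y = 0
        for _ in range(t):
            r = rng.random()
            if r < 0.25:
                x += 1          # first interval
            elif r < 0.5:
                y += 1          # second interval
            elif r < 0.75:
                x -= 1          # third interval
            else:
                y -= 1          # fourth interval
        total += x * x + y * y
    return total / walks


rng = random.Random(42)
for t in (25, 50, 100):
    print(t, mean_squared_distance(rng, t, walks=2000))  # roughly equal to t
```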
Truly Random Numbers:
• Must rely on external physical quantities
– Computers require special hardware
– Few computers have access to this kind of hardware
– Example: Sensors (heat/pressure) etc.
• Randomness without relying on external data:
– Some way to measure internal activity of the computer such that the
activity is quantifiable and genuinely random.
– Example: Timing of keystrokes as a user enters a password.
Some physical quantities used in the real world for true
random number generation:
• Timing of keystrokes when a user enters a password.
• Measurement of air turbulence due to the movement of hard drive
heads.
• Timings of memory accesses under artificially induced thrashing
conditions.
• Precise measurement of current leakage from a CPU or any other
system component.
• Measurement of timing skew between two system timers:
– A hardware timer
– A software timer
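Modern operating systems package this approach for applications: the kernel gathers entropy from hardware events such as device and interrupt timings and uses it to seed a cryptographically secure generator. As a hedged sketch using Python's standard library, `os.urandom`, `random.SystemRandom` and the `secrets` module all draw from that OS source; exactly which physical sources feed it is operating-system dependent, and the result is a seeded CSPRNG rather than a raw true random stream.

```python
import os
import random
import secrets

raw = os.urandom(16)                  # 16 bytes from the operating system's entropy source
sysrand = random.SystemRandom()       # random.Random API backed by os.urandom; cannot be replayed
session_key = secrets.token_bytes(32) # intended for keys, tokens and nonces
print(raw.hex(), sysrand.random(), len(session_key))
```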
Conclusions:
• Random numbers are the basis for many cryptographic applications.
• No deterministic function, independent of external input, can reliably
generate truly random numbers.
• Without physical sources, present-day computers can only approximate
random numbers, using pseudorandom numbers generated by pseudorandom
number generators (PRNGs).
• Many cryptographic applications can be attacked through attacks on
their PRNGs.
• Computer applications are increasingly turning towards physical
data (external/internal) for obtaining truly random numbers.