Number Representation, Error Analysis, Accuracy & Precision
R. Ouyed, Phys381

Number Representation

We count in Base 10 (Decimal)

In decimal counting we run out of symbols after 0-9, so we increment the digit to the left by one unit (..., 9, 10, 11, ..., 98, 99, 100, 101, ...).

Computer's perception

Computers are made of a series of switches. Each switch has two states: ON or OFF, and each state can be represented by a number: 1 for "ON" and 0 for "OFF".

Computers count in Base 2 (Binary)

Binary digits (bits) can be represented as: 1 or 0, On or Off, Up or Down, Open or Closed, Yes or No, Black or White, Thick or Thin, Long or Short.

Counting in binary is the same as counting in decimal, but with only two symbols:
0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 10000, ...

Equivalence of Binary and Decimal

Every binary number has a corresponding decimal value (and vice versa):

  Binary number    Decimal equivalent
  1                1
  10               2
  11               3
  ...              ...
  1010111          87

Converting Binary to Decimal

In general, the "position values" in a binary number are the powers of two:
- the 1st position value is 2^0, i.e. one
- the 2nd position value is 2^1, i.e. two
- the 3rd position value is 2^2, i.e. four
- the 4th position value is 2^3, i.e. eight
- the 5th position value is 2^4, i.e. sixteen
- etc.

Example: 10101100_2

  bit             1    0    1    0    1    1    0    0
  position value  128  64   32   16   8    4    2    1

  128 + 0 + 32 + 0 + 8 + 4 + 0 + 0 = 172

Example: 01010001_2

  bit             0    1    0    1    0    0    0    1
  position value  128  64   32   16   8    4    2    1

  0 + 64 + 0 + 16 + 0 + 0 + 0 + 1 = 81

Representing Fractions in Binary

Consider ordinary base-10 notation: what does 1205.914_10 mean?

  1     2     0     5     .   9      1      4
  10^3  10^2  10^1  10^0  .   10^-1  10^-2  10^-3

Binary works the same way. For example, what is 101.01_2?

  1    0    1    .   0     1
  2^2  2^1  2^0  .   2^-1  2^-2

  = 4 + 0 + 1 + 0 + 1/4 = 5.25

Converting a decimal number to a binary number

[Worked example shown as a figure on the slide.]

Floating-Point Numbers

History: the origins of computers lie in finance (banking) and in the military (physics). Financial calculations are (usually) fixed-point: the decimal place is fixed in one position, two digits from the right: $129.95, $0.59, etc. In physical calculations we need the decimal place to "float": 3.14159, 0.333, 6.0221415 x 10^23, etc.

The Radix

In the decimal system, a decimal point (radix point) separates the whole number from the fractional part. Examples: 37.25 (whole = 37, fraction = .25), 123.567, 10.12345678.

Exponential Notation

Format: mantissa x 10^exponent, or a x 10^n, abbreviated a e n or a E n; for example, 6.0221415e23. The mantissa (a.k.a. significand, a.k.a. fractional part) is a floating-point number; the exponent is an integer. Someone has to decide the precision: the IEEE 754 standard, with 64-bit numbers, uses 52 bits for the mantissa, 11 for the exponent, and 1 for the sign (+ or -).

Normalized Notation

A normalized number in exponential notation has the decimal point immediately before the first nonzero digit: .314159e1, .60221415e24, -0.9876 x 10^-3, etc. The significant digits of a normalized floating-point number are all of its digits except the leading zeros.

Floating Point Number (Computer)

A floating-point number is stored in the general form +/- a.b x base^(+/- e), which involves: the sign of the mantissa, the mantissa a.b (integer part a and fractional part b, separated by the radix point, whose location matters), the base of the number system used, the sign of the exponent, and the exponent e.
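To make the positional arithmetic above concrete (and to complement the decimal-to-binary conversion that the slides work through graphically), here is a minimal Python sketch - not part of the original slides, with illustrative function names - that converts a binary string with an optional fractional part to its decimal value, and a non-negative decimal number to a binary string:

```python
def binary_to_decimal(bits: str) -> float:
    """Convert a binary string such as '101.01' to its decimal value."""
    whole, _, frac = bits.partition(".")
    value = 0.0
    # Whole part: position values are 2**0, 2**1, 2**2, ... from the right.
    for power, bit in enumerate(reversed(whole)):
        value += int(bit) * 2 ** power
    # Fractional part: position values are 2**-1, 2**-2, ... from the left.
    for power, bit in enumerate(frac, start=1):
        value += int(bit) * 2 ** -power
    return value


def decimal_to_binary(x: float, frac_bits: int = 10) -> str:
    """Convert a non-negative decimal number to a binary string (fraction truncated)."""
    whole = int(x)
    frac = x - whole
    digits = bin(whole)[2:]          # built-in conversion of the integer part
    if frac > 0:
        digits += "."
        for _ in range(frac_bits):   # repeatedly multiply the fraction by 2
            frac *= 2
            digits += str(int(frac))
            frac -= int(frac)
            if frac == 0:
                break
    return digits


print(binary_to_decimal("10101100"))   # 172.0
print(binary_to_decimal("101.01"))     # 5.25
print(decimal_to_binary(40.15625))     # 101000.00101
print(decimal_to_binary(12.5))         # 1100.1
```

The same hand procedure (split off the whole part, then repeatedly multiply the fraction by 2) is used in the normalization and IEEE conversion examples that follow.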
Binary Normalization

What is 12.5 in floating-point representation?
1. Convert 12.5 to binary fixed point: 12.5_10 = 1100.1_2
2. Normalize the number by moving the radix point, producing the mantissa: 1100.1_2 = 0.11001 x 2^4

(Here the number is normalized to the form 0.f x 2^e; the IEEE examples below normalize to the form 1.f x 2^e instead.)

Normalization ... cont.

Every binary number, except the one corresponding to the number zero, can be normalized by choosing the exponent so that the radix point falls to the right of the leftmost 1 bit:

  37.25_10  = 100101.01_2 = 1.0010101 x 2^5
  7.625_10  = 111.101_2   = 1.11101   x 2^2
  0.3125_10 = 0.0101_2    = 1.01      x 2^-2

So what happened? After normalizing, the numbers now have different mantissas and exponents.

IEEE Floating Point Representation

From binary floating point to machine representation: a floating-point number can be represented by a binary code by dividing it into three parts: the sign, the exponent, and the mantissa. In the 32-bit format, bit 0 is the sign, bits 1-8 are the exponent, and bits 9-31 are the mantissa.

The first, or leftmost, field of our floating-point representation is the sign bit: 0 for a positive number, 1 for a negative number.

The second field is the exponent. Since we must be able to represent both positive and negative exponents, we use a convention with a bias of 127 to determine the stored representation of the exponent. The biased exponent, the value actually stored, ranges from 0 through 255 - the range of values that can be represented by an 8-bit unsigned binary number. An exponent of 5 is therefore stored as 127 + 5 = 132; an exponent of -5 is stored as 127 + (-5) = 122.

The mantissa is the set of 0's and 1's to the right of the radix point of the normalized binary number (normalized meaning the digit to the left of the radix point is 1), e.g. 1.00101 x 2^3. The mantissa is stored in a 23-bit field.

Converting decimal floating-point values to stored IEEE standard values

Example: find the IEEE FP representation of 40.15625.

Step 1. Compute the binary equivalent of the whole part and the fractional part (convert 40 and 0.15625 to their binary equivalents):

  40 = 32 + 8,               so 40_10      = 101000_2
  0.15625 = 0.125 + 0.03125, so 0.15625_10 = 0.00101_2

  40.15625_10 = 101000.00101_2

Step 2. Normalize the number by moving the radix point to the right of the leftmost one:

  101000.00101 = 1.0100000101 x 2^5

Step 3. Convert the exponent to a biased exponent:

  127 + 5 = 132, and 132_10 = 10000100_2   (132 = 128 + 4)

Step 4. Piece the values together: drop the leading 1 of the mantissa and right-pad it with zeros to fill the 23-bit field:

  Sign  Exponent   Mantissa
  0     10000100   01000001010...0
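The hand conversion of 40.15625 can be checked directly. Here is a small sketch (an illustration, not from the slides; the helper name is mine) that uses Python's standard struct module to obtain the actual 32-bit pattern and split it into the sign, exponent, and mantissa fields:

```python
import struct


def ieee754_single_fields(x: float):
    """Return the (sign, exponent, mantissa) bit fields of x in IEEE 754 single precision."""
    (packed,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit big-endian pattern
    bits = f"{packed:032b}"
    return bits[0], bits[1:9], bits[9:]


sign, exponent, mantissa = ieee754_single_fields(40.15625)
print(sign)       # 0
print(exponent)   # 10000100  (132 = 127 + 5)
print(mantissa)   # 01000001010000000000000
```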
One more example: find the IEEE FP representation of -24.75.

Step 1. -24.75_10 = -11000.11_2

Step 2. Normalize the number by moving the radix point to the right of the leftmost one:

  -11000.11 = -1.100011 x 2^4

Step 3. Convert the exponent to a biased exponent:

  127 + 4 = 131, and 131_10 = 10000011_2

Step 4. Store the results:

  Sign  Exponent   Mantissa
  1     10000011   1000110...0

Converting from IEEE format to decimal floating-point values

Do the steps in reverse order. In reversing the normalization step, move the radix point the number of digits equal to the exponent: if the exponent is positive move it to the right, if negative move it to the left.

Example: convert the following 32-bit binary number to its decimal floating-point equivalent.

  Sign  Exponent   Mantissa
  1     01111101   010...0

Step 1. Extract and unbias the exponent: biased exponent = 01111101_2 = 125, so exponent = 125 - 127 = -2.

Step 2. Write the normalized number, 1.mantissa x 2^exponent:

  -1.01 x 2^-2

Step 3. Write the binary number (denormalize the value from step 2):

  -0.0101_2

Step 4. Convert the binary number to its decimal equivalent (add the column values):

  -0.0101_2 = -(0.25 + 0.0625) = -0.3125

Another example: convert the following 32-bit binary number to its decimal floating-point equivalent.

  Sign  Exponent   Mantissa
  0     10000011   1101010...0

Step 1. Extract and unbias the exponent: biased exponent = 10000011_2 = 131, so exponent = 131 - 127 = 4.

Step 2. Write the normalized number: 1.110101 x 2^4

Step 3. Denormalize: 11101.01_2

Step 4. Convert to the decimal equivalent: 11101.01_2 = 16 + 8 + 4 + 1 + 0.25 = 29.25
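The unbias-and-denormalize recipe can also be written out as a short sketch (again an illustration, not from the slides; it handles normal numbers only) and used to verify the two examples above:

```python
def decode_single(bits: str) -> float:
    """Decode a 32-bit IEEE 754 pattern (normal numbers only) by hand."""
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:9], 2) - 127            # remove the bias of 127
    fraction = sum(int(b) * 2 ** -(i + 1)         # bits to the right of the radix point
                   for i, b in enumerate(bits[9:]))
    return sign * (1.0 + fraction) * 2 ** exponent   # value = (-1)^s x 1.f x 2^(e-127)


print(decode_single("1" + "01111101" + "01000000000000000000000"))  # -0.3125
print(decode_single("0" + "10000011" + "11010100000000000000000"))  # 29.25
```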
Checking your work

Convert 0 10000100 01000001010...0 back from IEEE format.

  Sign  Exponent   Mantissa
  0     10000100   01000001010...0

Step 1a. Determine the decimal value of the exponent field: 10000100_2 = 128 + 4 = 132.
Step 1b. Unbias it by subtracting 127: 132 - 127 = 5.

Step 2. Set up the format 1.mantissa x 2^exponent and denormalize by moving the radix point to the right by the exponent:

  1.0100000101 x 2^5 = 101000.00101_2

Step 3a. Find the whole-number part: 101000_2 = 32 + 8 = 40.
Step 3b. Find the fractional part: 0.00101_2 = 0.125 + 0.03125 = 0.15625.
Step 3c. Add the values together (and include the sign if it is negative): 40.15625.

Practice example

What decimal value is represented by the following 32-bit floating-point number?

  1 10000010 11110110000000000000000

Answer: -15.6875

Another practice example

Convert -12.5_10 to binary (this example uses a simplified format with a two's-complement mantissa, not the IEEE sign-magnitude layout):
1. Convert 12.5 to fixed point: 01100.1
2. Normalize: 0.11001 x 2^4
3. Convert the exponent to binary: 4 = 0100
4. Two's-complement the mantissa by flipping the bits and adding 1: 011001 -> 100111
5. Final number: 1 00111 0100

Single and Double Precision

Single precision format: 32 bits (4 x 8 bits = 4 bytes ... Real*4)
- sign of mantissa: 1 bit
- exponent: 8 bits
- mantissa: 23 bits

Double precision format: 64 bits (8 x 8 bits = 8 bytes ... Real*8)
- sign of mantissa: 1 bit
- exponent: 11 bits
- mantissa: 52 bits

Smallest value for a binary number

The smallest value for a binary number with any number of bits is zero, i.e. when all the bits are zeros:

  # of bits   smallest binary #   decimal value
  1           0                   0
  2           00                  0
  3           000                 0
  ...         ...                 0
  8           00000000            0

Largest value for a binary number

The largest value for a binary number with a specific number of bits (i.e. digits) is when all of the bits are one. General rule: for a binary number with n bits, the largest possible value is 2^n - 1.

  # of bits   largest binary #    decimal value
  1           1                   1
  2           11                  3
  3           111                 7
  4           1111                15
  5           11111               31
  6           111111              63
  7           1111111             127
  8           11111111            255

Maximums and minimums: normal and abnormal IEEE singles

  Number name         s, e, and f             Value of single
  Normal              0 < e < 255             (-1)^s x 2^(e-127) x 1.f
  Subnormal           e = 0, f != 0           (-1)^s x 2^(-126) x 0.f
  Signed zero (+/-0)  e = 0, f = 0            (-1)^s x 0.0
  +INF                s = 0, e = 255, f = 0   +infinity
  -INF                s = 1, e = 255, f = 0   -infinity
  NaN                 s = u, e = 255, f != 0  NaN ("Not a Number")

Normal and abnormal IEEE doubles (exceptions)

  Number name         s, e, and f              Value of double
  Normal              0 < e < 2047             (-1)^s x 2^(e-1023) x 1.f
  Subnormal           e = 0, f != 0            (-1)^s x 2^(-1022) x 0.f
  Signed zero (+/-0)  e = 0, f = 0             (-1)^s x 0.0
  +INF                s = 0, e = 2047, f = 0   +infinity
  -INF                s = 1, e = 2047, f = 0   -infinity
  NaN                 s = u, e = 2047, f != 0  NaN ("Not a Number")
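The special rows of these tables (infinities, NaN, signed zero, subnormals) can be generated directly from bit patterns. A minimal sketch using Python's standard struct module (the helper name is illustrative) follows:

```python
import struct


def decode_bits(bits: str) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single (inf, nan, zero included)."""
    (value,) = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))
    return value


print(decode_bits("0" + "11111111" + "00000000000000000000000"))  # inf   (e = 255, f = 0)
print(decode_bits("1" + "11111111" + "00000000000000000000000"))  # -inf
print(decode_bits("0" + "11111111" + "10000000000000000000000"))  # nan   (e = 255, f != 0)
print(decode_bits("1" + "00000000" + "00000000000000000000000"))  # -0.0  (signed zero)
print(decode_bits("0" + "00000000" + "00000000000000000000001"))  # ~1.4e-45, smallest subnormal
```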
Overflow and Underflow

Overflow is an error condition that occurs when there are not enough bits to express a value in the computer: the number is too large in magnitude to be represented. Underflow is an error condition that occurs when the result of a computation is too small (too close to zero) for the computer to represent, e.g.

  0.00001 x 10^-50 = 10^-55

Error Analysis

There are only discrete points on the number line that can be represented by our computer. What about the space in between?

Possible Errors

- Round-off errors: arise when using floating-point numbers, because not all real numbers can be represented exactly.
- Overflow error: attempting to represent a number that is greater than the upper bound for the given number of bits.
- Underflow error: attempting to represent a number that is less than the lower bound for the given number of bits.
- Truncation error: arises when an infinite process is cut short, e.g. keeping only a finite number of terms of a series (discussed below).

Rounding in Binaries: Round-off Error

Round-off error is the problem of not having enough bits (or digits) to store an entire floating-point number, so the result is approximated by the nearest number that can be represented. Numbers such as pi, e, or sqrt(7) cannot be expressed by a fixed number of significant figures. Moreover, because computers use a base-2 representation, they cannot precisely represent certain exact base-10 numbers. Compare 1/3 = 0.333..., which equals neither 0.3, nor 0.33, nor 0.333, etc.

Implications

First, a familiar example with a four-digit decimal mantissa. Let a = 0.1000 x 10^1 and b = c = 0.00004 x 10^1. Then

  (a + b) + c = 0.1000 x 10^1        (the small terms are lost one at a time)
  a + (b + c) = 0.1000 x 10^1 + 0.00008 x 10^1 = 0.1001 x 10^1

so (a + b) + c is not equal to a + (b + c): floating-point addition is not associative.

Adding decimals, adding in the binary system, adding binaries

[Worked examples of decimal and binary addition were shown as figures on the slides.]

Precision

Consider the addition of two 32-bit words (single precision): 7 + 1.0 x 10^-7. These numbers are stored (the second one approximately) as:

  7          = 0 10000001 11000000000000000000000
  1.0 x 10^-7 = 0 01100111 10101101011111110010101

Because the exponents of the numbers are different, the exponent of the smaller number is made larger while its mantissa is shifted to the right (inserting zeros) until the two exponents are the same. By then 1.0 x 10^-7 has become

  0 10000001 00000000000000000000000 (0001101...)

where the bits in brackets have been shifted off the end of the 23-bit field and are lost. As a result, when the calculation is carried out the computer gets

  7 + 1.0 x 10^-7 = 7

Effectively the 10^-7 term has been ignored. The machine precision can be defined as the maximum positive value that can be added to 1 without changing its value.

Numerical Errors: fractions

Another example: the quantities 2/3 and 1 - 1/3 should be identical, but with seven significant digits they are stored/computed as 0.6666666 and 0.6666667, so their difference comes out as -0.0000001 rather than 0.

One more example (MATLAB/Octave):

  >> b = 1/3
  b = 0.333333
  >> b*3 - 1
  ans = 0

but

  >> b = 4/3 - 1
  b = 0.333333
  >> b*3 - 1
  ans = -2.2204e-16

Why?

Implications and a Solution

When you program the expressions on the left, you should write the forms on the right instead:

  f(x) = sqrt(x^2 + 1) - 1    ->   f(x) = (sqrt(x^2 + 1) - 1) * (sqrt(x^2 + 1) + 1) / (sqrt(x^2 + 1) + 1)
                                        = x^2 / (sqrt(x^2 + 1) + 1)

  g(x) = ln(x) - 1            ->   g(x) = ln(x) - ln(e) = ln(x/e)

Every floating-point operation introduces error, but the subtraction of nearly equal numbers is the worst and should be avoided whenever possible.
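A minimal sketch of the last two points, assuming NumPy for 32-bit arithmetic, shows the absorbed 10^-7 term, the float32 machine precision, and the benefit of the rewritten form of f(x):

```python
import numpy as np

# Adding a tiny number to a much larger one in single precision: the small term is lost.
seven = np.float32(7.0)
tiny = np.float32(1.0e-7)
print(seven + tiny == seven)        # True: 7 + 1e-7 == 7 in 32-bit arithmetic
print(np.finfo(np.float32).eps)     # ~1.19e-07, the machine precision of float32

# Subtraction of nearly equal numbers: naive vs. rewritten form of f(x) = sqrt(x^2+1) - 1.
x = np.float32(1.0e-3)
naive = np.sqrt(x * x + np.float32(1.0)) - np.float32(1.0)
stable = (x * x) / (np.sqrt(x * x + np.float32(1.0)) + np.float32(1.0))
print(naive, stable)                # the naive form keeps only a few correct digits;
                                    # the true value is about 4.9999988e-07
```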
Avoiding Round-off Error

Some values (e.g. 1/3) will always result in round-off; however, we can help reduce the error:
- When adding numbers whose magnitudes are drastically different, accumulate the smaller numbers first before combining them with the larger ones; otherwise they get absorbed.
- When multiplying and dividing, perform all multiplications in the numerator before dividing by the denominator.

For example, consider computing (x/y)*z on a machine with three significant digits of precision, where x = 2.41, y = 9.75, z = 1.54:

  (x / y) * z = (2.41 / 9.75) * 1.54 = 0.247 * 1.54 = 0.380

But (x / y) * z = (x * z) / y, and

  (x * z) / y = (2.41 * 1.54) / 9.75 = 3.71 / 9.75 = 0.381

which is closer to the true value 0.38066...

Truncation/Chopping Errors

Example: pi = 3.14159265358... is to be stored on a base-10 system carrying 7 significant digits.

  Chopping: pi = 3.141592, chopping error e_t = 0.00000065
  Rounding: pi = 3.141593, error e_t = 0.00000035

Some machines use chopping because rounding adds computational overhead. Since the number of significant figures is large enough, the resulting chopping error is negligible.

Truncation Errors and the Taylor Series

Non-elementary functions such as trigonometric, exponential, and other functions are expressed in an approximate fashion using Taylor series when their values, derivatives, and integrals are computed. Any smooth function can be approximated by a polynomial: the Taylor series provides a means to predict the value of a function at one point in terms of the function value and its derivatives at another point. Truncation error is decreased by adding terms to the Taylor series. If h (the discretization) is sufficiently small, only a few terms may be required to obtain an approximation close enough to the actual value for practical purposes.

Example: to get cos(x) for small x,

  cos x = 1 - x^2/2! + x^4/4! - x^6/6! + ...

If x = 0.5:

  cos(0.5) = 1 - 0.125 + 0.0026042 - 0.0000217 + ... = 0.877582

From the supporting theory, for this series the error is no greater than the first omitted term: x^8/8! = 0.0000001 for x = 0.5.

Truncation Errors: example (MATLAB/Octave)

  x = tan(pi/6)
  y = sin(pi/6)/cos(pi/6)

  x - y = 1.1102e-16

Why?

Absolute and Relative Error

If "correct" is the true answer and "result" is the result obtained, then

  absolute error         = |correct - result|
  relative error         = (absolute error) / |correct|
  percent relative error = (absolute error) / |correct| x 100%

Total Error

  Total error = round-off error + truncation error

The total numerical error is the sum of the truncation and round-off errors. The truncation error generally increases as the step size increases, while the round-off error decreases as the step size increases; this leads to a point of diminishing returns for the step size.

Optimization issue: step size h

Decreasing the step h reduces the truncation error but increases the round-off error.

[Figure: error for one step (h = dx) versus step size, showing the round-off error and truncation error contributions.]

Optimization issue: number of terms

Increasing the number of terms of a series reduces the truncation error but increases the round-off error.

[Figure: error versus number of terms, showing the round-off error and truncation error contributions.]
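The trade-off sketched in these figures can be reproduced numerically. The sketch below (an illustration, not from the slides) estimates the derivative of sin(x) at x = 1 with a forward difference and prints the total error for decreasing step sizes; the error first falls as the truncation error shrinks, then rises again as round-off takes over:

```python
import math


def forward_diff(f, x, h):
    """Forward-difference estimate of f'(x); truncation error ~ h, round-off error ~ eps/h."""
    return (f(x + h) - f(x)) / h


x = 1.0
exact = math.cos(x)                     # true derivative of sin at x = 1
for k in range(1, 17):
    h = 10.0 ** -k
    error = abs(forward_diff(math.sin, x, h) - exact)
    print(f"h = 1e-{k:02d}   error = {error:.3e}")
# The error is smallest near h ~ 1e-8 (roughly sqrt(machine epsilon)) and grows for smaller h.
```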
Error Propagation

Imagine computing the following [expression shown as a figure on the slide] repeatedly in a Do loop!! See the upcoming Assignment #2.

Looping and Error Propagation

A loop is a segment of code (computer instructions) that is executed repeatedly. Accumulating values in a loop (repeatedly) can produce error - e.g., the Patriot missile failure in the 1991 Gulf War.

Patriot Missile Failure

The Patriot's internal clock measured time in tenths of a second; each tick of the clock added 1/10 s to the current time. But 1/10 cannot be represented in a finite number of bits (just as 1/3 cannot be represented in a finite number of decimal digits). Each one-tenth increment therefore produced an error of about 0.000000095 seconds. After the Patriot's computer had been running for about 100 hours:

  100 hr x 60 min/hr x 60 s/min x 10 ticks/s x 0.000000095 s/tick = 0.34 s

A Scud travels at about 1676 m/s, so the tracking error was

  1676 m/s x 0.34 s = 570 m

Result: the Patriot missed the Scud, and 28 Americans died. (A short numerical check of these figures follows at the end of this section.)

Numerical Stability

An algorithm is called numerically stable if an error, whatever its cause, does not grow to be much larger during the calculation. In practice this requires the problem to be well-conditioned, meaning that the solution changes by only a small amount if the problem data are changed by a small amount. To the contrary, if a problem is ill-conditioned, then any small error in the data will grow into a large error.

Accuracy versus Precision

Accuracy: how close a computed or measured value is to the true value.
Precision (or reproducibility): how close a computed or measured value is to previously computed or measured values.

[Figure: illustrations of accuracy versus precision.]
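As a closing check, the Patriot figures quoted above can be verified from the commonly cited numbers (per-tick error of about 9.5 x 10^-8 s, roughly 100 hours of operation, Scud speed of about 1676 m/s); the short sketch below is an illustration, not part of the original slides:

```python
# Checking the arithmetic quoted for the Patriot failure.
error_per_tick = 0.000000095          # seconds of error added on each 0.1 s clock tick
ticks = 100 * 60 * 60 * 10            # ~100 hours of operation, 10 ticks per second
drift = ticks * error_per_tick
print(drift)                          # ~0.34 s of accumulated clock error

scud_speed = 1676.0                   # m/s
print(drift * scud_speed)             # ~570 m tracking error -> the Scud was missed
```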
© Copyright 2024