Computer Architecture, Lecture 14: Does your computer know how to add?

Hossam A. H. Fahmy
Cairo University
Electronics and Communications Department
A strange behavior
What do you expect from this code?
```c
#include <stdio.h>
int main(void)
{
    float x = +0.0, y = -0.0;

    printf("\n\t\t The amazing results:\n");
    printf("\n 10^30 = %f", 10.0E30);
    printf("\n 10^30 - 10^30 = 0 = %f", 10.0E30 - 10.0E30);
    printf("\n 10^60 = 10^30 * 10^30 = %f", 10.0E30 * 10.0E30);
    printf("\n 10^60 = 10^30 * 10^30 = %f", (float)(10.0E30 * 10.0E30));
    printf("\n 1.0/(+0.0) = %f", 1.0 / x);
    printf("\n 1.0/(-0.0) = %f", 1.0 / y);
    printf("\n +0.0 + (-0.0) = %f", x + y);
    printf("\n -0.0 - (-0.0) = %f", y - y);
    printf("\n 1.0/(+0.0) + 1/(-0.0) = %f", (1.0 / x) + (1.0 / y));
    printf("\n\n\t +-*/+-*/+-*/+-*/+-*/+-*/+-*/+-*/+-*/+-*/\n\n");
}
```
© Hossam A. H. Fahmy
Simple results!
Here is the output.

        The amazing results:

 10^30 = 9999999999999999635896294965248.000000
 10^30 - 10^30 = 0 = 0.000000
 10^60 = 10^30 * 10^30 = 99999999999999992084218144295482124579792562202350734542897152.000000
 10^60 = 10^30 * 10^30 = inf
 1.0/(+0.0) = inf
 1.0/(-0.0) = -inf
 +0.0 + (-0.0) = 0.000000
 -0.0 - (-0.0) = 0.000000
 1.0/(+0.0) + 1/(-0.0) = nan

 +-*/+-*/+-*/+-*/+-*/+-*/+-*/+-*/+-*/+-*/
Scientific notation
The value of a number in scientific notation has six attributes:

    ± d0 . d−1 d−2 · · · d−t × β^(±exp)
   (1)(2)    (3)            (4) (5)(6)

1. the sign of the significand,
2. the integer digit d0,
3. the fraction digits d−1 d−2 · · · d−t,
4. the base β,
5. the sign of the exponent, and
6. the exponent itself.

The computer representation of floating point numbers is similar.
Simple tests
Consider the following C code:

```c
#include <stdio.h>
int main(void)
{
    double x, y;
    double twox, threex;
    double z1, z2;

    x = 0.1;
    y = 0.3;
    twox = 2.0 * x;
    threex = 3.0 * x;
    z1 = threex - y;
    z2 = twox - y + x;

    printf("\n 3x-y = %e", z1);
    printf("\n 3x-y = %f", z1);
    printf("\n 3x-y = %40.40f", z1);
    printf("\n 3x-y = %e \n", 3.0 * x - y);
    printf("\n 2x-y+x = %e \n", z2);
    printf("\n (3x-y)/(2x-y+x) = %e or %e \n",
           z1 / z2, (3.0 * x - y) / (2.0 * x - y + x));
    printf("\n (4/3-1)*3-1 = %e \n\n", (4.0 / 3.0 - 1) * 3.0 - 1.0);
}
```

Results of simple tests
Once we compile and run this code the result is:

 3x-y = 5.551115e-17
 3x-y = 0.000000
 3x-y = 0.0000000000000000555111512312578270211816
 3x-y = 2.775558e-17

 2x-y+x = 2.775558e-17

 (3x-y)/(2x-y+x) = 2.000000e+00 or 1.000000e+00

 (4/3-1)*3-1 = -2.220446e-16
Do we need decimal?
(1/10)β=10 = (0.1)β=10, but in binary it is (0.000110011001100 . . .)β=2, which the computer rounds into a finite representation.

For a computer using binary64, if y = 0.30 and x = 0.10 then 3x − y = 5.6 × 10^-17 and, furthermore, 2x − y + x = 2.8 × 10^-17, leading to the wonderful surprise that

    (3x − y)/(2x − y + x) = 2    at x = 0.1, y = 0.3!
Humans and decimal numbers
For a human, is 0.050 kg = 0.05 kg?

If both measurements are normalized to 5 × 10^-2 and stored in a format with 16 digits as (5.000000000000000 × 10^-2) they are
• indistinguishable and
• give the incorrect impression of a much higher accuracy (0.050000000000000 kg).

To maintain the distinction, we should store

    0.000000000000050 × 10^12    (first measurement)
    0.000000000000005 × 10^13    (second measurement)

with all those leading zeros. Both are members of the same cohort.
IEEE standard
• The old IEEE 754 standard of 1985 had only binary floating point formats.
• The revised IEEE 754-2008 standard also includes decimal64 and decimal128 formats.

The fields of the decimal interchange formats:

              Sign    Combination            Trailing significand
                      (exponent and MSD)     (t = 10 · J bits)
   64 bits:   1 bit   13 bits, bias = 398    50 bits, 15 + 1 digits
  128 bits:   1 bit   17 bits, bias = 6176   110 bits, 33 + 1 digits
Real numbers to floating numbers
• The IEEE standard specifies five exceptional conditions that may occur during an arithmetic operation: invalid, division by zero, overflow, underflow, and inexact result.
• It also specifies five rounding directions for inexact results.

(Figure: a number line with a value between two representable neighbors, one odd and one even, showing where round toward −∞, round toward +∞, RZ, RA, RNE, RNA, and RNZ send it.)

Note the difference between RNE, RNA, and RNZ in tie cases.
Back to addition
Decimal examples:

      1.324   × 10^5
    + 0.01576 × 10^5      (1.576 × 10^3 aligned to × 10^5)
    = 1.33976 × 10^5  ≈  1.340 × 10^5

      9.853  × 10^7
    + 0.1466 × 10^7       (1.466 × 10^6 aligned to × 10^7)
    = 9.9996 × 10^7   ≈  1.000 × 10^8

      1.324 × 10^3 + 8.679 × 10^3 = ?

      1.324 × 10^3
    − 1.321 × 10^3
    = 0.003 × 10^3    =  3.000 × 10^0

Multiplication
1. No alignment is necessary.
2. Multiply the significands.
3. Add the exponents.
4. The sign bit of the result is the XOR of the two operand signs.

Is it really that simple?
Division
1. No alignment is necessary.
2. Divide the significands.
3. Subtract the exponents.
4. The sign bit of the result is the XOR of the two operand signs.

You know it is not that simple!

What is different in DFP?
Decimal formats
• may have leading zeros, and
• may be in either binary integer decimal (BID) or densely packed decimal (DPD) encoding.
Energy
The use of hardware for decimal operations instead of software leads to
• a much shorter time to finish the arithmetic operation (the factor may be 100 to 1 or more) and
• no energy consumed in overheads such as fetching and decoding of software instructions.

However, the additional circuits consume static power when idle and burn energy.

In general, the use of decimal HW is much more energy efficient than SW.

Verification
"Produce high quality test vectors for the various arithmetic operations."

Our test vectors discovered bugs in several designs:
• in all the decimal64 and decimal128 designs of SilMinds floating point arithmetic operations. In fact, this project started in order to verify those specific designs, which were subsequently corrected.
• in the FMA and Sqrt operations of the decimal floating point library decNumber developed by Dr. Mike Cowlishaw while at IBM. We reported the FMA errors in July and August of 2010 and the Sqrt error in December 2010, all against version 3.68.
• in the FMA for decimal128 in the Intel decimal floating-point math library developed by Dr. Marius Cornea at Intel. We reported the errors in July 2011 against version 2.0, which was subsequently fixed in version 2.0 update 1 of August 2011.
(See http://eece.cu.edu.eg/~hfahmy/arith_debug/.)
World published research
• Universities (representative list)
  Wisconsin, USA: DPD add, mul, div, sqrt.
  SDC, Spain: DPD mul, elementary functions.
  Málaga, Spain: rounding for BID, mul, add.
  Sh-B, Iran: redundant decimal.
  Cairo, Egypt: DPD add, mul, div, sqrt, fma, energy, verification.
  Sask., Canada: DPD log and antilog.
  Utah, USA: complete SW decimal library and adaptation of gcc.
• Companies (complete list)
  IBM, UK: SW implementation of DPD.
  IBM, USA: DPD HW add, mul, div, sqrt, and SW DFPAL.
  Intel, USA: SW implementation of BID.
  SAP, Germany: all internal representations are now DPD.
  SilMinds, Egypt: DPD HW for add, mul, div, sqrt, fma, pow, . . .
  Fujitsu, Japan: DPD HW add, mul, and div.

Patents pending in USA
1. R. Raafat, A. Mohamed, H. A. H. Fahmy, Y. Farouk, M. Elkhouly, T. Eldeeb, and R. Samy, "Decimal floating-point square-root unit using Newton-Raphson iterations," July 2011. US Patent application number 13177488.
2. A. Mohamed, R. Raafat, H. A. H. Fahmy, T. Eldeeb, Y. Farouk, R. Samy, and M. Elkhouly, "Parallel redundant decimal fused-multiply-add circuit," July 2011. US Patent application number 13177491.
3. A. Mohamed, H. A. H. Fahmy, R. Raafat, Y. Farouk, M. Elkhouly, R. Samy, and T. Eldeeb, "Rounding unit for decimal floating-point division," July 2011. US Patent application number 13177484.
4. R. Samy, H. A. H. Fahmy, T. Eldeeb, R. Raafat, Y. Farouk, M. Elkhouly, and A. Mohamed, "Decimal floating-point fused multiply-add unit," July 2011. US Patent application number 13177549.
5. T. ElDeeb, H. A. H. Fahmy, and M. Y. Hassan, "Decimal elementary functions computation," Nov. 2011. US Patent application number 13293061.
6. T. Eldeeb, H. A. H. Fahmy, A. Elhosiny, M. Y. Hassan, Y. Aly, and R. Raafat, "Decimal floating-point processor," Dec. 2011. US Patent application number 61577590.
7. A. A. Ayoub, H. A. H. Fahmy, and T. Eldeeb, "DPD/BCD to BID converters," Oct. 2012. US Patent application number 13644374.
8. A. A. Ayoub and H. A. H. Fahmy, "BID to BCD/DPD converters," Oct. 2012. US Patent application number 13657458.
Decimal future
• Inclusion in language standards (already in C, Fortran, Python, and COBOL).
• Wider adoption in industry.
• Use of decimal in new applications.
• HW implementation of decimal elementary functions.
• Units combining binary and decimal.
• Verification engines for compliance.

Real world problems
Check http://ta.twi.tudelft.nl/users/vuik/wi211/disasters.html for some disasters caused by numerical errors:

1. Patriot Missile Failure
2. Explosion of the Ariane 5
3. The Vancouver Stock Exchange
4. Rounding error changes Parliament makeup
5. . . .
Optimization!
Check this code, adapted from "The pitfalls of verifying floating-point computations" by David Monniaux:

```c
#include <stdio.h>
#include <math.h>

double modulo(double x, double mini, double maxi) {
    double delta = maxi - mini;
    double decl = x - mini;
    double q = decl / delta;
    return x - floor(q) * delta;
}

int main() {
    double r = modulo(nextafter(180.0, 0.0), -180.0, 180.0);
    printf("\n The value is %40.40f \n", r);
}
```

• With gcc -O0 we get: -180.0000000000000284217094304040074348449707
• With gcc -O1 we get: 179.9999999999999715782905695959925651550293
You cannot always optimize!
The same code with a printf of decl inserted no longer changes its answer with the optimization level (the call forces decl out of the floating-point registers into a 64-bit double):

```c
#include <stdio.h>
#include <math.h>

double modulo(double x, double mini, double maxi) {
    double delta = maxi - mini;
    double decl = x - mini;
    printf("\n decl=%f \n", decl);
    double q = decl / delta;
    return x - floor(q) * delta;
}

int main() {
    double r = modulo(nextafter(180.0, 0.0), -180.0, 180.0);
    printf("\n The value is %40.40f \n", r);
}
```

• With gcc -O0 we get: -180.0000000000000284217094304040074348449707
• With gcc -O1 we get: -180.0000000000000284217094304040074348449707

Solution?
The problem disappears in

```c
double modulo(double x, double mini, double maxi) {
    double delta = maxi - mini;
    return x - floor((x - mini) / delta) * delta;
}
```
Quote: “If hundreds of thousands of systems featuring a defective
component are manufactured every year, there will be real failures
happening when the systems are deployed — and they will be very
difficult to recreate and diagnose. If the system is implemented
in single precision, the odds are considerably higher with the same
probabilistic assumptions: a single 100 Hz system would break down
twice a day.”
Summary
• Arithmetic blocks are everywhere in digital circuits.
• It is possible to change the representation in order to ease the
implementation of certain tasks.
• Floating point programs and circuits are “tricky”: hard to design
and hard to verify.