Numbers and Arithmetic

unsigned integers
signed integers
BCD – binary coded decimal
fixed-point real numbers
floating-point real numbers
complex numbers

Unsigned Integers

non-negative numbers
used to:
  control operation of a digital system, e.g. counting iterations, table indices, …
  represent real-world data, e.g. temperature, position, time, …
=> use unsigned binary representation

Binary Representation

n bits: represent numbers from 0 to $2^n - 1$,
or 0000…00 to 1111…11

to represent x where 0 ≤ x ≤ N – 1, need
$\lceil \log_2 N \rceil$ bits

number of bits required depends on the
system being designed and the application
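
A minimal sketch of the bit-count rule above. The package name width_util and the function name bits_needed are hypothetical and do not come from the slides. The function returns the smallest width whose 2**width values cover 0 to n – 1, with a floor of one bit.

```vhdl
package width_util is
  -- Hypothetical helper: number of bits needed to represent 0 .. n-1.
  function bits_needed (n : positive) return positive;
end package width_util;

package body width_util is
  function bits_needed (n : positive) return positive is
    variable width : positive := 1;   -- never return fewer than one bit
  begin
    -- Grow the width until 2**width covers all n values.
    -- Note: 2**width overflows a 32-bit INTEGER when n is above 2**30.
    while 2**width < n loop
      width := width + 1;
    end loop;
    return width;
  end function bits_needed;
end package body width_util;
```

For example, a table with 20 entries could declare its index as unsigned(bits_needed(20) - 1 downto 0), which is 5 bits wide.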

Unsigned Integers in VHDL

create the binary number using std_logic_vector
or
use the unsigned vector type and associated arithmetic operations
from the IEEE numeric_std package

numeric_std defines 'unsigned' as:
  type UNSIGNED is array (NATURAL range <>) of STD_LOGIC;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity multiplexer_6bit_4to1 is
  port (
    a0, a1, a2, a3 : in  unsigned (5 downto 0);
    sel            : in  std_logic_vector (1 downto 0);
    z              : out unsigned (5 downto 0) );
end entity multiplexer_6bit_4to1;

architecture eqn of multiplexer_6bit_4to1 is
begin
  with sel select z <=
    a0 when "00",
    a1 when "01",
    a2 when "10",
    a3 when others;
end architecture eqn;

Resizing Unsigned Integers

signal x : unsigned(3 downto 0);
signal y : unsigned(7 downto 0);

(1) Assign x to y; x shorter than y
    x0 … xn−1 are copied to y0 … yn−1; the upper bits yn … ym−1 are set to 0

    y <= "0000" & x;
    or
    y <= resize (x, 8);

(2) Assign y to x; x shorter than y
    y0 … yn−1 are copied to x0 … xn−1; the upper bits yn … ym−1 are discarded

    x <= y(3 downto 0);
    or
    x <= resize (y, 4);

function RESIZE (ARG: UNSIGNED; NEW_SIZE: NATURAL) return UNSIGNED is
  constant ARG_LEFT: INTEGER := ARG'LENGTH-1;
  alias XARG: UNSIGNED(ARG_LEFT downto 0) is ARG;
  variable RESULT: UNSIGNED(NEW_SIZE-1 downto 0) := (others => '0');
begin
  -- [ … some error checking removed … ]
  if (RESULT'LENGTH < ARG'LENGTH) then
    -- truncation: keep the low-order bits
    RESULT(RESULT'LEFT downto 0) := XARG(RESULT'LEFT downto 0);
  else
    -- extension: fill the upper bits with zeros
    RESULT(RESULT'LEFT downto XARG'LEFT+1) := (others => '0');
    RESULT(XARG'LEFT downto 0) := XARG;
  end if;
  return RESULT;
end RESIZE;
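
The two branches of RESIZE can be checked in simulation. This is a small self-checking sketch; the entity name resize_unsigned_demo and the test values are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity resize_unsigned_demo is
end entity resize_unsigned_demo;

architecture sim of resize_unsigned_demo is
begin
  check : process is
    variable x : unsigned(3 downto 0);
    variable y : unsigned(7 downto 0);
  begin
    x := "1011";                  -- 11
    y := resize(x, y'length);     -- zero extension
    assert y = "00001011"
      report "zero extension failed" severity failure;

    y := "10100110";              -- 166
    x := resize(y, x'length);     -- truncation keeps the low 4 bits
    assert x = "0110"
      report "truncation failed" severity failure;
    assert to_integer(x) = 166 mod 16
      report "truncated value is not y mod 16" severity failure;
    wait;
  end process check;
end architecture sim;
```

Truncating an unsigned value to n bits gives the original value modulo 2**n, so the result is only numerically equal to the source when the discarded bits are all zero.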

Unsigned Addition

[worked example figure; surviving bit patterns: 0011 1010 and 1110 0000]
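
Adders in VHDL

library ieee; use ieee.std_logic_1164.all; use ieee.numeric_std.all;
...
signal a, b, sum: unsigned(7 downto 0);
...
sum <= a + b;   -- result has the same width as the operands; the carry out is lost

To obtain the carry out, extend both operands by one bit:

signal tmp_result : unsigned(8 downto 0);
signal carry : std_logic;
...
tmp_result <= ('0' & a) + ('0' & b);
carry <= tmp_result(8);
sum <= tmp_result(7 downto 0);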

Unsigned Subtraction

[worked example figures; surviving bit patterns: 1110 1010 and 0011 0000; 0011 1010 and 1110 0000]
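
The slides give no VHDL for subtraction. One possible sketch applies the same one-bit extension used for the adder carry. The entity name sub8 and the port names diff and borrow are assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sub8 is
  port (
    a, b   : in  unsigned(7 downto 0);
    diff   : out unsigned(7 downto 0);
    borrow : out std_logic );  -- '1' when b > a, i.e. the true result is negative
end entity sub8;

architecture rtl of sub8 is
  signal tmp_result : unsigned(8 downto 0);
begin
  -- Extend both operands by one bit so the borrow out of bit 7 lands in bit 8.
  tmp_result <= ('0' & a) - ('0' & b);
  borrow     <= tmp_result(8);
  diff       <= tmp_result(7 downto 0);
end architecture rtl;
```

For instance, a = 58 and b = 224 gives borrow = '1' and diff = 90, which is 58 − 224 + 256.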

Signed Integers

positive and negative numbers
used
  in computer arithmetic
  to represent real-world data, e.g. temperature, position, …
=> use 2s complement binary representation

2s Complement Representation

most significant bit indicates sign
  1 ⇒ negative
  0 ⇒ non-negative

range for n bits (most negative to most positive)
  $-2^{n-1}$ to $+2^{n-1} - 1$
  1000…0 to 0111…1

Signed Integers in VHDL

create the binary number using std_logic_vector
or
use the signed vector type and associated arithmetic
operations from the IEEE numeric_std package

numeric_std defines 'signed' as:
  type SIGNED is array (NATURAL range <>) of STD_LOGIC;

library ieee; use ieee.numeric_std.all;
...
signal s1 : unsigned(11 downto 0);
signal s2 : signed(11 downto 0);
...
s1 <= s2;            -- NO!!! type mismatch: signed cannot be assigned to unsigned
s1 <= unsigned(s2);  -- if s2 is known to be non-negative
s2 <= signed(s1);    -- if s1 is known to be less than 2**11
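
The type conversions above copy the bits unchanged; only the interpretation changes. A small simulation sketch of why the two conditions matter follows. The entity name conversion_demo and the test values are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity conversion_demo is
end entity conversion_demo;

architecture sim of conversion_demo is
begin
  check : process is
    variable u : unsigned(11 downto 0);
    variable s : signed(11 downto 0);
  begin
    s := to_signed(-1, s'length);      -- bit pattern 1111 1111 1111
    u := unsigned(s);                  -- same bits, now read as 4095
    assert to_integer(u) = 4095
      report "unexpected reinterpretation of -1" severity failure;

    u := to_unsigned(2048, u'length);  -- 2**11, bit pattern 1000 0000 0000
    s := signed(u);                    -- same bits, now read as -2048
    assert to_integer(s) = -2048
      report "unexpected reinterpretation of 2**11" severity failure;

    u := to_unsigned(100, u'length);   -- in range for both types
    assert to_integer(signed(u)) = 100
      report "in-range value should be preserved" severity failure;
    wait;
  end process check;
end architecture sim;
```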

Resizing Signed Integers

signal x : signed (7 downto 0);
signal y : signed (15 downto 0);

(1) Assign x to y; x shorter than y: sign extension
    x0 … xn−1 are copied to y0 … yn−1; the upper bits yn … ym−1 are copies of the sign bit xn−1

    y <= resize(x, 16);
    or
    y <= resize(x, y'length);

(2) Assign y to x; x shorter than y: truncation
    the value is preserved only if y fits in x; otherwise the result is unpredictable
    (numeric_std resize keeps the sign bit ym−1 and the low-order n−1 bits)

    x <= resize(y, 8);
    or
    x <= resize(y, x'length);
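
A simulation sketch of both cases, with made-up test values and a hypothetical entity name resize_signed_demo. It also contrasts resize with a plain slice, which keeps the low 8 bits without preserving the sign bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity resize_signed_demo is
end entity resize_signed_demo;

architecture sim of resize_signed_demo is
begin
  check : process is
    variable x : signed(7 downto 0);
    variable y : signed(15 downto 0);
  begin
    x := to_signed(-3, x'length);
    y := resize(x, y'length);        -- sign extension keeps the value
    assert to_integer(y) = -3
      report "sign extension failed" severity failure;

    y := to_signed(-3, y'length);
    x := resize(y, x'length);        -- -3 fits in 8 bits, value kept
    assert to_integer(x) = -3
      report "in-range truncation failed" severity failure;

    y := to_signed(200, y'length);   -- 200 does not fit in 8 signed bits
    x := resize(y, x'length);        -- sign bit 0, low 7 bits 1001000
    assert to_integer(x) = 72
      report "unexpected resize result" severity failure;
    x := y(7 downto 0);              -- plain slice: 1100 1000
    assert to_integer(x) = -56
      report "unexpected slice result" severity failure;
    wait;
  end process check;
end architecture sim;
```

Neither 72 nor −56 equals 200, which is the sense in which truncating an out-of-range signed value cannot be relied on.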

Signed Addition Examples

[worked example figures; surviving bit patterns: 0011 1010 and 1110 0000; 0110 1010 and 0100 0000]
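
Signed Addition in VHDL

signal v1, v2 : signed(11 downto 0);
signal sum : signed(12 downto 0);
...
sum <= resize(v1, sum'length) + resize(v2, sum'length);

sum <= '0' & v1 + '0' & v2;
-- will this work? No: prepending '0' zero-extends, which turns a negative
-- operand into a large positive one; signed operands need sign extension

Overflow detection when the sum has the same width as the operands:

signal x, y, z : signed(7 downto 0);
signal ovf : std_logic;
...
z <= x + y;
ovf <= (not x(7) and not y(7) and z(7))
       or (x(7) and y(7) and not z(7));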
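
The overflow expression can be exercised in simulation. In this sketch the function add_overflows is a hypothetical wrapper around the expression above, and the operand values are chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity signed_add_demo is
end entity signed_add_demo;

architecture sim of signed_add_demo is
  -- Hypothetical helper wrapping the overflow expression.
  function add_overflows (x, y : signed(7 downto 0)) return std_logic is
    variable z : signed(7 downto 0);
  begin
    z := x + y;
    return (not x(7) and not y(7) and z(7))
        or (x(7) and y(7) and not z(7));
  end function add_overflows;
begin
  check : process is
    variable wide : signed(8 downto 0);
  begin
    -- 58 + (-32) = 26 fits in 8 bits.
    assert add_overflows(to_signed(58, 8), to_signed(-32, 8)) = '0'
      report "false overflow" severity failure;
    -- 106 + 64 = 170 exceeds +127.
    assert add_overflows(to_signed(106, 8), to_signed(64, 8)) = '1'
      report "missed positive overflow" severity failure;
    -- (-100) + (-100) = -200 is below -128.
    assert add_overflows(to_signed(-100, 8), to_signed(-100, 8)) = '1'
      report "missed negative overflow" severity failure;
    -- With one extra bit the true sum is always representable.
    wide := resize(to_signed(106, 8), 9) + resize(to_signed(64, 8), 9);
    assert to_integer(wide) = 170
      report "widened sum is wrong" severity failure;
    wait;
  end process check;
end architecture sim;
```

Operands of opposite sign can never overflow, which is why the expression only tests the two same-sign cases.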

Fixed-Point Numbers

many applications use real numbers
  especially signal-processing applications (DSP)

fixed-point numbers
  allow for fractional parts
  represented as integers with an implied binary point
  can be unsigned or signed

Unsigned Fixed-Point

represent as a bit vector: 10101
  binary point is implicit

n-bit unsigned fixed-point
  m bits before and f bits after binary point (n = m + f)
  Range: 0 to $2^m - 2^{-f}$
  Precision: $2^{-f}$
  m may be ≤ 0, giving fractions only
  e.g. if m = –2 then 0.0001001101

Signed Fixed-Point

n-bit signed 2s-complement fixed-point
  m bits before and f bits after binary point
  Range: $-2^{m-1}$ to $2^{m-1} - 2^{-f}$
  Precision: $2^{-f}$
  e.g. 111101, signed fixed-point, m = 2
    $11.1101_2 = -0.1875_{10}$

Choosing Range and Precision

choice depends on application
need to understand the numerical behavior of computations performed
  some operations can magnify quantization errors
in DSP
  fixed-point range affects dynamic range
  precision affects signal-to-noise ratio
perform simulations to evaluate effects

Fixed-Point in VHDL

use numeric_bit or std_logic_vector with implied scaling

fixed_pkg package
  standardized in IEEE 1076-2008
  types are ufixed and sfixed

in Altera, use floating-point megafunctions
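
A short simulation sketch of the sfixed type, assuming a simulator with VHDL-2008 support that provides ieee.fixed_pkg. The entity name fixed_demo is made up. The index range (1 downto -4) encodes m = 2 bits before and f = 4 bits after the binary point, matching the signed fixed-point example 11.1101.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.fixed_pkg.all;

entity fixed_demo is
end entity fixed_demo;

architecture sim of fixed_demo is
begin
  check : process is
    variable a, b : sfixed(1 downto -4);  -- m = 2, f = 4
    -- fixed_pkg addition returns one more integer bit than its operands.
    variable sum  : sfixed(2 downto -4);
  begin
    a := "111101";                        -- 11.1101 in binary
    assert to_real(a) = -0.1875
      report "unexpected value for 11.1101" severity failure;

    b := to_sfixed(0.75, 1, -4);          -- 00.1100 in binary
    sum := a + b;
    assert to_real(sum) = 0.5625
      report "unexpected fixed-point sum" severity failure;
    wait;
  end process check;
end architecture sim;
```

The values used here are exact multiples of 2**(-4), so the comparisons against REAL literals are exact.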

Floating-Point Numbers

allow for larger range, with same relative precision throughout the range
use of IEEE-754 standard common
IEEE single precision, 32 bits
  field widths: s = 1, e = 8, m = 23
  range ≈ $\pm 1.2 \times 10^{-38}$ to $\pm 3.4 \times 10^{38}$
  precision ≈ 7 decimal digits

IEEE Floating-Point Format

fields: s (1 bit) | exponent (e bits) | mantissa (m bits)

$x = M \times 2^{E} = (-1)^{s} \times 1.\text{mantissa} \times 2^{\text{exponent} - 2^{e-1} + 1}$

s: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
Normalize: 1.0 ≤ |M| < 2.0
  M always has a leading pre-binary-point 1 bit, so
  no need to represent it explicitly (hidden bit)
Exponent: excess representation: stored exponent = $E + 2^{e-1} - 1$

Floating Point Examples

0/1 00000000 00000000000000000000000 = +/- 0
0/1 11111111 00000000000000000000000 = +/- Infinity
0/1 11111111 00000100000000000000000 = NaN
0 10000000 00000000000000000000000 = +1 * 1.0 * 2**(128-127) = 2
0 10000001 10100000000000000000000 = +1 * 1.101 (binary) * 2**(129-127) = 6.5
1 10000001 10100000000000000000000 = -1 * 1.101 (binary) * 2**(129-127) = -6.5

Floating-Point Range

Exponents 000...0 and 111...1 reserved
Smallest value
  exponent: 000...01 ⇒ $E = -2^{e-1} + 2$
  mantissa: 0000...00 ⇒ M = 1.0
Largest value
  exponent: 111...10 ⇒ $E = 2^{e-1} - 1$
  mantissa: 111...11 ⇒ M ≈ 2.0
Range: $2^{-2^{e-1} + 2} \le |x| < 2^{2^{e-1}}$

Floating Point

very complicated
  e.g. addition
    unpack, align binary points, adjust exponents
    add mantissas, check for exceptions
    round and normalize result, adjust exponent

not practical as a single combinational circuit;
usually pipelined over several clock cycles

Floating-Point in VHDL

float_pkg package
  standardized by IEEE 1076-2008
  types are float, float32, float64, float128

for Quartus, use floating-point megafunctions
Quartus uses math_real and associated arithmetic operations from the
IEEE math_real package (IEEE Std 1076.2-1996)
  common REAL constants
  common REAL elementary mathematical functions
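
A simulation sketch that decodes single-precision bit patterns with float_pkg, assuming a simulator with VHDL-2008 support that provides ieee.float_pkg. The entity name float_decode_demo is made up. The hex literals are the patterns for 2, 6.5 and −6.5, with the sign, exponent and mantissa fields shown in the comments.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.float_pkg.all;

entity float_decode_demo is
end entity float_decode_demo;

architecture sim of float_decode_demo is
begin
  check : process is
    variable f : float32;  -- 1 sign bit, 8 exponent bits, 23 mantissa bits
  begin
    f := x"40000000";      -- 0 10000000 000...0
    assert to_real(f) = 2.0
      report "expected 2.0" severity failure;

    f := x"40D00000";      -- 0 10000001 1010...0
    assert to_real(f) = 6.5
      report "expected 6.5" severity failure;

    f := x"C0D00000";      -- 1 10000001 1010...0
    assert to_real(f) = -6.5
      report "expected -6.5" severity failure;

    -- Converting a REAL back gives the same bit pattern.
    f := to_float(-6.5, 8, 23);
    assert to_slv(f) = x"C0D00000"
      report "unexpected encoding of -6.5" severity failure;
    wait;
  end process check;
end architecture sim;
```

The REAL type and these conversions are intended for simulation and testbenches; synthesis support depends on the tool.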

Complex Numbers

use math_complex and associated arithmetic operations from the
IEEE math_complex package (in library ieee; it builds on math_real)

type COMPLEX is
  record
    RE: REAL;  -- Real part
    IM: REAL;  -- Imaginary part
  end record;

subtype POSITIVE_REAL is REAL range 0.0 to REAL'HIGH;
subtype PRINCIPAL_VALUE is REAL range -MATH_PI to MATH_PI;

type COMPLEX_POLAR is
  record
    MAG: POSITIVE_REAL;    -- Magnitude
    ARG: PRINCIPAL_VALUE;  -- Angle in radians; -MATH_PI is illegal
  end record;
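
A brief simulation sketch using these types, with a made-up entity name complex_demo and illustrative values. It relies on the "+" operator and the COMPLEX_TO_POLAR function from math_complex, and on the two-argument ARCTAN from math_real.

```vhdl
library ieee;
use ieee.math_real.all;
use ieee.math_complex.all;

entity complex_demo is
end entity complex_demo;

architecture sim of complex_demo is
begin
  check : process is
    variable z1  : complex := (re => 3.0, im => 4.0);
    variable z2  : complex := (re => 1.0, im => -2.0);
    variable sum : complex;
    variable p   : complex_polar;
  begin
    sum := z1 + z2;                   -- component-wise addition
    assert sum.re = 4.0 and sum.im = 2.0
      report "unexpected complex sum" severity failure;

    p := complex_to_polar(z1);        -- 3 + 4j has magnitude 5
    assert abs(p.mag - 5.0) < 1.0e-9
      report "unexpected magnitude" severity failure;
    assert abs(p.arg - arctan(4.0, 3.0)) < 1.0e-9
      report "unexpected angle" severity failure;
    wait;
  end process check;
end architecture sim;
```

Because both record types are built on REAL, they are suited to simulation models and testbenches rather than synthesis.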