Binary Number System

Binary Number System: Representations

IEEE 32-Bit Floating Point Format (IEEE Standard 754)

BITS Diagram of Standard 754

     33222222222211111111110000000000
     10987654321098765432109876543210
     SEEEEEEEEFFFFFFFFFFFFFFFFFFFFFFF

     value = (-1)^s x 2^(e-127) x 1.f



where
     s = S (1 sign bit)

     e = EEEEEEEE (8 exponent bits), 0 < e < 255

     f = FFFFFFFFFFFFFFFFFFFFFFF (23 fraction bits)

Denormals: e = 0; Specials: (NaN, Inf, -Inf) e = 255 
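
The field layout above can be sketched in code. This is a minimal illustration, assuming the 32 bits are held in an ordinary Python integer; the function name split_fields is hypothetical and not part of the standard.

```python
def split_fields(word):
    """Split a 32-bit word into the (s, e, f) fields of the diagram."""
    s = (word >> 31) & 0x1     # bit 31: sign
    e = (word >> 23) & 0xFF    # bits 30-23: biased exponent
    f = word & 0x7FFFFF        # bits 22-0: fraction
    return s, e, f


# Show the three fields of an example word in binary.
s, e, f = split_fields(0x3FC00000)
print(s, format(e, "08b"), format(f, "023b"))
```

Shifting first and masking afterwards keeps each field right-aligned, so e can be compared directly against 0 and 255.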
       

        SIMPLE EXAMPLE of format

        

        3FC00000

        0|01111111|100000... 
        

        Excess 127: true exponent + 127 = 127, therefore true exponent = 0

        1.100000000 x 2⁰

        [1 x 2⁰ + 1 x 2^-1] x 2⁰

        [1 + .5] x 1 = 1.5
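
The same decoding can be checked mechanically. The sketch below evaluates (-1)^s x 2^(e-127) x 1.f for a normalized word and compares the result with Python's struct module, which interprets the same four bytes as a float. The function name decode_normalized is an illustrative assumption, and the sketch deliberately rejects denormals and specials.

```python
import struct


def decode_normalized(word):
    """Evaluate (-1)^s x 2^(e-127) x 1.f for a normalized 32-bit word."""
    s = (word >> 31) & 0x1
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if e == 0 or e == 255:
        raise ValueError("denormal or special value, not handled here")
    # 1.f means 1 plus the 23-bit fraction scaled by 2^-23.
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)


value = decode_normalized(0x3FC00000)
# Cross-check: let the struct module reinterpret the same bits as a float.
reference = struct.unpack(">f", struct.pack(">I", 0x3FC00000))[0]
print(value, reference)  # 1.5 1.5
```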

        

        

        EXAMPLE 1 
      

What is the IEEE 32-bit floating point representation for the decimal number
-11.5?



SOLUTION

1. Convert to binary.  To the left of the binary point, represent the integer
part of the number (the digits to the left of the decimal point).  To the right
of the binary point, represent the fractional part (note: this may lose
accuracy, since many decimal fractions have no finite binary expansion).  The
first position after the binary point is the 2^-1 position (0.5 decimal), then
2^-2 (0.25), 2^-3 (0.125), etc.
     -11.5₁₀ = -1011.1₂
2. Convert to normalized binary scientific notation (i.e. move the binary point
to the left or right as far as necessary until a single 1 is to the left of the
binary point, e.g. 1.f):
     -1011.1₂ = -1.0111 x 2³


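Note: in the special case of 0.0, all 32 bits are 0 (-0.0 differs only in having
s = 1).  Zero cannot be normalized, since there is no 1 to the left of the
binary point; like the denormals it is stored with e = 0, here with f = 0 as
well.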
3. Determine s, e and f:
       

     s = 1 for negative, 0 for positive.



     true exponent = 3₁₀ = e - 127, thus

     e = 3 + 127 = 130₁₀ = 10000010₂



     1.f = 1.0111, thus f = 0111

4. Assemble the 32 bits, padding f to the right with zeroes:


     s       e              f

     1       10000010       01110000000000000000000

     11000001001110000000000000000000

5. Convert to hex:


     1100 0001 0011 1000 0000 0000 0000 0000



     C1380000
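
Steps 1 to 5 can be traced in code as a check on the hand calculation. This is a minimal sketch rather than a full encoder: the name encode_float32 is hypothetical, it only handles nonzero finite values in the normalized range, and it truncates the fraction where real hardware would normally round to nearest.

```python
import math
import struct


def encode_float32(x):
    """Hand-encode a nonzero finite x as a normalized 32-bit word."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("zero, infinities and NaN are not handled here")
    s = 1 if x < 0 else 0          # step 3: sign bit
    m = abs(x)
    exp = 0
    while m >= 2.0:                # step 2: shift until m is 1.f
        m /= 2.0
        exp += 1
    while m < 1.0:
        m *= 2.0
        exp -= 1
    e = exp + 127                  # step 3: excess-127 exponent
    if not 0 < e < 255:
        raise ValueError("outside the normalized range")
    f = int((m - 1.0) * 2 ** 23)   # step 3: 23 fraction bits (truncated)
    return (s << 31) | (e << 23) | f   # step 4: assemble the 32 bits


word = encode_float32(-11.5)
print(format(word, "032b"))        # step 4 as a bit string
print(format(word, "08X"))         # step 5: C1380000
# Cross-check against the bytes the struct module produces.
print(struct.pack(">f", -11.5).hex().upper())
```

For values such as -11.5 that fit exactly in 23 fraction bits, truncation and rounding agree, so the hand-built word matches the one from struct.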

Note special cases:

s = 0, e=255, f = all zeroes: +Infinity
s = 1, e=255, f = all zeroes: -Infinity
s = 0 or 1, e=255, f = anything but all zeroes: Not A Number

Number Representation   Sign   Exponent         Fraction
Normalized              +/-    0 < Exp < Max    Any bit pattern
Denormalized            +/-    0                Any nonzero bit pattern
Zero                    +/-    0                0
Infinity                +/-    111 … 1          0
Not a number            +/-    111 … 1          Any nonzero bit pattern
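
The table translates directly into a small classifier. The sketch below assumes the 32-bit case, where the all-ones exponent is 255; the function name classify and the returned labels are illustrative choices that mirror the table rows.

```python
def classify(word):
    """Return the table row that a 32-bit word belongs to."""
    e = (word >> 23) & 0xFF    # 8-bit exponent field
    f = word & 0x7FFFFF        # 23-bit fraction field
    if e == 255:               # exponent is all ones
        return "Infinity" if f == 0 else "Not a number"
    if e == 0:
        return "Zero" if f == 0 else "Denormalized"
    return "Normalized"


# One sample word per row of the table; the sign bit never affects the class.
for sample in (0xC1380000, 0x00000001, 0x80000000, 0xFF800000, 0x7FC00000):
    print(format(sample, "08X"), classify(sample))
```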