Sunday, November 24, 2013

About Floating Point








What is the function of floating point?

To represent non-integral numbers, including very small and very large numbers.

5.98 × 10^7 = significant digits × base^exponent

Example

Actual                Floating Point
0.0000000478          4.78 × 10^-8
0.00000001            0.1 × 10^-7
-1000000000           -1.0 × 10^9
-0.00111 (binary)     -1.11 × 2^-3 (binary)

IEEE 754-1985 was an industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008. During its 23 years, it was the most widely used format for floating-point computation. It was implemented in software, in the form of floating-point libraries, and in hardware, in the instructions of many CPUs and FPUs. The first integrated circuit to implement the draft of what was to become IEEE 754-1985 was the Intel 8087.



IEEE Floating-Point Format

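Single Precision (32 bits): Sign [1 bit], Exponent [8 bits], Fraction [23 bits], bias = 127


Double Precision (64 bits): Sign [1 bit], Exponent [11 bits], Fraction [52 bits], bias = 1023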





x = (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)

sign (S) = 0 if the number is positive; 1 indicates negative.
biased exponent = actual exponent + bias
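
The formula above can be tried out directly. The following is only a minimal sketch in Python: the function name decode_fields and its parameters are made up for illustration, and it assumes a normalized number (it does not handle zero, denormals, infinity or NaN).

```python
def decode_fields(sign, biased_exponent, fraction_bits, bias=127):
    """Evaluate x = (-1)^S * (1 + Fraction) * 2^(Exponent - Bias).

    Hypothetical helper; assumes a normalized number.
    fraction_bits is a string of '0'/'1' characters.
    """
    # Read the fraction bits as the binary fraction 0.b1b2b3...
    fraction = sum(int(bit) * 2.0 ** -(i + 1) for i, bit in enumerate(fraction_bits))
    return (-1) ** sign * (1 + fraction) * 2.0 ** (biased_exponent - bias)


print(decode_fields(0, 127, "1" + "0" * 22))   # 1.5
print(decode_fields(1, 129, "01" + "0" * 21))  # -5.0
```

With the default bias of 127 the fields are read as single precision; passing bias=1023 and a 52-bit fraction string would evaluate the same formula for double precision.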


Floating Point Example

From decimal to floating number

  • Convert -1313.3125 to IEEE 32-bit floating point format.
a.      The integral part is 1313_10 = 10100100001_2.
b.     The fractional part:

0.3125 × 2 = 0.625    0    Generate 0 and continue.
0.625  × 2 = 1.25     1    Generate 1 and continue with the rest.
0.25   × 2 = 0.5      0    Generate 0 and continue.
0.5    × 2 = 1.0      1    Generate 1 and nothing remains.

c.      So 1313.3125_10 = 10100100001.0101_2.
d.     Normalize: 10100100001.0101_2 = 1.01001000010101_2 × 2^10.
e.      Fraction is 01001000010101000000000,
exponent is 10 + 127 = 137 = 10001001_2,
sign bit is 1 because it is a negative number.
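
Steps a to e can be sketched in Python as follows. This is an illustrative sketch, not a general converter: the helper name fraction_to_binary is made up, and the code assumes the magnitude is at least 1 and that the result fits in 23 fraction bits without rounding, as it does for 1313.3125.

```python
def fraction_to_binary(fraction, max_bits=23):
    """Convert a fraction in [0, 1) to binary digits by repeated doubling."""
    bits = []
    while fraction and len(bits) < max_bits:
        fraction *= 2
        bit = int(fraction)   # the integer part is the next binary digit
        bits.append(str(bit))
        fraction -= bit       # continue with what remains
    return "".join(bits)


integral = bin(1313)[2:]                  # step a: integral part in binary
fractional = fraction_to_binary(0.3125)   # step b: fractional part in binary
exponent = len(integral) - 1              # step d: places the point moves left
# step e: drop the leading 1, then pad the fraction to 23 bits
mantissa = (integral[1:] + fractional).ljust(23, "0")
biased = format(exponent + 127, "08b")    # biased exponent as 8 bits

print(integral + "." + fractional)
print("1", biased, mantissa)              # sign bit is 1 for a negative number
```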

The resulting encodings are as follows:
Binary 32 bits
Sign [1 bit]    Exponent [8 bits]     Fraction [23 bits]
1 (-)           10001001              01001000010101000000000

Binary 64 bits
Sign [1 bit]    Exponent [11 bits]    Fraction [52 bits]
1 (-)           10000001001           01001000010101 0000000000000000 0000000000000000 000000
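
The two encodings above can be cross-checked with Python's standard struct module, which packs a float into its IEEE 754 bytes. The helper name float_bits is made up for this sketch; ">f" and ">d" are the struct format codes for big-endian single and double precision.

```python
import struct


def float_bits(value, fmt=">f"):
    """Return the IEEE 754 bit pattern of value as a string of 0s and 1s."""
    raw = struct.pack(fmt, value)
    return "".join(format(byte, "08b") for byte in raw)


bits32 = float_bits(-1313.3125, ">f")
bits64 = float_bits(-1313.3125, ">d")

# Split into sign / exponent / fraction fields
print(bits32[0], bits32[1:9], bits32[9:])
print(bits64[0], bits64[1:12], bits64[12:])
```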

From floating point to decimal

a.      Separate:
01000100001101100001000000000000_2

Sign [1 bit]    Exponent [8 bits]    Fraction [23 bits]
0 (+)           10001000             01101100001000000000000

b.     Exponent: 10001000_2 = 136_10; 136 − 127 = 9.
c.      Denormalize: 1.01101100001_2 × 2^9 = 1011011000.01_2.
d.     Convert:

Exponents       2^9   2^8   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0     2^-1   2^-2
Place Values    512   256   128   64    32    16    8     4     2     1       0.5    0.25
Bits            1     0     1     1     0     1     1     0     0     0    .  0      1
Value           512 + 128 + 64 + 16 + 8 + 0.25 = 728.25

e.      Sign: positive, because the sign bit is 0.
Result: 01000100001101100001000000000000_2 is 728.25.
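
The same decoding can be sketched in Python. The variable names are chosen for this example only, and the code assumes a normalized single-precision pattern. The last lines use the standard struct module as an independent check.

```python
import struct

bits = "01000100001101100001000000000000"

sign = int(bits[0])                            # step a: 1 sign bit
exponent = int(bits[1:9], 2) - 127             # step b: remove the bias
significand = 1 + int(bits[9:], 2) / 2 ** 23   # implicit leading 1 plus fraction
value = (-1) ** sign * significand * 2 ** exponent

print(exponent)  # 9
print(value)     # 728.25

# Cross-check: let struct interpret the same 32 bits as a float
check = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]
print(check == value)  # True
```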

Floating Point Addition

Floating-point addition is analogous to addition using scientific notation. For example, to add 2.25 × 10^0 to 1.340625 × 10^2:

1.     Shift the decimal point of the smaller number to the left until the exponents are equal. Thus, the first number becomes 0.0225 × 10^2.
2.     Add the numbers with decimal points aligned:

         1.340625 × 10^2
       + 0.022500 × 10^2
       = 1.363125 × 10^2

3.     Normalize the result. Here it is already normalized: 1.363125 × 10^2.
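
The three steps can be sketched with Python's decimal module, which keeps the decimal digits exact. The function name add_scientific and its (mantissa, exponent) arguments are assumptions made for this illustration; real floating-point hardware works in base 2 and also has to round.

```python
from decimal import Decimal


def add_scientific(m1, e1, m2, e2):
    """Add m1 x 10^e1 and m2 x 10^e2; return (mantissa, exponent)."""
    # Make (m1, e1) the operand with the larger exponent
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    # Step 1: shift the smaller number's decimal point to the left
    m2 = m2.scaleb(e2 - e1)
    # Step 2: add with the decimal points aligned
    total = m1 + m2
    exponent = e1
    # Step 3: normalize so that 1 <= |mantissa| < 10
    while abs(total) >= 10:
        total = total.scaleb(-1)
        exponent += 1
    while total != 0 and abs(total) < 1:
        total = total.scaleb(1)
        exponent -= 1
    return total, exponent


mantissa, exponent = add_scientific(Decimal("2.25"), 0, Decimal("1.340625"), 2)
print(mantissa, exponent)  # 1.363125 2
```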

Floating Point Multiplication

Multiply the following two numbers in scientific notation by hand:

(1.110 × 10^10) × (9.200 × 10^-5)

1.   Add the exponents to find

New Exponent = 10 + (-5) = 5

If we add biased exponents, bias will be added twice. Therefore we need to subtract it once to compensate:
(10 + 127) + (-5 + 127) = 259
259 - 127 = 132 which is (5 + 127) = biased new exponent

2.   Multiply

1.110 × 9.200 = 10.212000

We can only keep three digits to the right of the decimal point, so the result is

10.212 × 10^5

3.   Normalise the result

1.0212 × 10^6

4.   Round it

1.021 × 10^6
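
The four steps can be sketched in Python with the decimal module. The names multiply_scientific and biased_exponent_sum are made up for this example, round-half-even is an assumed rounding rule, and both inputs are assumed to be normalized (1 <= |mantissa| < 10). The sketch does not re-normalize if rounding carries the mantissa up to 10.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def biased_exponent_sum(e1, e2, bias=127):
    """Add two biased exponents, subtracting the bias once to compensate."""
    return (e1 + bias) + (e2 + bias) - bias


def multiply_scientific(m1, e1, m2, e2, digits=3):
    """Multiply m1 x 10^e1 by m2 x 10^e2; return (mantissa, exponent)."""
    exponent = e1 + e2          # step 1: add the exponents
    product = m1 * m2           # step 2: multiply the significands
    # Step 3: normalize so that |mantissa| < 10
    while abs(product) >= 10:
        product = product.scaleb(-1)
        exponent += 1
    # Step 4: round to the given number of digits after the point
    quantum = Decimal(1).scaleb(-digits)
    product = product.quantize(quantum, rounding=ROUND_HALF_EVEN)
    return product, exponent


mantissa, exponent = multiply_scientific(Decimal("1.110"), 10, Decimal("9.200"), -5)
print(mantissa, exponent)            # 1.021 6
print(biased_exponent_sum(10, -5))   # 132
```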

 Published by 
SITI NURHASTINI BINTI ROSALI  ( B031310320 )


 

Copyright © 2013 | by BITS STUDENT SIG2