COA: About Floating Point

What is the function of floating point?

To represent non-integral numbers, including very small and very large numbers.

A floating-point number is written as significant digits × base^exponent, for example 5.98 × 10⁷.

Example

Actual	Floating Point
0.0000000478	4.78 × 10⁻⁸
0.00000001	1.0 × 10⁻⁸
-1000000000	-1.0 × 10⁹
-0.00111₂	-1.11₂ × 2⁻³
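The same decomposition can be tried in Python. This is only a small sketch: the helper names `decimal_scientific` and `binary_scientific` are made up for illustration, and the binary value -0.00111₂ is entered as its decimal equivalent, -0.21875.

```python
import math

def decimal_scientific(x: float, digits: int = 2) -> str:
    """Format x in base-10 scientific notation using the built-in 'e' format."""
    return format(x, f".{digits}e")

def binary_scientific(x: float) -> tuple:
    """Return (significand, exponent) with 1 <= |significand| < 2, so x == significand * 2**exponent."""
    m, e = math.frexp(x)   # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1    # rescale so the leading binary digit is 1

print(decimal_scientific(0.0000000478))  # 4.78e-08
print(decimal_scientific(-1e9, 1))       # -1.0e+09
print(binary_scientific(-0.21875))       # (-1.75, -3), and 1.75 is 1.11 in binary
```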

IEEE 754-1985 was an industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008. During its 23 years, it was the most widely used format for floating-point computation. It was implemented in software, in the form of floating-point libraries, and in hardware, in the instructions of many CPUs and FPUs. The first integrated circuit to implement the draft of what was to become IEEE 754-1985 was the Intel 8087.

IEEE Floating-Point Format

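Single Precision (32 bits): 1 sign bit, 8 exponent bits, 23 fraction bits; bias = 127.

Double Precision (64 bits): 1 sign bit, 11 exponent bits, 52 fraction bits; bias = 1023.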

x = (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)

S is the sign bit: 0 for a positive number, 1 for a negative number.

biased exponent = actual exponent + bias
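The formula can be evaluated directly from the three fields. The function below is a minimal sketch; `decode_fields` is a hypothetical name, and it assumes a normalized number (it does not handle zero, subnormals, infinity or NaN).

```python
def decode_fields(sign: int, biased_exponent: int, fraction_bits: str, bias: int = 127) -> float:
    """Evaluate (-1)^S * (1 + Fraction) * 2^(Exponent - Bias) for a normalized number."""
    # Read the fraction bit string as the binary fraction 0.b1 b2 b3 ...
    fraction = int(fraction_bits, 2) / (1 << len(fraction_bits))
    return (-1) ** sign * (1 + fraction) * 2.0 ** (biased_exponent - bias)

# Single precision: fraction 0.1 in binary is 0.5, exponent 127 - 127 = 0
print(decode_fields(0, 127, "1" + "0" * 22))               # 1.5
# Single precision: exponent 137 - 127 = 10
print(decode_fields(1, 137, "01001000010101000000000"))    # -1313.3125
```

Passing `bias=1023` together with an 11-bit exponent value and a 52-bit fraction string evaluates the double-precision form in the same way.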

Floating Point Example

From decimal to floating point

Convert -1313.3125 to IEEE 32-bit floating point format.

a. The integral part is 1313₁₀ = 10100100001₂.

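b. The fractional part, 0.3125, is converted by repeatedly multiplying by 2 and taking the integer part of each result as the next bit: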

0.3125	× 2 =	0.625	0	Generate 0 and continue.
0.625	× 2 =	1.25	1	Generate 1 and continue with the rest.
0.25	× 2 =	0.5	0	Generate 0 and continue.
0.5	× 2 =	1.0	1	Generate 1 and nothing remains.

c. So 1313.3125₁₀ = 10100100001.0101₂.

d. Normalize: 10100100001.0101₂ = 1.01001000010101₂ × 2¹⁰.

e. Fraction is 01001000010101000000000,

exponent is 10 + 127 = 137 = 10001001₂,

sign bit is 1 because the number is negative.

The resulting encodings are as follows:

Binary 32 bits

Sign [1bit]	Exponent [8bits]	Fraction [23bits]
1 (-)	10001001	01001000010101 000000000

Binary 64 bits

Sign [1bit]	Exponent [11bits]	Fraction [52bits]
1 (-)	10000001001	01001000010101 0000000000000000 0000000000000000 000000
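Steps a to e can be sketched in Python and cross-checked against the encoding produced by the standard `struct` module. The names `fraction_to_binary` and `encode_binary32` are hypothetical, and the sketch assumes the magnitude is at least 1 and fits exactly in 23 fraction bits (no rounding, zero, subnormal, infinity or NaN handling).

```python
import struct
from fractions import Fraction

def fraction_to_binary(frac: Fraction, max_bits: int = 64) -> str:
    """Step b: repeatedly double the fraction; each integer part produced is the next bit."""
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)          # 1 if doubling reached 1 or more, else 0
        bits.append(str(bit))
        frac -= bit              # continue with what remains
    return "".join(bits)

def encode_binary32(text: str) -> str:
    """Follow steps a-e for a decimal string; returns 'sign exponent fraction'."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"
    value = abs(value)
    int_bits = bin(int(value))[2:]                        # step a: integral part
    frac_bits = fraction_to_binary(value - int(value))    # step b: fractional part
    exponent = len(int_bits) - 1                          # step d: places the point moves left
    # Step e: drop the leading 1, then pad the fraction to 23 bits
    fraction = (int_bits[1:] + frac_bits).ljust(23, "0")[:23]
    return f"{sign} {exponent + 127:08b} {fraction}"

manual = encode_binary32("-1313.3125")
print(manual)  # 1 10001001 01001000010101000000000

# Cross-check against the bits Python itself stores for a 32-bit float
packed = format(struct.unpack(">I", struct.pack(">f", -1313.3125))[0], "032b")
print(manual.replace(" ", "") == packed)  # True
```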

From floating point to decimal

a. Separate the 32-bit word into its fields:

01000100001101100001000000000000₂

Sign [1bit]	Exponent [8bits]	Fraction [23bits]
0(+)	10001000	011011000010 00000000000

b. Exponent: 10001000₂ = 136₁₀; 136 − 127 = 9.

c. Denormalize (restore the implicit leading 1 and move the binary point 9 places to the right): 1.01101100001₂ × 2⁹ = 1011011000.01₂.

d. Convert:

Exponent	2⁹	2⁸	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰	2⁻¹	2⁻²
Place value	512	256	128	64	32	16	8	4	2	1	0.5	0.25
Bit	1	0	1	1	0	1	1	0	0	0	0	1
Value	512	0	128	64	0	16	8	0	0	0	0	0.25

Sum: 512 + 128 + 64 + 16 + 8 + 0.25 = 728.25

e. Sign: positive, because the sign bit is 0.

Result: 01000100001101100001000000000000₂ is 728.25.
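The place-value sum in step d, and the final result, can be checked with a short Python sketch. The helper names `binary_to_decimal` and `word_to_float` are made up for illustration; `word_to_float` relies on the standard `struct` module to reinterpret the 32 bits as a single-precision float.

```python
import struct

def binary_to_decimal(binary: str) -> float:
    """Sum bit * place value for a binary string such as '1011011000.01'."""
    int_part, _, frac_part = binary.partition(".")
    value = 0.0
    for i, bit in enumerate(reversed(int_part)):   # 2^0, 2^1, 2^2, ...
        value += int(bit) * 2 ** i
    for i, bit in enumerate(frac_part, start=1):   # 2^-1, 2^-2, ...
        value += int(bit) * 2 ** -i
    return value

def word_to_float(bits: str) -> float:
    """Reinterpret a 32-character bit string as an IEEE 754 single-precision value."""
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

print(binary_to_decimal("1011011000.01"))                 # 728.25
print(word_to_float("01000100001101100001000000000000"))  # 728.25
```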

Floating Point Addition

Floating-point addition is analogous to addition in scientific notation. For example, to add 2.25 × 10⁰ and 1.340625 × 10²:

1. Shift the decimal point of the smaller number to the left until the exponents are equal. Thus, the first number becomes 0.0225 × 10².

2. Add the numbers with decimal points aligned: 1.340625 × 10² + 0.0225 × 10² = 1.363125 × 10².

3. Normalize the result. Here the sum is already normalized: 1.363125 × 10².
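The three steps can be sketched with Python's `decimal` module, which keeps the decimal digits exact. The function name `add_scientific` is hypothetical, and the sketch works in base 10 to mirror the example rather than in binary as real hardware does.

```python
from decimal import Decimal

def add_scientific(a_sig, a_exp, b_sig, b_exp):
    """Add a_sig * 10^a_exp and b_sig * 10^b_exp; return (significand, exponent)."""
    a_sig, b_sig = Decimal(a_sig), Decimal(b_sig)
    # Step 1: align by shifting the number with the smaller exponent
    if a_exp < b_exp:
        a_sig, a_exp = a_sig.scaleb(a_exp - b_exp), b_exp
    else:
        b_sig, b_exp = b_sig.scaleb(b_exp - a_exp), a_exp
    # Step 2: add the aligned significands
    total, exp = a_sig + b_sig, a_exp
    # Step 3: normalize so that 1 <= |significand| < 10
    while abs(total) >= 10:
        total, exp = total.scaleb(-1), exp + 1
    while total != 0 and abs(total) < 1:
        total, exp = total.scaleb(1), exp - 1
    return total, exp

print(add_scientific("2.25", 0, "1.340625", 2))  # (Decimal('1.363125'), 2)
```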

Floating Point Multiplication

Multiply the following two numbers in scientific notation by hand:

(1.110 × 10¹⁰) × (9.200 × 10⁻⁵)

1. Add the exponents to find the new exponent:

New exponent = 10 + (-5) = 5

If we add the biased exponents instead, the bias is added twice, so we must subtract it once to compensate:

(10 + 127) + (-5 + 127) = 259

259 - 127 = 132, which equals 5 + 127, the biased new exponent.

2. Multiply the significands:

1.110 × 9.200 = 10.212000

We can keep only three digits to the right of the decimal point, so the result is

10.212 × 10⁵

3. Normalize the result:

1.0212 × 10⁶

4. Round it to three digits after the decimal point:

1.021 × 10⁶
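The multiplication steps can also be sketched with the `decimal` module. The names `add_biased_exponents` and `multiply_scientific` are hypothetical. The sketch assumes normalized inputs, rounds once after normalizing, and uses round-half-even as an assumed rounding rule; for this example it gives the same result as the hand calculation.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def add_biased_exponents(e1_biased: int, e2_biased: int, bias: int = 127) -> int:
    """Adding two biased exponents counts the bias twice, so subtract it once."""
    return e1_biased + e2_biased - bias

def multiply_scientific(a_sig, a_exp, b_sig, b_exp, digits: int = 3):
    """Multiply a_sig * 10^a_exp by b_sig * 10^b_exp; return (significand, exponent)."""
    exp = a_exp + b_exp                          # step 1: add the exponents
    sig = Decimal(a_sig) * Decimal(b_sig)        # step 2: multiply the significands
    while abs(sig) >= 10:                        # step 3: normalize
        sig, exp = sig.scaleb(-1), exp + 1
    step = Decimal(1).scaleb(-digits)            # 0.001 when digits == 3
    sig = sig.quantize(step, rounding=ROUND_HALF_EVEN)   # step 4: round
    if abs(sig) >= 10:                           # rounding can carry up to 10.000
        sig, exp = sig.scaleb(-1), exp + 1
    return sig, exp

print(add_biased_exponents(10 + 127, -5 + 127))         # 132
print(multiply_scientific("1.110", 10, "9.200", -5))    # (Decimal('1.021'), 6)
```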

Published by

SITI NURHASTINI BINTI ROSALI ( B031310320 )

COA

Sunday, November 24, 2013
