Every number that a modern digital computer works with in floating-point arithmetic is rational, with a finite binary representation. Even though we often think of computers as working with real numbers (Fortran even has a "real" data type), they do not. Rather, they can only approximate irrational numbers and any rational number whose binary expansion does not terminate, such as 1/3 or 0.1.
Often the fact that computers use approximations to most real numbers is not a problem. Sometimes, however, it is, and overlooking this fact can have catastrophic consequences.
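A minimal sketch of the approximation described above, assuming Octave's default IEEE 754 double precision. The variable names a, same, and gap are chosen here for illustration only.

```octave
% 0.1, 0.2 and 0.3 have no finite binary expansion, so each one is
% stored as the nearest representable double.
a = 0.1 + 0.2;

% The comparison is false (0): the rounded sum is not the rounded 0.3.
same = (a == 0.3)

% The discrepancy is tiny but not zero.
gap = abs(a - 0.3)

% eps is the spacing between 1 and the next larger double.
eps
```

The gap is far too small to matter in most calculations, which is why the approximation usually goes unnoticed.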
In this exercise we examine some of the issues involved with the loss of significance when real numbers are approximated on a digital computer.
Start Octave and set x equal to $0.5^{20}$ with the command x = 0.5^20. Set y equal to sin(x) with the command y = sin(x). Now compare the values of x and y (type x, press Enter, then type y and press Enter again to see both values). You should notice that they differ by a very small amount, or perhaps they appear to be the same number.
Next type format long e and press Enter. This causes stored numbers to be displayed in normalized exponential format, with enough decimal digits to reflect all but the least significant bits of their floating-point representation. Examine x and y again.
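The commands from the last two steps can be collected as follows. This is only a sketch of the session described above. The final format command, which restores the default display, is an addition not mentioned in the exercise.

```octave
% 0.5^20 is exactly 2^-20, which a double stores without error.
x = 0.5^20;

% For such a small x, sin(x) is only very slightly smaller than x.
y = sin(x);

% Switch to normalized exponential display and show both values.
format long e
x
y

% Restore the default display format (optional).
format
```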
On a piece of paper, write down the values of x and y using all the digits shown. Manually compute x − y. You should find that only three non-zero significant figures remain.
Now type d = x - y to compute x − sin(x). This value is a very small number, and should be almost the same as the number you computed manually. Why do you think that they are not exactly the same? (Hint: you are seeing all the significant digits of the numbers, which is not quite the same as seeing all the significant bits.)
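One way to explore the hint is to print more decimal digits than format long e shows and to express d in units of the spacing between neighboring doubles. The sketch below assumes IEEE 754 double precision, in which doubles just below 2^-20 are spaced 2^-73 apart. The name steps is hypothetical.

```octave
x = 0.5^20;
y = sin(x);
d = x - y;

% Print more decimal digits than format long e displays.
printf("x = %.25e\n", x);
printf("y = %.25e\n", y);
printf("d = %.25e\n", d);

% Doubles in [2^-21, 2^-20) are spaced 2^-73 apart, so d should be a
% whole number of such steps.
steps = d / 2^(-73)
```

Because steps comes out as a whole number, d is built from a limited number of binary steps, and the displayed decimal digits of x and y are a rounded view of those bits.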
Now we want to compare the value computed in this "direct fashion" with a more accurate value. How can we get the more accurate value? Recall the Taylor series for sin(x):
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
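For values of x near zero this series converges very quickly. It also allows us to conveniently and accurately compute the value of x − sin(x) for small values of x. Subtracting the series from x cancels the leading term, so we find that
$$x - \sin(x) = \frac{x^3}{3!} - \frac{x^5}{5!} + \frac{x^7}{7!} - \cdots$$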
We can compute an approximation to this series easily. As it turns out, we will only need the first several terms of the series. Why is the error introduced by truncating the series no larger than the magnitude of the first term dropped? (Hint: think about the types of series that you learned about in calculus.)
To compute the partial sums of the series you'll need to compute values like 3! and 5!. Octave has a built-in factorial function that you can use. The first partial sum can be computed with s = x^3/factorial(3) and then the second partial sum can be computed with s = s - x^5/factorial(5).
Compute a running sum of the terms of the series until you notice that the last term added did not change the value of the sum at all (it won't take very many terms!).
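The running sum can also be automated with a loop instead of typing each term by hand. The following is a minimal sketch, not the procedure the exercise prescribes. The names n, sgn, term, and s_new are introduced here for illustration.

```octave
x = 0.5^20;

s = 0;      % running sum of x^3/3! - x^5/5! + x^7/7! - ...
n = 3;      % exponent (and factorial argument) of the current term
sgn = 1;    % sign of the current term, alternating +1, -1, ...

while true
  term = sgn * x^n / factorial(n);
  s_new = s + term;
  if s_new == s
    % Adding this term did not change the sum, so stop here.
    break;
  end
  s = s_new;
  sgn = -sgn;
  n = n + 2;
end

printf("s = %.16e (first term with no effect: x^%d/%d!)\n", s, n, n);
```

The loop stops at the first term that is too small, relative to s, to alter the stored double.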
Finally, compare the value you just obtained with the "direct" value obtained above (the one for x − y). You should notice that they differ after only four significant figures. Explain why this is so. Which of these two values is the more accurate?
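To put a number on the disagreement, the relative difference between the two values can be computed directly. This sketch assumes that two terms of the series are enough for this x in double precision. The name rel is hypothetical.

```octave
x = 0.5^20;

% "Direct" value, computed by subtracting two nearly equal numbers.
d = x - sin(x);

% Series value, using the first two terms of x^3/3! - x^5/5! + ...
s = x^3/factorial(3) - x^5/factorial(5);

% Relative disagreement between the two results.
rel = abs(d - s) / s
```

Roughly speaking, a relative difference near 10^-k means the two values agree to about k significant figures.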