Hi all, so I posted this idea on Omni, but I figured it'd probably be more useful here.
I won't get too much into the mathematical aspects of why this works or proving quadratic convergence, but Newton's Method for square roots goes like this:

Given a non-zero starting guess to the square root of c, x0, then the sequence x where xn+1=(xn+c/xn)/2 converges to √c.

That's all fine and it even doubles the digits of accuracy every iteration, so it is fast. What it is basically doing is averaging two guesses, xn, and c/xn. But if it is doubling the digits of accuracy, then say xn has 7 digits accuracy, then those first digits will match for xn+1, which means they'll match for c/xn. Hypothetically, this means you can skip computing the first 7 digits of c/xn. This gives us:

xn+1=(xn+xn+(c/xn-xn))/2
xn+1=xn+((c-xn2)/xn)/2
xn+1=xn+(c-xn2)/(2xn)

This means we are trading half the iterations of the division for a multiplication at the previous precision. Multiplication tends to be faster on the Z80, and especially so on the eZ80.

So, if anybody wants to implement high-precision floating point square roots on the eZ80, this is the way to go. Currently for the Z80, I have the core routine working on a 31-bit mantissa with an average case of approximately 3371cc. I project the complete extended-precision routine will be under 14000cc, compared to my previous best of 28000cc, and TI's 86000cc. This also compares to 11000cc for multiplication and 19000cc for division.
Xeda112358 wrote:
That's all fine and it even doubles the digits of accuracy every iteration, so it is fast.

Actually, for bad starting guesses, the convergence can be excruciatingly slow. A good starting guess is the key to quadratic convergence from the first iteration.

Calculating square roots is described in exquisite detail in the excellent book "Math toolkit for real-time programming" by Jack Crenshaw, including things like hacking the floating point format, coming up with an optimal starting guess, and it even contains a closed-form algorithm for doing exact integer square roots using only bit-shifts and addition/subtraction (not an iterative approximation like Newton's Method) , which can be used on the mantissa of a floating point number.

It is a book I would highly recommend.
On the Z80, the bit-by-bit algorithm becomes less efficient after 24 bits. In practice I use a 16-bit square root to get the first 8 bits of the result. A subsequent Newton's iteration slightly more than doubles that to 16 bits, so for that I truncate 16, and two more iteration gives me the 64 bits I need.

As an update, I might be able to get 64-bit square roots in just over 10000cc, putting it slightly slower than my 64-bit multiply (10011cc average) which uses a recursive Karatsuba algorithm. This would put my 19-digit square root at eight times faster than TI's 14-digit square root.

This should in turn greatly speed up the computation of the inverse trig, inverse hyperbolic, and logarithm functions for which I'm using the Borchardt-Gauss algorithm as modified by Carlson. Even if my estimate for the square root timing is off by 4000cc, it'll still mean that these transcendental functions are computed to 19 decimal digits accuracy faster than the time it takes TI to compute a 14-digit square root.

EDIT: For the curious, check out Carlson's paper

Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

»
» All times are UTC - 5 Hours

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum