Hi all, so I posted this idea on Omni, but I figured it'd probably be more useful here.

I won't get too much into the mathematical aspects of why this works or proving quadratic convergence, but Newton's Method for square roots goes like this:

Given a non-zero starting guess to the square root of c, x

That's all fine and it even doubles the digits of accuracy every iteration, so it is fast. What it is basically doing is averaging two guesses, x

x

x

x

This means we are trading half the iterations of the division for a multiplication at the previous precision. Multiplication tends to be faster on the Z80, and especially so on the eZ80.

So, if anybody wants to implement high-precision floating point square roots on the eZ80, this is the way to go. Currently for the Z80, I have the core routine working on a 31-bit mantissa with an average case of approximately 3371cc. I project the complete extended-precision routine will be under 14000cc, compared to my previous best of 28000cc, and TI's 86000cc. This also compares to 11000cc for multiplication and 19000cc for division.

I won't get too much into the mathematical aspects of why this works or proving quadratic convergence, but Newton's Method for square roots goes like this:

Given a non-zero starting guess to the square root of c, x

_{0}, then the sequence x where x_{n+1}=(x_{n}+c/x_{n})/2 converges to √c.That's all fine and it even doubles the digits of accuracy every iteration, so it is fast. What it is basically doing is averaging two guesses, x

_{n}, and c/x_{n}. But if it is doubling the digits of accuracy, then say x_{n}has 7 digits accuracy, then those first digits will match for x_{n+1}, which means they'll match for c/x_{n}. Hypothetically, this means you can skip computing the first 7 digits of c/x_{n}. This gives us:x

_{n+1}=(x_{n}+x_{n}+(c/x_{n}-x_{n}))/2x

_{n+1}=x_{n}+((c-x_{n}^{2})/x_{n})/2x

_{n+1}=x_{n}+(c-x_{n}^{2})/(2x_{n})This means we are trading half the iterations of the division for a multiplication at the previous precision. Multiplication tends to be faster on the Z80, and especially so on the eZ80.

So, if anybody wants to implement high-precision floating point square roots on the eZ80, this is the way to go. Currently for the Z80, I have the core routine working on a 31-bit mantissa with an average case of approximately 3371cc. I project the complete extended-precision routine will be under 14000cc, compared to my previous best of 28000cc, and TI's 86000cc. This also compares to 11000cc for multiplication and 19000cc for division.