Hi all, so I posted this idea on Omni, but I figured it'd probably be more useful here.
I won't get too much into the mathematical aspects of why this works or proving quadratic convergence, but Newton's Method for square roots goes like this:
Given a non-zero starting guess to the square root of c, x0, then the sequence x where xn+1=(xn+c/xn)/2 converges to √c.
That's all fine and it even doubles the digits of accuracy every iteration, so it is fast. What it is basically doing is averaging two guesses, xn, and c/xn. But if it is doubling the digits of accuracy, then say xn has 7 digits accuracy, then those first digits will match for xn+1, which means they'll match for c/xn. Hypothetically, this means you can skip computing the first 7 digits of c/xn. This gives us:
xn+1=(xn+xn+(c/xn-xn))/2
xn+1=xn+((c-xn2)/xn)/2
xn+1=xn+(c-xn2)/(2xn)
This means we are trading half the iterations of the division for a multiplication at the previous precision. Multiplication tends to be faster on the Z80, and especially so on the eZ80.
So, if anybody wants to implement high-precision floating point square roots on the eZ80, this is the way to go. Currently for the Z80, I have the core routine working on a 31-bit mantissa with an average case of approximately 3371cc. I project the complete extended-precision routine will be under 14000cc, compared to my previous best of 28000cc, and TI's 86000cc. This also compares to 11000cc for multiplication and 19000cc for division.
I won't get too much into the mathematical aspects of why this works or proving quadratic convergence, but Newton's Method for square roots goes like this:
Given a non-zero starting guess to the square root of c, x0, then the sequence x where xn+1=(xn+c/xn)/2 converges to √c.
That's all fine and it even doubles the digits of accuracy every iteration, so it is fast. What it is basically doing is averaging two guesses, xn, and c/xn. But if it is doubling the digits of accuracy, then say xn has 7 digits accuracy, then those first digits will match for xn+1, which means they'll match for c/xn. Hypothetically, this means you can skip computing the first 7 digits of c/xn. This gives us:
xn+1=(xn+xn+(c/xn-xn))/2
xn+1=xn+((c-xn2)/xn)/2
xn+1=xn+(c-xn2)/(2xn)
This means we are trading half the iterations of the division for a multiplication at the previous precision. Multiplication tends to be faster on the Z80, and especially so on the eZ80.
So, if anybody wants to implement high-precision floating point square roots on the eZ80, this is the way to go. Currently for the Z80, I have the core routine working on a 31-bit mantissa with an average case of approximately 3371cc. I project the complete extended-precision routine will be under 14000cc, compared to my previous best of 28000cc, and TI's 86000cc. This also compares to 11000cc for multiplication and 19000cc for division.