A few days ago, I came up with a routine to approximate sine. The short, math version is:
-I converted the 8-bit register digits to a degree 7 polynomial where x=2 returns the correct value.
-I then made another polynomial where the coefficients where 1-ai.
-Then I multiplied these, basically performing z*~z
-Using these as fixed point numbers on [0,1), this is basically z(255/256-z)
-Scaling, this gives a nice approximation of sine.
-For speed purposes, I discarded the lower lower 8 terms, keeping the upper 8.

That was cool and all, but last night I was working on it some more when I realised that if I had taken just the lower terms, I wouldn't have propagation issues to worry about, and it would return -x2-x mod 256.

However, the algorithm I was using, if I initialized the accumulator to x instead of 0, I would get -x2. Naturally, I took advantage of this:

Code:
``` L_sqrd: ;Input: L ;Output: L*L->A ;159 t-states ;39 bytes    ld h,l    ld c,l    rr h    sbc a,a    xor l    add a,c    ld c,a    rl l    rr h    sbc a,a    xor l    and %11111000    add a,c    ld c,a    rl l    rr h    sbc a,a    xor l    and %11100000    add a,c    ld c,a    rl l    ld a,h    rrca    xor l       and 128    xor c    neg    ret ```

I am sure it can be optimized further, but it is already 159 t-states. It is only 29 cycles faster than my fastest general 8-bit multiplication routine, but can you imagine this as a circuit? It's kind of pretty cool, I think.

EDIT: I thought of a way to optimize the first iteration. Saved 2 bytes, 8 t-states.
Basically, at the first iteration, C=-1 or C=2L, then I shift L up one bit for the next iteration. I got rid of the initial ld c,a and use the first iteration to compute c. To do this, I just shift L at the beginning, then after the "sbc a,a" I just OR that with L. If a was \$FF, the result is FF (which is -1), else it is 2*L:

Code:
``` L_sqrd: ;Input: L ;Output: L*L->A ;151 t-states ;37 bytes    ld h,l ;First iteration, get the lowest 3 bits    sla l    rrh    sbc a,a    or l ;second iteration, get the next 2 bits    ld c,a    rr h    sbc a,a    xor l    and \$F8    add a,c ;third iteration, get the next 2 bits    ld c,a    sla l    rr h    sbc a,a    xor l    and \$E0    add a,c ;fourth iteration, get the last bit    ld c,a    ld a,l    add a,a    rrc h    xor h    and \$80    xor c    neg    ret ```

Also, as a note, if you stop at any iteration and perform NEG, you can mask to get the lower bits of the square. So for example, the first iteration gives you -L*L mod 8->A, the second returns -L*L mod 32->A, the third gives it mod 128.
Programming in it's purest form...very neat
.
Yes - I have considered the logic circuits - but only briefly...then I realsed that Alan Turing & John Von-Neumann occupied a different planet to us.
Thanks, I definitely like programming at this even lower, "fundamental" level. I am sure it goes deeper

Also, I thought of an optimization for the first iteration in class, so I edited my first post (at the bottom).

Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

»
» All times are UTC - 5 Hours

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum