A few days ago, I came up with a routine to approximate sine. The short, math version is:
-I converted the 8-bit register digits to a degree 7 polynomial where x=2 returns the correct value.
-I then made another polynomial where the coefficients where 1-ai.
-Then I multiplied these, basically performing z*~z
-Using these as fixed point numbers on [0,1), this is basically z(255/256-z)
-Scaling, this gives a nice approximation of sine.
-For speed purposes, I discarded the lower lower 8 terms, keeping the upper 8.


That was cool and all, but last night I was working on it some more when I realised that if I had taken just the lower terms, I wouldn't have propagation issues to worry about, and it would return -x2-x mod 256.

However, the algorithm I was using, if I initialized the accumulator to x instead of 0, I would get -x2. Naturally, I took advantage of this:

Code:

L_sqrd:
;Input: L
;Output: L*L->A
;159 t-states
;39 bytes
   ld h,l
   ld c,l
   rr h
   sbc a,a
   xor l
   add a,c
   ld c,a
   rl l

   rr h
   sbc a,a
   xor l
   and %11111000
   add a,c
   ld c,a
   rl l

   rr h
   sbc a,a
   xor l
   and %11100000
   add a,c
   ld c,a
   rl l

   ld a,h
   rrca
   xor l   
   and 128
   xor c
   neg
   ret

I am sure it can be optimized further, but it is already 159 t-states. It is only 29 cycles faster than my fastest general 8-bit multiplication routine, but can you imagine this as a circuit? It's kind of pretty cool, I think.

EDIT: I thought of a way to optimize the first iteration. Saved 2 bytes, 8 t-states.
Basically, at the first iteration, C=-1 or C=2L, then I shift L up one bit for the next iteration. I got rid of the initial ld c,a and use the first iteration to compute c. To do this, I just shift L at the beginning, then after the "sbc a,a" I just OR that with L. If a was $FF, the result is FF (which is -1), else it is 2*L:

Code:

L_sqrd:
;Input: L
;Output: L*L->A
;151 t-states
;37 bytes
   ld h,l
;First iteration, get the lowest 3 bits
   sla l
   rrh
   sbc a,a
   or l
;second iteration, get the next 2 bits
   ld c,a
   rr h
   sbc a,a
   xor l
   and $F8
   add a,c
;third iteration, get the next 2 bits
   ld c,a
   sla l
   rr h
   sbc a,a
   xor l
   and $E0
   add a,c
;fourth iteration, get the last bit
   ld c,a
   ld a,l
   add a,a
   rrc h
   xor h
   and $80
   xor c
   neg
   ret

Also, as a note, if you stop at any iteration and perform NEG, you can mask to get the lower bits of the square. So for example, the first iteration gives you -L*L mod 8->A, the second returns -L*L mod 32->A, the third gives it mod 128.
Programming in it's purest form...very neat
.
Yes - I have considered the logic circuits - but only briefly...then I realsed that Alan Turing & John Von-Neumann occupied a different planet to us. Smile
Thanks, I definitely like programming at this even lower, "fundamental" level. I am sure it goes deeper Smile

Also, I thought of an optimization for the first iteration in class, so I edited my first post (at the bottom).
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 1
» All times are UTC - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement