- 01 Mar 2015 10:10:17 pm
- Last edited by Xeda112358 on 20 Jun 2020 08:33:29 am; edited 3 times in total
This routine is a fast way to copy a block of data. With the default n=16, calling it is faster than a plain LDIR once BC>=35 (worst case), and I believe it takes the same inputs and produces the same outputs as LDIR:
Code:
fastLDIR:
;copy BC bytes from HL to DE
;Cost:
; 27cc for having to call
; 110cc for setting up the loop, worst case
; 10cc * ceiling(BC/n) ;n=2^k for some k, see the line below "ldirloop:"
; 16cc * BC
;costs roughly 152-BC*(5-10/n) more than a simple LDIR (worst case)
;for n=4, BC>=61 saves
;for n=8, BC>=41 saves
;for n=16, BC>=35 saves * default, see the "ldirloop" to change
;for n=32, BC>=33 saves
;for n=64, BC>=32 saves
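push hl ;save HL; this stack slot is swapped for the entry address below
push af ;preserve A and flags
xor a
sub c
and 15 ;A = (-C) mod n, the number of LDIs to skip on the first pass; change 15 to n-1
add a,a ;each LDI is 2 bytes
ld hl,ldirloop
add a,l
ld l,a ;HL = ldirloop + 2*A
jr nc,$+3 ;these two instructions aren't needed if ldirloop doesn't cross a 256-byte boundary; removing them saves 3 bytes and 12cc on the above timings
inc h ;carry from the low byte into H
pop af
ex (sp),hl ;restore HL and leave the entry address on the stack
ret ;jump to the entry address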
ldirloop:
;n=16 (number of LDI instructions; use a quantity of 4, 8, 16, 32, or 64)
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
_ldirloop_end:
ldi
jp pe,ldirloop ;LDI leaves P/V set until BC reaches 0
ret
This might be useful for things like copying code to RAM (from an App) in speed-critical applications, or other data handling tasks.
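The timing figures in the routine's comments can be checked with a little arithmetic. Here is a minimal Python sketch (not part of the routine) that plugs in the cycle counts from those comments and assumes a plain LDIR costs 21cc per byte, minus 5cc on the last byte. The function names are made up for illustration, and TI-specific memory delays are ignored.

```python
CALL_AND_RET = 27   # cc: "having to call", from the routine's comments
SETUP_WORST = 110   # cc: worst-case setup before entering the unrolled loop

def ldir_cost(bc):
    """Cycles for a plain LDIR copying bc bytes (1 <= bc <= 65535); assumed 21*BC - 5."""
    return 21 * bc - 5

def fast_ldir_cost(bc, n=16):
    """Worst-case cycles for calling fastLDIR with n unrolled LDIs."""
    passes = (bc + n - 1) // n          # ceiling(BC/n), one 'jp pe' per pass
    return CALL_AND_RET + SETUP_WORST + 10 * passes + 16 * bc

def break_even(n):
    """Smallest BC for which the bound 152 - BC*(5 - 10/n) goes negative."""
    # 152 - BC*(5 - 10/n) < 0  <=>  BC > 152*n / (5*n - 10); integers avoid rounding.
    return (152 * n) // (5 * n - 10) + 1

for n in (4, 8, 16, 32, 64):
    print(n, break_even(n))   # prints 61, 41, 35, 33, 32 for the five sizes
```

The bound in the comments rounds ceiling(BC/n) up to BC/n + 1, so it is slightly pessimistic: for some n the routine already wins a byte or two below the listed threshold.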
EDIT 29 Sept 18: Saved 1 byte and 3cc. Originally, I had 'ld a,16 \ sub c \ and 15'. Because of the 'and 15', the 'ld a,16' could load any multiple of n (16 in this example), including 0, so I just used 'xor a'. I also updated all the timing info.
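To see why 'xor a' gives the same entry point, and why jumping into the middle of the LDI run copies exactly BC bytes, here is a small Python model. It is only an illustration, not Z80 code, and the helper names are hypothetical. It mimics the 8-bit arithmetic of the setup and counts how many LDIs execute.

```python
N = 16  # number of unrolled LDI instructions (n in the routine's comments)

def entry_skip(c, start=0):
    """Model 'ld a,start / sub c / and N-1' using 8-bit arithmetic."""
    return ((start - c) & 0xFF) & (N - 1)

def bytes_copied(bc):
    """Count the LDIs executed when the routine is entered with this BC."""
    pos = entry_skip(bc & 0xFF)      # only C, the low byte of BC, is used
    copied = 0
    while True:
        for _ in range(pos, N):      # run the remaining LDIs of this pass
            bc = (bc - 1) & 0xFFFF   # each LDI copies one byte and decrements BC
            copied += 1
        if bc == 0:                  # 'jp pe' falls through once BC reaches 0
            return copied
        pos = 0                      # otherwise jump back to ldirloop

# Any multiple of N as the starting value of A selects the same entry point.
same = all(entry_skip(c, start) == entry_skip(c)
           for c in range(256) for start in range(0, 256, N))
print(same)              # True
print(bytes_copied(35))  # 35
```

In this model BC=0 copies 65536 bytes, which matches how LDIR treats BC=0.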