- LZ4 Decompression
- 28 Mar 2015 03:44:49 am
- Last edited by Unknownloner on 31 May 2015 04:50:53 pm; edited 2 times in total
EDIT 2015-05-31:
The lz4 flag -Sx has been replaced by --no-frame-crc in recent versions. This post has been updated to reflect that change.
If you are still using an older version of lz4, use -Sx instead of --no-frame-crc.
This code can decompress data compressed with LZ4.
For example, I was able to compress my assembly 2048 game from 4923 bytes to 3754 bytes (the 3754 bytes include the decompression code). Most of the savings come from the sprite data, of course, but LZ4 also compressed some of the actual code. That was probably the LCD control code, since it tends to repeat the same instructions over and over. Decompression is very fast, since the algorithm was designed for speed. Most of the program is now decompressed at startup, and it still starts up quickly.
EDIT: Another interesting tidbit: calcuzap compressed from around 14,000 bytes to just under 5000 when I compressed the whole .8xp. That output obviously isn't executable, but it would be possible to write a program to wrap it. You could allocate memory through the OS as usual, or just copy the decompressed code to another RAM page while maintaining addresses and hope the program doesn't use that RAM page for anything.
Once you have lz4 installed, you can compress your data using something like this:
Code:
lz4 -9 --no-frame-crc <sourcefile> <outfile>
Then include the resulting binary file in your program.
The --no-frame-crc flag tells lz4 not to output the stream checksum, which this decoder can't handle. You must use this flag.
The -9 flag selects a higher compression level, which improves the compression ratio slightly.
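If you want to double-check a file before including it, you can inspect the frame header on the PC side. The sketch below is a hypothetical helper in Python, not part of the z80 routine, and it assumes the LZ4 frame format's FLG bit layout. It checks the magic number and the FLG bits that would break the decoder's assumption of a 7-byte header with no checksums. It does not verify the header checksum byte.

```python
# Hypothetical PC-side helper; bit positions follow the LZ4 frame format's FLG byte.
LZ4_MAGIC = b"\x04\x22\x4d\x18"  # magic number 0x184D2204, stored little-endian


def frame_header_problems(data: bytes) -> list:
    """Return reasons this frame won't fit the z80 decoder's assumptions (empty list = OK).

    The z80 routine skips exactly 7 header bytes (magic, FLG, BD, header checksum)
    and never reads checksums, so those optional fields must be absent.
    """
    if len(data) < 7 or data[:4] != LZ4_MAGIC:
        return ["not an LZ4 frame"]
    flg = data[4]
    problems = []
    if flg >> 6 != 0b01:  # bits 7-6: frame format version
        problems.append("unsupported frame version")
    if flg & 0x10:  # bit 4: per-block checksums
        problems.append("block checksums enabled")
    if flg & 0x08:  # bit 3: 8-byte content size field follows BD
        problems.append("content size field present (header longer than 7 bytes)")
    if flg & 0x04:  # bit 2: 4-byte content checksum after the end mark
        problems.append("content checksum enabled (use --no-frame-crc)")
    if flg & 0x01:  # bit 0: 4-byte dictionary ID follows BD
        problems.append("dictionary ID present (header longer than 7 bytes)")
    return problems
```

For example, `frame_header_problems(open("outfile", "rb").read())` should return an empty list for a file made with the command above.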
Not all data is compressible, of course. LZ4 gets worse ratios than gzip, but its primary advantage is that it's very fast and simple to decode.
This code has no error handling at all for malformed or corrupt data. It assumes your data is valid. I could add error handling, but I'm not really sure what the code should actually do when it finds an error. If you want error handling, feel free to modify the code. The code is also unoptimized so that it's easier to understand, so feel free to optimize it!
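One possible form of error handling is to detect bad data and give up cleanly instead of reading or copying past the buffers. The following is a hedged Python sketch of that idea, not z80 code and not the author's method. It decodes one compressed block using the same sequence layout that _DecompressLZ4Block walks: token, literal length, literals, 2-byte offset, and match length plus 4. It raises ValueError wherever the z80 routine would silently run off the end. The function name and the choice to raise an exception are assumptions.

```python
def decompress_block(block: bytes, out: bytearray) -> None:
    """Decode one compressed LZ4 block, appending the result to `out`.

    Hypothetical bounds-checked counterpart of _DecompressLZ4Block.
    Raises ValueError on malformed input instead of running past the buffers.
    """
    pos = 0
    end = len(block)

    def read_length(nibble: int) -> int:
        # A nibble of 15 means: keep adding bytes until one is not 255.
        nonlocal pos
        length = nibble
        if nibble == 15:
            while True:
                if pos >= end:
                    raise ValueError("length extension runs past end of block")
                byte = block[pos]
                pos += 1
                length += byte
                if byte != 255:
                    break
        return length

    while True:
        if pos >= end:
            raise ValueError("missing sequence token")
        token = block[pos]
        pos += 1

        # Literals: length in the high nibble of the token.
        lit_len = read_length(token >> 4)
        if pos + lit_len > end:
            raise ValueError("literal run past end of block")
        out.extend(block[pos:pos + lit_len])
        pos += lit_len
        if pos == end:  # the last sequence has literals only
            return

        # Match: 2-byte little-endian offset, then length in the low nibble (+4).
        if pos + 2 > end:
            raise ValueError("truncated match offset")
        offset = block[pos] | (block[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("match offset outside decoded data")
        match_len = read_length(token & 0x0F) + 4
        start = len(out) - offset
        for i in range(match_len):  # byte by byte like LDIR, so overlapping copies repeat data
            out.append(out[start + i])
```

On the z80 side, the equivalent checks would compare HL against the block end before each read and compare the offset against the number of bytes written so far. What the routine should do after a failed check, for example returning with a flag set, is still up to you.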
I think this can decompress up to 64K of data, but I'm not entirely sure. The routine only reads the low 16 bits of each block size, and all of its pointers are 16-bit. Realistically, if both the input and the output data fit into the z80's 64K address space, you're probably fine.
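To see where the 16-bit limit comes from, here is a hedged Python sketch that walks the blocks of a frame the same way DecompressLZ4Data and _DecodeLZ4Block do. The helper is hypothetical and not from the original post. Each block starts with a 4-byte little-endian size whose high bit marks an uncompressed block, and a size of 0 ends the frame. The sketch assumes the 7-byte header and no block checksums. It rejects any block too large for the z80 routine, which only reads the low two bytes of the size.

```python
def list_blocks(frame: bytes, header_len: int = 7) -> list:
    """Return (size, is_uncompressed) for each block in an LZ4 frame.

    Hypothetical helper; assumes a 7-byte header and no block checksums,
    like the z80 routine. Raises ValueError for blocks over 0xFFFF bytes,
    because the z80 code only reads the low 16 bits of each size field.
    """
    pos = header_len
    blocks = []
    while True:
        if pos + 4 > len(frame):
            raise ValueError("frame truncated before end mark")
        word = int.from_bytes(frame[pos:pos + 4], "little")
        pos += 4
        if word == 0:  # end mark: four zero bytes
            return blocks
        uncompressed = bool(word & 0x80000000)  # high bit of byte 3
        size = word & 0x7FFFFFFF
        if size > 0xFFFF:
            raise ValueError("block of %d bytes needs more than 16 bits" % size)
        blocks.append((size, uncompressed))
        pos += size  # skip the block data
```

This only checks the compressed side. The decompressed output must also fit in RAM alongside the input.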
The code currently targets the z80 and assembles with Brass. I'll update it for the eZ80 once I learn it and have the time (unless someone else does it first). It should only take a few minor edits to support decompressing data larger than 64K.
Also, I haven't commented the code very well, but I may go back later and add better comments. Reading through these two documents should help you understand it:
* https://docs.google.com/document/d/1cl8N1bmkTdIpPLtnlzbBSFAdUeyNo5fwfHbHU7VRNWY/edit
* https://github.com/Cyan4973/lz4/blob/master/lz4_Block_format.md
Please pardon my writing, as it's probably not as detailed as what I normally write for these sorts of things. I'm typing this at 5 AM, so I'm just trying to get it out there in a form that makes some sort of sense.
Code:
.module LZ4
;HL - Input buffer
;DE - Output buffer
DecompressLZ4Data:
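;Skip the 7-byte frame header (4-byte magic, FLG, BD, header checksum); assumes no content-size or dictionary-ID fields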
ld bc,7
add hl,bc
_decompBlocksLp:
call _DecodeLZ4Block
jr c,_decompBlocksLp
ret
;Decode a block
;Returns with C if more blocks to decode, NC if end of data
_DecodeLZ4Block:
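;Block size: 4 bytes, little-endian; only the low 16 bits are used
ld c,(hl) \ inc hl
ld b,(hl) \ inc hl
;Return if length == 0 (end mark); "or c" clears carry, so this returns NC
ld a,b
or c
ret z
inc hl ;Skip byte 2 of the size
ld a,(hl) \ inc hl ;Byte 3: if high bit == 1, uncompressed, else compressed
rla ;Move the high bit into carry (ld does not set flags)
jr nc,{+}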
;Not compressed, do a data copy
ldir
scf
ret
+:
;Compressed, run decompression
call _DecompressLZ4Block
scf
ret
;HL - Input buffer
;DE - Output buffer
;BC - Block length
_DecompressLZ4Block:
push hl
add hl,bc
ex (sp),hl ; Stack = address directly after data end
_decompressLp:
ld a,(hl) \ inc hl ;Sequence token
push af
;===Decompress Literals===
;High 4 bits -> Low 4 bits
rra
rra
rra
rra
call _ReadByteExtensionsIfNeeded ;BC = num literals
;If length is 0, no copying
ld a,b
or c
jr z,{+}
ldir
+:
;===
pop af
;If the literals reached the end of the block, return (the last sequence has no match)
pop bc
or a
sbc hl,bc
add hl,bc
ret z
push bc
;===Decompress Matches===
ld c,(hl) \ inc hl
ld b,(hl) \ inc hl
push bc ;Store offset from output
call _ReadByteExtensionsIfNeeded ;BC = match length
;Add 4 because min is 4
inc bc \ inc bc \ inc bc \ inc bc
ex (sp),hl ;HL = offset from output, (SP) = input buffer
push de
ex de,hl ;HL = output, DE = offset
or a
sbc hl,de ;HL = match start
pop de ;DE = output
ldir
pop hl ;HL = input buffer
;===
jr _decompressLp
;BC = low 4 bits of A; if they equal 15, add the extension bytes
;at (HL) until a byte other than 255 has been read
_ReadByteExtensionsIfNeeded:
and 0Fh
ld b,0
ld c,a
cp 15
ret nz
-: ld a,(hl) \ inc hl
cp 255
jr nz,{+}
add a,c
jr nc,$+3
inc b
ld c,a
jr {-}
+:
add a,c
jr nc,$+3
inc b
ld c,a
ret
.endmodule
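To sanity-check the length encoding that _ReadByteExtensionsIfNeeded reads, you can mirror it on a PC. A 4-bit length of 15 means "keep adding the following bytes until one is not 255". The Python sketch below pairs a hypothetical encoder with a decoder that follows the same steps as the z80 routine, so the two can be round-tripped. The helper names are made up for illustration. The z80 version keeps the total in the 16-bit BC, so lengths above 65535 would wrap there.

```python
def encode_length(n: int) -> tuple:
    """Hypothetical encoder: split a length into (4-bit nibble, extension bytes)."""
    if n < 15:
        return n, b""
    n -= 15
    ext = bytearray()
    while n >= 255:
        ext.append(255)
        n -= 255
    ext.append(n)  # final byte is below 255 and terminates the run
    return 15, bytes(ext)


def decode_length(nibble: int, data: bytes, pos: int) -> tuple:
    """Mirror of _ReadByteExtensionsIfNeeded: return (length, position after extensions)."""
    length = nibble & 0x0F
    if length != 15:
        return length, pos
    while True:
        byte = data[pos]
        pos += 1
        length += byte
        if byte != 255:
            return length, pos
```

For example, a length of 270 encodes as the nibble 15 followed by the bytes 255 and 0, because 15 + 255 + 0 = 270.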