Overview:

I've run into a speed problem with the HD Picture Viewer. Decompressing and displaying sprite data is too slow for GIFs! If you'd like to win up to $60 worth of gift cards of your choice, here's your chance!

There are three optimization challenges. Each challenge has its own reward of $20!
This competition will end on February 27th at 11:59pm (CST).
Submissions should be emailed to thelastmillennial@gmail.com.
Please share your scores in this thread!

Provided Files:
This is a zip of all the files you should need to complete this challenge (it does not include the software needed to assemble eZ80 code).
https://1drv.ms/u/c/b27ced2546bad95f/IQCq_TnPFqtFRp4lZ1JzhlfIAduj5neyrtSEJk3x37FLQcM?e=9qKGmv

The zip contains:
  • the appvars for 3 benchmark gifs
  • the C toolchain for building your code
  • the HD Picture Viewer source code
  • the 16bpp graphx library
  • the clibs for good measure.
Previews of the provided benchmark GIFs: https://imgur.com/a/p1kvZnT
  1. Bench1: Random noise (A synthetic worst-case scenario)
  2. Bench2: Rick (A realistic worst-case scenario)
  3. Bench3: Nyan Cat (A realistic best-case scenario)


Scoring:

There are three provided benchmark GIFs. A modified version of HD Picture Viewer will play a GIF and report the average time, in milliseconds, that each function took to run. (When HD Picture Viewer launches, press [enter] to open a GIF to test.)

Score calculation:

Each challenge is scored separately. For each benchmark GIF, your points are the number of milliseconds by which your optimized code beats the provided baseline; those savings are then averaged across the three benchmark GIFs.

Example: If the baseline averages are:
  • 50 ms for bench1
  • 40 ms for bench2
  • 30 ms for bench3

...and your code ends up with the averages:
  • 47 ms for bench1
  • 33 ms for bench2
  • 21 ms for bench3

Then you would receive: ( (50 - 47) + (40 - 33) + (30 - 21) ) / 3 = 6.33 points.
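
The same arithmetic as a small Python sketch, in case you want to check your own numbers. The function name `challenge_points` is made up for illustration and is not part of the provided files.

```python
def challenge_points(baseline_ms, your_ms):
    """Mean per-benchmark saving in milliseconds (baseline minus yours)."""
    if len(baseline_ms) != len(your_ms):
        raise ValueError("need one time per benchmark GIF")
    savings = [b - y for b, y in zip(baseline_ms, your_ms)]
    return sum(savings) / len(savings)


# The worked example from this post: (3 + 7 + 9) / 3
print(round(challenge_points([50, 40, 30], [47, 33, 21]), 2))  # 6.33
```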

Prizes:

The person with the most points for a challenge will win that challenge. The winner of each challenge will receive a $20 gift card of their choice. The same person can win multiple challenges.
  • Entries for challenges 1 & 2 MUST beat the baseline average by at least 5%. Challenge 3 has no minimum: even a 1% improvement will be accepted.
  • Your final score will be based on how well your code performs on two physical calculators. These calculators are hardware revision C and revision M. Your scores will be averaged across the two calculators.
  • Check the Overview section for the end date!
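
One possible reading of the minimum-improvement and two-calculator rules, as a hedged Python sketch. It assumes the 5% minimum is checked against the mean of the three baseline times and that the final score is the plain mean of the two per-calculator scores; the post does not spell out either detail, and the helper names are hypothetical.

```python
def improvement(baseline_ms, your_ms):
    """Fractional speedup of the mean time, e.g. 0.05 means 5% faster."""
    base = sum(baseline_ms) / len(baseline_ms)
    mine = sum(your_ms) / len(your_ms)
    return (base - mine) / base


def qualifies(baseline_ms, your_ms, minimum=0.05):
    """True if the entry clears the minimum (assumed: 5% of the mean baseline)."""
    return improvement(baseline_ms, your_ms) >= minimum


def final_score(points_rev_c, points_rev_m):
    """Assumed: plain average of the points measured on each calculator."""
    return (points_rev_c + points_rev_m) / 2


# Challenge 1 baselines from this post, with made-up entry times.
baseline = [42.38, 42.84, 34.20]
print(qualifies(baseline, [40.0, 40.0, 32.0]))   # roughly 6% faster
print(qualifies(baseline, [41.5, 42.0, 33.5]))   # roughly 2% faster
```

For Challenge 3, which has no minimum, the same check would be called with `minimum=0.0`.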


Challenge 1:

Optimize the function hdl_ScaledTransSpriteFullscreen in the hdlib.asm file. This file is located at: /hdpic-competition-files/toolchain/src/hdlib/hdlib.asm Your changes can be built using the makefile provided in the same directory. Once your changes have been built, you can just send hdlib.8xv to your calculator. No need to re-build HD Picture Viewer.

Parameters:

  • The argument should remain the same
  • No clipping
  • Input sprite is 8bpp
  • Input sprite is 160x120 pixels (width x height)
  • Input sprite transparency color is at index 2 (You may adjust the index value)
  • Output sprite must be 320x240 and draw the sprite fullscreen
  • Preferably coded in eZ80.
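
A host-side reference model can help confirm that a rewritten routine still produces the same frame. The Python sketch below is my reading of the parameters above, not code from the provided files. It assumes the pixel data is row-major (starting after the 2-byte width/height header that the baseline routine skips) and that the draw buffer is 8bpp with 320 bytes per row, as the baseline's 320-byte row stride suggests. The names are hypothetical.

```python
SPRITE_W, SPRITE_H = 160, 120
SCREEN_W, SCREEN_H = 320, 240
TRANSPARENT_COLOR = 2


def scale2x_transparent(pixels, screen, transparent=TRANSPARENT_COLOR):
    """Draw a 160x120 8bpp sprite at 2x into a 320x240 buffer, in place.

    pixels: 19200 bytes, row-major, header already stripped (assumption).
    screen: bytearray of 76800 bytes; transparent pixels leave it untouched.
    """
    for y in range(SPRITE_H):
        for x in range(SPRITE_W):
            p = pixels[y * SPRITE_W + x]
            if p == transparent:
                continue
            top = (2 * y) * SCREEN_W + 2 * x   # first output row of the 2x2 block
            bottom = top + SCREEN_W            # second output row
            screen[top] = screen[top + 1] = p
            screen[bottom] = screen[bottom + 1] = p
    return screen
```

Comparing this model's output against a dump of the calculator's buffer (or an emulator's) is one way to catch off-by-one pointer errors before timing anything.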


Baseline averages:

  • Bench1: 42.38
  • Bench2: 42.84
  • Bench3: 34.20


tldr: make this code fast

Code:
;WARNING: Vibe code ahead.
hdl_ScaledTransSpriteFullscreen:
; Draws a 160x120 sprite scaled 2x with transparency
; Hardcoded: X=0, Y=0, Scale=2x
; Optimized: Dual-row writing, no EXX, safe for stack.

    push ix
    push iy                 ; Preserve pointers
   
    ; --- Setup Source and Destination ---
    ; Arg0 (Sprite Pointer) is at SP + 9 (3 for IX, 3 for IY, 3 for Ret)
    ld   hl, 9
    add  hl, sp
    ld   hl, (hl)           ; HL = Sprite structure
    inc  hl                 ; Skip width byte
    inc  hl                 ; Skip height byte
   
    ld   de, (CurrentBuffer) ; DE = Start of Row 1
    ld   bc, 320
    push de
    pop  iy
    add  iy, bc             ; IY = Start of Row 2
   
    ; We use IXL as our outer row counter (120 input rows)
    ld   a, 120
    ld   ixl, a

NcTransRowLoop:
    ld   b, 160             ; Inner loop: 160 input pixels
NcTransPixelLoop:
    ld   a, (hl)            ; 1. Load pixel
    inc  hl
   
    cp   a, TRANSPARENT_COLOR; 2. Check transparency
    jr   z, NcTransSkip     ; If transparent, just move pointers
   
    ; 3. Not Transparent: Write 2x2 block
    ; Write Row 1
    ld   (de), a
    inc  de
    ld   (de), a
    inc  de
   
    ; Write Row 2
    ld   (iy + 0), a
    inc  iy
    ld   (iy + 0), a
    inc  iy
   
    djnz NcTransPixelLoop
    jr   NcTransRowAdvance

NcTransSkip:
    ; 4. Transparent: Skip 2 pixels on both rows
    inc  de
    inc  de
    inc  iy
    inc  iy
    djnz NcTransPixelLoop

NcTransRowAdvance:
    ; After 160 pixels (320 output pixels), pointers are:
    ; DE = Start of Row 2, IY = Start of Row 3.
    ; We need to move them to start Row 3 and Row 4.
   
    push iy
    pop  de                 ; DE = Start of Row 3
    ld   bc, 320
    add  iy, bc             ; IY = Start of Row 4
   
    dec  ixl
    jr   nz, NcTransRowLoop

    pop  iy
    pop  ix
    ret


Challenge 2:

Optimize the function hdl_ScaledTransSpriteFullscreen_ColMajor in the hdlib.asm file. This file is located at: /hdpic-competition-files/toolchain/src/hdlib/hdlib.asm Your changes can be built using the makefile provided in the same directory. Once your changes have been built, you can just send hdlib.8xv to your calculator. No need to re-build HD Picture Viewer.

Parameters:

  • The argument should remain the same
  • No clipping
  • Input sprite is 8bpp
  • Input sprite is 160x120 pixels (width x height)
  • Input sprite transparency color is at index 2 (You may adjust the index value)
  • Output sprite must be 320x240 and draw the sprite fullscreen
  • Preferably coded in eZ80
  • Must be drawn column-major to reduce the diagonal screen-tearing
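
To make the traversal order concrete, here is a hedged Python model of a column-major draw. It is a sketch under the same assumptions as the baseline routine appears to make (row-major 160x120 source after the 2-byte header, 8bpp buffer with 320 bytes per row); the `+1` and `+319` steps mirror the baseline's pointer updates. The function name is hypothetical.

```python
SPRITE_W, SPRITE_H = 160, 120
SCREEN_W = 320
TRANSPARENT_COLOR = 2


def scale2x_transparent_colmajor(pixels, screen, transparent=TRANSPARENT_COLOR):
    """Same output as a row-major 2x draw, but walks the screen column by column."""
    for x in range(SPRITE_W):
        src = x            # top of this sprite column
        dst = 2 * x        # top of the matching 2-pixel-wide screen column
        for _ in range(SPRITE_H):
            p = pixels[src]
            src += SPRITE_W                # next sprite row, same column
            if p == transparent:
                dst += 2 * SCREEN_W        # skip two output rows
                continue
            screen[dst] = p                # top pair
            dst += 1
            screen[dst] = p
            dst += SCREEN_W - 1            # down one line, back one pixel
            screen[dst] = p                # bottom pair
            dst += 1
            screen[dst] = p
            dst += SCREEN_W - 1            # aligned for the next source row
    return screen
```

Only the order of writes differs from a row-major draw; the finished frame should be byte-for-byte identical, which makes the two easy to cross-check.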


Baseline averages:

  • Bench1: 56.16
  • Bench2: 57.13
  • Bench3: 39.09


tldr: make this code fast

Code:
;WARNING: vibe code ahead
hdl_ScaledTransSpriteFullscreen_ColMajor:
; Draws a 160x120 sprite scaled 2x
; Input: Sprite Data is ROW-MAJOR (Standard)
; Output: Draws Vertically (Column-Major)
; Optimization: Uses IX for Sprite Stride, HL for Screen Stride.

    push ix
    push iy
   
    ; --- Setup ---
    ld   hl, 9
    add  hl, sp
    ld   hl, (hl)           ; HL = Sprite Data Start
    inc  hl
    inc  hl                 ; Skip width/height
    push hl
    pop  ix                 ; IX = Sprite Pointer (Source)
   
    ld   hl, (CurrentBuffer) ; HL = Screen Pointer (Dest)
   
    ; Define Constants for Stride
    ; BC = Sprite Stride (160 bytes to get to next Y pixel)
    ; DE = Screen Stride (640 bytes to get to next Y block)
    ld   bc, 160
    ld   de, 640
   
    ; Outer Loop: 160 Columns
    ld   a, 160
    ld   iyh, a             ; IYH = Column Counter

.ColLoop:
    push ix                 ; Save Top of Sprite Column
    push hl                 ; Save Top of Screen Column
   
    ; Inner Loop: 120 Rows
    ld   a, 120
    ld   iyl, a             ; IYL = Row Counter

.RowLoop:
    ld   a, (ix+0)          ; Load Pixel from Sprite
   
    ; Advance Sprite Pointer Vertically (Row +1)
    ; We do this now because IX is not needed for the rest of this iter
    add  ix, bc             ; IX += 160
   
    cp   a, TRANSPARENT_COLOR               ; Transparent?
    jr   z, .Skip
   
    ; --- Draw 2x2 Block ---
    ; 1. Draw Top Pair
    ld   (hl), a            ; Write X
    inc  hl
    ld   (hl), a            ; Write X+1
   
    ; Move HL to Next Line (Down 1 line, Back 1 pixel)
    ; Offset: +320 - 1 = +319
    push bc                 ; Save Sprite Stride (160)
    ld   bc, 319
    add  hl, bc
   
    ; 2. Draw Bottom Pair
    ld   (hl), a            ; Write X
    inc  hl
    ld   (hl), a            ; Write X+1
   
    ; Align HL for next loop (Down 1 line, Back 1 pixel)
    ; Offset: +319
    add  hl, bc
    pop  bc                 ; Restore Sprite Stride (160)
   
    dec  iyl
    jr   nz, .RowLoop
    jr   .NextCol

.Skip:
    ; Transparent: Jump Screen Pointer down 2 lines (640 bytes)
    add  hl, de             ; HL += 640
   
    dec  iyl
    jr   nz, .RowLoop

.NextCol:
    pop  hl                 ; Restore Screen Top
    inc  hl
    inc  hl                 ; Move Screen Right 2 Pixels
   
    pop  ix                 ; Restore Sprite Top
    inc  ix                 ; Move Sprite Right 1 Pixel
   
    dec  iyh
    jr   nz, .ColLoop

    pop  iy
    pop  ix
    ret


Challenge 3:

Optimize the zx0 decompression function in the zx0.src file. This file is located at /hdpic-competition-files/toolchain/src/ce/zx0.src Your changes can be built using the makefile provided at /hdpic-competition-files/toolchain/makefile You must re-build HD Picture Viewer for the changes to take effect.

Note: All provided benchmark files are zx0 compressed.

Parameters:

  • The function arguments should remain the same.
  • Must be coded in eZ80 and must build with the toolchain.
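
For checking a rewritten decoder on the host, a slow but readable model of the stream format can be useful. The Python sketch below follows my understanding of the ZX0 version 2 layout (interlaced Elias gamma codes, inverted bits for the offset MSB, end marker of 256), which appears to be what the baseline routine decodes. Treat it as an assumption to verify against the real benchmark data, not as the toolchain's implementation.

```python
def zx0_decompress(data):
    """Decode a ZX0 (v2 layout, assumed) stream into bytes."""
    out = bytearray()
    pos = 0            # index of the next unread input byte
    bit_mask = 0       # 0 forces a fresh bit byte on the first read
    bit_value = 0
    last_byte = 0
    backtrack = False  # True: next bit is the low bit of the offset LSB
    last_offset = 1

    def read_byte():
        nonlocal pos, last_byte
        last_byte = data[pos]
        pos += 1
        return last_byte

    def read_bit():
        nonlocal bit_mask, bit_value, backtrack
        if backtrack:
            backtrack = False
            return last_byte & 1
        bit_mask >>= 1
        if bit_mask == 0:
            bit_mask = 0x80
            bit_value = read_byte()
        return 1 if bit_value & bit_mask else 0

    def read_gamma(inverted):
        value = 1
        while not read_bit():                       # 0 = another data bit follows
            value = (value << 1) | (read_bit() ^ inverted)
        return value

    def copy_match(offset, length):
        for _ in range(length):
            out.append(out[-offset])                # may overlap, so copy bytewise

    state = "literals"
    while True:
        if state == "literals":
            for _ in range(read_gamma(0)):
                out.append(read_byte())
            state = "new_offset" if read_bit() else "last_offset"
        elif state == "last_offset":
            copy_match(last_offset, read_gamma(0))
            state = "new_offset" if read_bit() else "literals"
        else:
            msb = read_gamma(1)
            if msb == 256:                          # end marker
                return bytes(out)
            last_offset = msb * 128 - (read_byte() >> 1)
            backtrack = True
            copy_match(last_offset, read_gamma(0) + 1)
            state = "new_offset" if read_bit() else "literals"
```

As a smoke test, the hand-assembled stream `ED 41 FE 55 56` (one literal `A`, a 3-byte match at offset 1, then the end marker) decodes to `AAAA` with this sketch.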


Baseline averages:

  • Bench1: 15.90
  • Bench2: 36.26
  • Bench3: 9.52


tldr: make this code faster

Code:
;Note: Real code ahead
   .assume   adl=1

   .section   .text
   .global   _zx0_Decompress
   .type   _zx0_Decompress, @function

_zx0_Decompress:
   pop   bc
   pop   de
   ex   (sp), hl
   push   de
   push   bc

; -----------------------------------------------------------------------------
; ZX0 decoder by Einar Saukas & introspec
; "Turbo" version
; -----------------------------------------------------------------------------

dzx0_turbo:
   ld   iy, -1      ; preserve default offset 1
   lea   bc, iy + 2   ; ld bc, 1
   scf
   jr   dzx0t_start

dzx0t_new_offset:
   dec   bc
   dec   bc      ; prepare negative offset
   add   a, a
   jr   nz, dzx0t_new_offset_skip
   ld   a, (hl)      ; load another group of 8 bits
   inc   hl
   rla
dzx0t_new_offset_skip:
   call   nc, dzx0t_elias   ; obtain offset MSB
   inc   c
   ret   z      ; check end marker
   ld   b, (hl)      ; obtain offset LSB
   inc   hl
   rr   c      ; last offset bit becomes first length bit
   rr   b
   ld   iyl, b      ; preserve new offset
   ld   iyh, c
   ld   bc, 1      ; obtain length
   call   nc, dzx0t_elias
   inc   bc
dzx0t_copy:
   push   hl      ; preserve source
; dzx0t_last_offset:
   lea   hl, iy + 0      ; restore offset
   add   hl, de      ; calculate destination - offset
   ldir         ; copy from offset
   pop   hl      ; restore source
   add   a, a      ; copy from literals or new offset?
   jr   c, dzx0t_new_offset
dzx0t_literals:
   inc   c      ; obtain length
   add   a, a
   jr   nz, dzx0t_literals_skip
dzx0t_start:
   ld   a, (hl)      ; load another group of 8 bits
   inc   hl
   rla
dzx0t_literals_skip:
   call   nc, dzx0t_elias
   ldir         ; copy literals
   add   a, a      ; copy from last offset or new offset?
   jr   c, dzx0t_new_offset
   inc   c      ; obtain length
   add   a, a
   jr   nz, dzx0t_last_offset_skip
   ld   a, (hl)      ; load another group of 8 bits
   inc   hl
   rla
dzx0t_last_offset_skip:
   call   nc, dzx0t_elias
   jr   dzx0t_copy

dzx0t_elias:
   add   a, a      ; interlaced Elias gamma coding
   rl   c
   add   a, a
   jr   nc, dzx0t_elias
   ret   nz
   ld   a, (hl)      ; load another group of 8 bits
   inc   hl
   rla
   ret   c
   add   a, a
   rl   c
   add   a, a
   ret   c
   add   a, a
   rl   c
   add   a, a
   ret   c
   add   a, a
   rl   c
   add   a, a
   ret   c
dzx0t_elias_loop:
   add   a, a
   rl   c
   rl   b
   add   a, a
   jr   nc, dzx0t_elias_loop
   ret   nz
   ld   a, (hl)       ; load another group of 8 bits
   inc   hl
   rla
   jr   nc, dzx0t_elias_loop
   ret


Other Rules:
    For all challenges, you may either optimize existing code or rewrite the code from scratch. You can use any tools you'd like to assist with your coding. As long as your submission falls within the parameters and runs on my calculators, it will be accepted.

    You may modify HD Picture Viewer as you see fit for testing. The DrawGIF() function is in main.cpp:413. However, your final score will be based on the original HD Picture Viewer code.

    If you find a very compelling reason to adjust the arguments of a function, email me and this will be handled on a case-by-case basis.

Email me with any questions or concerns. Best of luck!

The code is about as optimized as it can be.

It appears this competition is less doable than I anticipated. I've decided to relax the rules a bit.

Changes:
The image format is allowed to be changed. Just keep me in the loop with what you're changing and I'll be happy to help.

Challenge 3 has no bare minimum optimization requirement. Unlike challenge 1 and 2, challenge 3's code was written by Actual Intelligence so I'll accept any improvements.

I think you should give the win in this contest to calc84maniac. He has implemented an optimized LZ4/LZ4HC decompression routine in the toolchain that should be faster in most cases than either zx0 or zx7. You'll most likely want to use "lz4hc" in convimg (or "lz4").

You can find it in the latest toolchain nightly: https://github.com/CE-Programming/toolchain/releases/tag/nightly

Also here's a slightly optimized version of your scaling routine by me, zerico, and calc84:


Code:
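hdl_ScaledTransSpriteFullscreen:
    pop    de                    ; DE = return address
    ex    (sp), iy               ; IY = sprite pointer (the argument)
    push    de                   ; put the return address back
    push    ix                   ; preserve IX (IXL is the row counter)
    ld    hl, (CurrentBuffer)    ; HL = start of the first output row
    ld    c, TRANSPARENT_COLOR   ; keep the transparent index in a register
    ld    ixl, 120               ; 120 input rows
.L.NcTransRowLoop:
    ex    de, hl                 ; DE = top output row of this pair
    ld    hl, 320
    add    hl, de                ; HL = bottom output row of this pair
    ld    b, 160                 ; 160 input pixels per row
.L.NcTransPixelLoop:
    ld    a, (iy + 2)            ; load pixel; +2 skips the width/height bytes
    inc    iy
    cp    a, c
    jr    z, .L.NcTransSkip
    ld    (de), a                ; not transparent: write the 2x2 block
    inc    de
    ld    (de), a
    inc    de
    ld    (hl), a
    inc    hl
    ld    (hl), a
    inc    hl
    djnz    .L.NcTransPixelLoop
    dec    ixl                   ; HL now points at the next top row
    jr    nz, .L.NcTransRowLoop
    pop    ix
    ret
.L.NcTransSkip:
    inc    de                    ; transparent: advance both rows by 2 pixels
    inc    de
    inc    hl
    inc    hl
    djnz    .L.NcTransPixelLoop
    dec    ixl
    jr    nz, .L.NcTransRowLoop
    pop    ix
    ret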

My sincere apologies for the delay in wrapping this up!

calc84maniac's lz4 compression made the biggest performance difference, getting full screen GIFs consistently over 20fps! I'm happy to hand challenge 3 to them!

Since Mateo, Zerico, and calc84 had the only submission, they win challenge 1 by default. I'm very pleased to see what people came up with!

I will reach out to calc84 for their prize and I will reach out to Mateo and let him distribute the prize among his team.

Thank you all!
  