That's what I had to do. It's just strange, since branching in an infinite loop doesn't touch the stack, and the code I added does nothing because it never gets run. I'm going to try writing some other simple C routines and their asm equivalents to see if I can figure out what's different.
I've got basic programs running now. It seems I needed to add a ".section .rodata" before the data; I'm not sure why, but removing it makes the program crash (and also makes it slightly smaller; does anyone know what it does?). Now I'm trying to get drawing to the screen working. Kerm mentioned something about DMA (Direct Memory Access), but I'm not quite sure how to go about that.

Code:
.global   _main
   .type   _main, @function
_main:
   mov.l   r8,@-r15      ! push r8 (r15 is the stack pointer)
   mov   #0,r1
   mov.l   r9,@-r15
   mov   #1,r4
   sts.l   pr,@-r15      ! push PR (procedure register: holds the return address)
   mov   #1,r5
   add   #-8,r15            ! move down 8 bytes on the stack
   mov.l   r1,@r15         ! presumably the 5th argument (0): r4-r7 carry the first four, the rest go on the stack
   mov.l   _sysPrintXY,r1   ! _PrintXY in r1
   mov.l   _sysGetKey,r9   ! storing syscalls to jump to
   mov.l   _sysBdisp_PutDisp_DD,r8
   mov.l   helloWorldPtr,r6
   jsr   @r1               ! call (r1), r1 = ptr to syscall (_PrintXY)
   mov #1,r7            ! delay slot: executes before the jump takes effect
   mov   r15,r4
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! DRAWING CODE SHOULD START HERE
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
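   mov #0x84,r5         ! immediate is sign-extended: r5 = 0xFFFFFF84, so mov.w stores 0xFF84
   mov.l VRAM_ptr,r1
   mov #153,r2          ! careful: 153 (0x99) does not fit in a signed byte, so it loads as -103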
   .align 2
drawLoop:
   mov.w r5,@r1
   add   #2,r1
   dt r2
   bf   drawLoop         ! branch false (jr nz)
mainLoop:
   jsr   @r8               ! draw gbuf
   nop                  ! delay slot
   jsr   @r9               ! call (r9) = _GetKey
   nop                  ! delay slot
   bra   mainLoop
   nop                  ! delay slot

   .align 4
VRAM_ptr:
   .long 0xA8000300+0x1000

! Start of data
!   .align 2
_sysPrintXY:
   .long   _PrintXY
_sysGetKey:
   .long   _GetKey
_sysBdisp_PutDisp_DD:
   .long   _Bdisp_PutDisp_DD
helloWorldPtr:
   .long   helloWorld

! i don't know why, but it seems to need ".section .rodata"
   .section .rodata
   .align 2
helloWorld:
   .string   "XXAssembly is possible"

I tried disassembling the source to Kerm's CopySprite routine and it just seems to load it directly into VRAM. Also, the source seems to do lots of funny things (part of "*(VRAM++) = *(data++)"):

Code:
...
 0086 61F3           mov   r15,r1
 0088 71E0           add   #-32,r1
 008a 62F3           mov   r15,r2
 008c 72E0           add   #-32,r2
 008e 522E           mov.l   @(56,r2),r2
 0090 7202           add   #2,r2
 0092 112E           mov.l   r2,@(56,r1)
 0094 61F3           mov   r15,r1
 0096 71E0           add   #-32,r1
 0098 62F3           mov   r15,r2
 009a 72E0           add   #-32,r2
 009c 522F           mov.l   @(60,r2),r2
 009e 7202           add   #2,r2
 00a0 112F           mov.l   r2,@(60,r1)
For one, I don't think you even need two registers here (couldn't you just do mov.l @(56,r1),r2?). Two, you get the same offset in both registers anyway, so why not just "mov r1,r2"? Three, r1 never changes, yet it gets recalculated and set to exactly the same value as before :P
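
For reference, here is a minimal C sketch of the kind of loop that would produce those load/add/store pairs. It is not Kerm's actual routine: copy_words and its parameters are made-up names for illustration. The two "add #2" sequences on stack slots 56 and 60 look like the two pointer increments, each pointer being reloaded from and stored back to the stack, which is what a compiler tends to emit when it keeps every variable in memory (as unoptimized builds usually do).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT the real CopySprite: copies `count` 16-bit
   pixels with the same *(VRAM++) = *(data++) idiom quoted above. */
static void copy_words(uint16_t *vram, const uint16_t *data, size_t count)
{
    assert(vram != NULL && data != NULL);
    while (count--) {
        /* Each pointer advances by one uint16_t, i.e. 2 bytes, which
           would match the two "add #2" instructions in the listing. */
        *(vram++) = *(data++);
    }
}
```

With optimization turned on, a compiler would normally keep both pointers in registers for the whole loop, so the repeated "mov r15,rN / add #-32,rN" address calculations should disappear.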

Another thing i thought was funny:

Code:
 013e 6323           mov   r2,r3
 0140 6213           mov   r1,r2
 0142 D112           mov.l   .L14,r1 ! this is a pointer to a subroutine
 0144 6433           mov   r3,r4
 0146 6523           mov   r2,r5
 0148 6613           mov   r1,r6
So... you copy registers 1 and 2 into 2 and 3 respectively, then copy 2 and 3 into 5 and 4 (respectively) two instructions later? :P Is this disassembly the final code that gets sent to the calculator, or is it a first pass that gets optimized later? Unless of course you need the same value in both registers...
The easiest way to DMA the VRAM to the LCD would be to use one of the existing system calls, namely Bdisp_PutDisp_DD. Yes, my routine does copy to VRAM, but once all of the VRAM is set up for a frame, I ship it all to the LCD with the aforementioned system call. And I'm amused by your consternation over the black magic that is the code from modern optimizing compilers.
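
A rough sketch of that "fill VRAM, then ship it once" pattern in C, written so it runs on a PC. Everything here is a stand-in: fake_vram replaces the real VRAM, fake_put_disp_dd replaces the Bdisp_PutDisp_DD syscall, and the 384x216, 16-bit layout is an assumption about the screen rather than something taken from the posts above.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_WIDTH  384   /* assumed screen width in pixels  */
#define LCD_HEIGHT 216   /* assumed screen height in pixels */

static uint16_t fake_vram[LCD_WIDTH * LCD_HEIGHT]; /* stand-in for VRAM */
static int flush_count = 0;

/* Stand-in for the Bdisp_PutDisp_DD syscall: here it only counts calls. */
static void fake_put_disp_dd(void)
{
    flush_count++;
}

/* Write one pixel into the off-screen buffer; nothing reaches the LCD yet. */
static void set_pixel(int x, int y, uint16_t color)
{
    assert(x >= 0 && x < LCD_WIDTH && y >= 0 && y < LCD_HEIGHT);
    fake_vram[y * LCD_WIDTH + x] = color;
}

/* Build a whole frame in the buffer, then push it to the LCD exactly once. */
static void draw_frame(uint16_t color)
{
    for (int x = 0; x < LCD_WIDTH; x++) {
        set_pixel(x, 100, color);   /* one horizontal line on row 100 */
    }
    fake_put_disp_dd();
}
```

The point of the split is that drawing routines only ever touch the buffer, and the single flush at the end of the frame is the only thing that talks to the display.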
The thing is, I've got a call to Bdisp_PutDisp_DD in my code, but it doesn't seem to do anything:

Code:
   mov.l   _sysBdisp_PutDisp_DD,r8
...
mainLoop:
   jsr   @r8               ! draw gbuf
   nop                  ! delay slot
...
_sysBdisp_PutDisp_DD:
   .long   _Bdisp_PutDisp_DD
The _PrintXY call works fine, though. Maybe I don't actually have the VRAM address correctly loaded into the register and I'm storing the values somewhere else :/
I just realized why my drawing wasn't showing up on the LCD: _GetKey appears to redraw the status bar (with the battery meter), and that was covering all the pixels I'd just turned on! So now I know that I'm not crazy and that my code did indeed work; it was just being worked against by the OS ;)
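
A quick way to check where those pixels land is to convert the VRAM address into a row and column. The sketch below does that arithmetic in C. It rests on assumptions that are not stated in the posts: that VRAM starts at 0xA8000000, that rows are 384 pixels of 2 bytes each, and that the status bar covers the top 24 rows. The names locate and in_status_bar are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_BASE       0xA8000000u /* assumption: start of VRAM          */
#define LCD_WIDTH       384u        /* assumption: pixels per row         */
#define BYTES_PER_PIXEL 2u          /* assumption: 16-bit pixels          */
#define STATUS_BAR_ROWS 24          /* assumption: height of the top bar  */

typedef struct {
    int row;
    int col;
} pixel_pos;

/* Convert a VRAM byte address into a (row, column) screen position. */
static pixel_pos locate(uint32_t addr)
{
    assert(addr >= VRAM_BASE);
    uint32_t index = (addr - VRAM_BASE) / BYTES_PER_PIXEL;
    pixel_pos p = { (int)(index / LCD_WIDTH), (int)(index % LCD_WIDTH) };
    return p;
}

/* True if the address falls inside the rows the status bar is assumed to own. */
static int in_status_bar(uint32_t addr)
{
    return locate(addr).row < STATUS_BAR_ROWS;
}
```

Under those assumptions, VRAM_ptr (0xA8000300+0x1000) is byte offset 0x1300, or pixel 2432, which is row 6, column 128. A run of 153 words from there stays on row 6, well inside the top 24 rows, which fits with the status bar redraw painting over it.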

Btw, if anyone's interested, here are some random notes I've been jotting down in a text file as I look things up in the manual:

Code:
banked registers bank0 & 1, like z80 shadow registers?
ldc / stc and -s versions
PR = procedure register: holds the return address used when exiting a subroutine
   like the stack in the z80, when you call a subroutine the address of the following instruction
   (really, the second instruction due to delayed branching) gets stored into PR

DSR register bits:
7 signed greater than (GT): 1 = operation positive or operand1 is larger
6 zero bit (Z): 1 = like z80 z flag
5 negative bit (N): 1 = negative or oper1 < op 2
4 overflow bit (V): 1 = operation result overflowed
3-1 condition select bits (CS): specifies mode for selecting status of result set in DC bit. do not specify 110 or 111
   000: carry/borrow mode
   001: negative value mode
   010: zero value mode
   011: overflow mode
   100: signed greater than mode
   101: signed equal or greater than mode
0 DSP condition bit (DC) sets the operation result status in mode specified by CS bits
   0: specified mode status not achieved
   1: achieved

register operands are always longwords (32 bits, 4 bytes). When an operand is only a byte or a word, it is sign-extended
word operands must be accessed from word boundaries (even addresses)
longword operands must be accessed from longword boundaries (addresses that are multiples of four)
other accesses will cause address errors
big endian or little endian: MD5 low = big endian, high = little

data format for immediate data:
MOV, ADD, and CMP/EQ: sign-extended
TST, AND, OR, and XOR: zero-extended

Multiplication doesn't store the result in one of the general registers; it goes into MACL. So mul.l r1,r2 leaves the product in MACL, NOT in r1 (fetch it with sts: sts MACL,r1).

Since instructions are only two bytes, you can only load 1-byte immediate values into registers. Larger values need to be stored as data and loaded from there. Also, be careful about sign extension: mov #255 really loads -1, because the byte is sign-extended.
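
The sign-extension and alignment rules in these notes can be checked on a PC with a few lines of C. This is only a model of the rules as written above, not anything from the manual; sign_extend8, zero_extend8, low_word and is_aligned are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of how MOV/ADD/CMP/EQ treat an 8-bit immediate: bit 7 is copied
   into all the upper bits of the 32-bit register. */
static int32_t sign_extend8(uint32_t imm)
{
    assert(imm <= 0xFFu);
    return (int32_t)(imm ^ 0x80u) - 0x80;
}

/* Model of how TST/AND/OR/XOR treat an 8-bit immediate: upper bits are 0. */
static uint32_t zero_extend8(uint32_t imm)
{
    assert(imm <= 0xFFu);
    return imm & 0xFFu;
}

/* The 16 bits a word-sized store would write from a 32-bit register. */
static uint16_t low_word(int32_t reg)
{
    return (uint16_t)((uint32_t)reg & 0xFFFFu);
}

/* Word accesses need addr % 2 == 0, longword accesses need addr % 4 == 0. */
static int is_aligned(uint32_t addr, uint32_t size)
{
    assert(size == 1u || size == 2u || size == 4u);
    return addr % size == 0u;
}
```

For example, sign_extend8(255) gives -1 and sign_extend8(0x84) gives -124 (0xFFFFFF84), so a word store of that register writes 0xFF84 rather than 0x0084. Any immediate from 128 to 255 comes out negative, which matters for things like loop counters.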
  