Hey guys! I'm a professional game developer who used to make a bunch of calculator games for Casio and TI in my teenage years, and I just picked the hobby back up for a little while, starting yesterday :)

The Prizm is my calculator of choice because of its speed and full-color display, and being less popular makes for more interesting projects that haven't been done yet! I ordered a unit on Amazon and it comes in the mail tomorrow.

In the meantime, I got the Prizm SDK installed yesterday, started with a naive port of Cinoop (a very simple C-based, soundless Game Boy emulator), and managed to use Manager Plus to get Tetris up and running. I'm hoping to start a discussion with vets about my upcoming approaches to optimization and what performance bottlenecks I can expect.

Cheers!
Thomas

And you did all of this in one day? Wow! I'm interested to see how far this goes. How are you planning to run Game Boy game cartridges on the Prizm? Or will you find some other way? If you ever release this, make sure to stay within copyright law! :P
Well it's only a few thousand lines of C that already worked on other platforms, just had to hook up the frame buffer and keys once I worked out the Prizm SDK specifics and figured out how to quickly iterate with the emulator.

Tetris is as dead simple as it gets though, making other games work will take many improvements and optimizations over the next few weeks.

Cartridge images (ROM files) of games that people already own are the only strictly legal way to play emulated Game Boy games. There are various ways to obtain them, but the law is pretty hazy.
Thanks for the great news! Keep us posted please
Today I worked on a faster screen blit that draws to VRAM at the Game Boy's current interrupt period. The x resolution is now doubled and the y resolution is exactly 50% larger, so 160x144 becomes 320x216, which exactly fills the Prizm's 216-pixel screen height. I plan on putting a menu at F6, seems like the perfect place for it.
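
For anyone curious what that stretch looks like in code, here is a minimal sketch, assuming plain nearest-neighbour scaling of a 16-bit 160x144 frame into a 320x216 buffer. The names (`scale_frame`, `source_row`) are made up for illustration, and the real blit may well be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    GB_W  = 160,
    GB_H  = 144,
    OUT_W = GB_W * 2,       /* 320: every pixel is doubled horizontally */
    OUT_H = GB_H * 3 / 2    /* 216: every 2 source rows become 3 rows   */
};

/* Map an output row (0..215) to the Game Boy row it displays.
   The pattern is 0,0,1, 2,2,3, 4,4,5, ... */
static int source_row(int out_y)
{
    return out_y * 2 / 3;
}

/* Stretch one full frame. src holds GB_W*GB_H pixels,
   dst must have room for OUT_W*OUT_H pixels. */
static void scale_frame(const uint16_t *src, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (int y = 0; y < OUT_H; y++) {
        const uint16_t *in  = src + (size_t)source_row(y) * GB_W;
        uint16_t       *out = dst + (size_t)y * OUT_W;
        for (int x = 0; x < GB_W; x++) {
            out[2 * x]     = in[x];
            out[2 * x + 1] = in[x];
        }
    }
}
```

Repeating every other row is the cheapest way to get a 1.5x vertical stretch, at the cost of slightly uneven line heights.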

I added an FPS counter (7.3 here in the emulator, 60 is full speed). I have a number of big performance improvements in mind but want to wait to see how things are on the actual calculator coming in the mail tomorrow :)

tswilliamson,

This is amazing in every way. I am very excited to see this on a real Prizm. Having Gameboy games for the Casio Prizm instantly unlocks thousands of high quality games that can now be played on the Prizm.

Also do you use Github?
Amazing

I would suggest centering the Game Boy display, so that with the resolution you chose you end up with 32 pixels free on each side (64 in total), above F1 and F6

Maybe options to have not just 2x1.5 scaling but also 1.5x1.5 and 1x1 to avoid distortions
Wow! That's pretty exciting, I'll definitely keep an eye on that. :D

I'm curious to see how it reacts on a real calculator.
*adds a Prizm to his wishlist

Dang. I should really stop looking at projects on other calculators, but this is AMAZING!
Very impressive. Glad to see some more development for the PRIZM :)
Hey guys,

Here's a very early version of the emulator. I've made so many changes to Cinoop at this point that I'll probably have to give this a new name. I'll probably remove the download in a few days as I improve further.

It currently will only run Tetris. You must put your backup ROM file for Tetris in the storage root as "tetris.gb". It should be exactly 32 kB. The game plays at ~70% speed (41 FPS) with a frame skip of 1. I'm hoping to get this to 90% or so as the improvements progress.
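
To make the frame skip setting concrete: a common way to implement it (an assumption here, since the actual code isn't shown in this thread) is to emulate every frame but only draw one out of every `skip + 1`. `should_draw` and `drawn_frames` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* With frame_skip = N, one frame is drawn and the next N are only emulated. */
static bool should_draw(uint32_t frame_index, uint32_t frame_skip)
{
    return frame_index % (frame_skip + 1u) == 0;
}

/* How many of the first `total` emulated frames reach the screen. */
static uint32_t drawn_frames(uint32_t total, uint32_t frame_skip)
{
    uint32_t drawn = 0;
    for (uint32_t i = 0; i < total; i++) {
        if (should_draw(i, frame_skip)) {
            drawn++;
        }
    }
    return drawn;
}
```

With a skip of 1, half of the emulated frames are drawn, which is where the saving comes from when the blit is the expensive part.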

Keys:
SHIFT - A button
ALPHA - B button
OPTN - Select button
VARS - Start button
Arrow keys - DPAD buttons
F1 - Misc timers and FPS view (mostly for me, but still neat to look at)

Technical details:
- Display is being done with no syscalls *or* DMA. Instead I am writing the scan lines to the LCD controller directly. This would only really work in an emulator setting though.
- Cycle counts for perf monitoring are done by using TMU2 at the highest clock frequency
- Currently I am overclocking to 85 MHz or thereabouts, hope to have this as an option though as people won't need it for RPGs, etc
- Code will be shared soon! Probably by the end of the month once it's not in so much flux.

Download here: REMOVED
Very, very nice. This is the project I never got to do on the Prizm and still dream about sometimes. But I moved on to other platforms and never felt enough motivation to come back to the Prizm, especially when I barely touch mine anymore. I'm glad someone is picking up the ashes with a fresh and exciting Prizm project. I wish you all the luck, and I'll try to help as much as I can.

Note that overclocking on the emulator will not, as far as I could ever check, do much about the CPU performance. It just seems to mess up the RTC speed.
On real hardware, I'd avoid doing any overclocking at all, and leaving that to dedicated utilities like Ptune2. This will at least reduce, hopefully, the chance of another "Prizm bricks season". This has been discussed extensively at Cemetech and other communities, for example when speculating about what could be bricking so many Prizms. Some guidelines are on the wiki: http://prizm.cemetech.net/index.php/Addin_Usability_Guidelines#Clock_control

Having a single program for overclocking ensures that when the time comes to do the post-mortem on yet another Prizm, it's easier to understand if that Prizm had been overclocked or not (remember: some games would overclock silently, users who dismissed the documentation would not even know if their calcs had ever been overclocked - very ugly).
Furthermore, having everyone overclock through the same code base makes it easier to identify and fix bugs (remember: there are now multiple ways to overclock floating around, from small variations on the same few assembly lines to completely different approaches like Ptune2. "sure, you overclocked, but how exactly?").

Even when running at 58 MHz, I think the SH7305 has more than enough speed to emulate a Gameboy at full speed and still have cycles to spare. If it isn't the case already, I think the bottleneck here will always be updating the screen. Earlier games (including some of those overclocking misbehaving citizens I mentioned...) have used smaller LCD window sizes to speed up screen updating. There have also been some successful experiments by ProgrammerNerd on how to use non-blocking DMA. Between one thing and the other, plus the fact that sound doesn't need to be emulated, I'm confident that the first Gameboy can be emulated at full speed without frameskipping nor overclocking.
Thanks for the advice! I definitely intend to keep overclocking as simply an option, though it's hardcoded right now. That being said, emulation is a bit more CPU intensive than some people think. The Game Boy renders the screen at about 60 Hz, and even if you don't draw the screen, you still have to issue all of the correct hblank and vblank interrupts and maintain the timing of a 4.19 MHz processor.
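
As a rough illustration of that bookkeeping, here is a minimal sketch using the standard Game Boy numbers (4,194,304 Hz clock, 456 clocks per scanline, 144 visible lines plus 10 lines of vblank). The struct and function names are hypothetical, and it is simplified to whole-line granularity: it only counts hblank and vblank events rather than modelling the LCD modes within a line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    GB_CLOCK_HZ      = 4194304,
    CYCLES_PER_LINE  = 456,
    VISIBLE_LINES    = 144,
    TOTAL_LINES      = 154,                 /* 144 visible + 10 vblank */
    CYCLES_PER_FRAME = CYCLES_PER_LINE * TOTAL_LINES
};

typedef struct {
    uint32_t line_cycles;  /* clocks elapsed in the current scanline */
    uint8_t  line;         /* current scanline, 0..153               */
    uint32_t hblanks;      /* hblank events raised so far            */
    uint32_t vblanks;      /* vblank events raised so far            */
} gb_timing;

/* Advance the video timing by the clocks the last CPU instruction took.
   This has to run whether or not the frame is actually drawn. */
static void timing_step(gb_timing *t, uint32_t cpu_cycles)
{
    assert(t != NULL);
    t->line_cycles += cpu_cycles;
    while (t->line_cycles >= CYCLES_PER_LINE) {
        t->line_cycles -= CYCLES_PER_LINE;
        if (t->line < VISIBLE_LINES) {
            t->hblanks++;              /* a visible line just finished */
        }
        t->line++;
        if (t->line == VISIBLE_LINES) {
            t->vblanks++;              /* entering vertical blank */
        } else if (t->line == TOTAL_LINES) {
            t->line = 0;               /* start of the next frame */
        }
    }
}
```

That works out to 70,224 clocks per frame, or roughly 59.7 frames per second, all of which has to be tracked even on skipped frames.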

Something you may find exciting is I've found a way to circumvent the VRAM entirely, which was the core bottleneck of the display code. I believe the VRAM runs at a much slower clockrate than the rest of the components. This is done by rendering the contents of a few scanlines at a time to the onchip 'X' memory, which unlike regular RAM has access from DMA0 just like the VRAM!

With the CPU overclocked to 87 MHz, and *without* stretching the screen, I'm now running Tetris at 60 Hz with 0 frame skip and refreshing the screen faster than anyone else seems to have before. Stretching the screen during the transfer with a similar approach gets me to about 55 Hz with a single frame skip, 47 without.

60 Hz without frameskip probably won't be possible without overclocking until I rewrite the core cpu instruction interpreter in assembly.

I'm comfortable enough with performance now to move on to compatibility and to add a few features, so I can put together something "useful" for everyone else and share the codebase, which I've learned a lot from making.
Here's a new version with the aforementioned features. I've renamed it to Prizoop officially and made a dandy icon.

You can still only play Tetris, and it has a scoring bug, but the game will hover around 60 FPS with 0 skipping if you set it to overclock (just to the 80 MHz setting), not fit the screen and frameskip to 0. The default settings of not overclocking, a skip of 2, and stretching the screen are pretty nice as is though (~85% speed) and will save on battery life.

https://www.dropbox.com/s/1tthn8j0g5lh2p8/prizoop.g3a?dl=0
I would advise against even having overclocking as an option for the aforementioned reasons, but oh well, I'm not the app approver at the Apple app store. I'm not sure you realize, but judging by the clock speeds you mention, you're overclocking using the "traditional" method that messes with one clock alone, which not only severely limits overclocking options but is incompatible with a handful of calculators as well. The way I see it, it's a bit like running an engine in constant overdrive, when there are ways to get the same speed without so much "wear". Doing things properly would require implementing half of Ptune2 to safely mess with more clocks, and perhaps give people an option to choose between multiple options/presets, at which point I wonder if it just wouldn't be best to use Ptune2 directly.

tswilliamson wrote:
Something you may find exciting is I've found a way to circumvent the VRAM entirely, which was the core bottleneck of the display code. I believe the VRAM runs at a much slower clockrate than the rest of the components. This is done by rendering the contents of a few scanlines at a time to the onchip 'X' memory, which unlike regular RAM has access from DMA0 just like the VRAM!

I think everyone until now has always thought that VRAM was just a region in the 2 MB RAM, so unless there are some MMU shenanigans going on, it should be just as fast as any other region in RAM. I believe the RAM slowness is something that can be remedied by messing with the appropriate clocks (peripheral bus, I think?) in Ptune2. Of course, the X and Y memories are faster than the main RAM as they are on-die and are SRAM, and they have DMA access because IIRC they were intended for use by the DAC that the SH7305 might or might not contain (some experiments were made here at Cemetech and I can't remember the outcome).

While with DMA you can't change the contents of the region it is copying while the transfer is going on, if you can control the DMA engine I think you can take advantage of double-buffering, either for the whole screen or for the few-scanlines approach you're using. While one buffer is copying you can write on the other, switch buffers before starting a new copy, and so on.
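
A minimal sketch of that ping-pong scheme, for the few-scanlines case. Everything hardware-related is a stand-in: `dma_start` and `dma_wait` are hypothetical hooks (here just a `memcpy` so the example runs on a PC), `chunk` plays the role of two small on-chip buffers and `lcd` the role of the display.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    LINE_W          = 160,
    FRAME_LINES     = 144,
    LINES_PER_CHUNK = 4,
    CHUNK_PIXELS    = LINE_W * LINES_PER_CHUNK
};

static uint16_t chunk[2][CHUNK_PIXELS];      /* stand-in: two on-chip buffers */
static uint16_t lcd[LINE_W * FRAME_LINES];   /* stand-in: the LCD             */

/* Hypothetical DMA hooks. On hardware, dma_start would return immediately
   and dma_wait would block until the previous transfer has completed. */
static void dma_start(const uint16_t *src, uint16_t *dst, size_t count)
{
    memcpy(dst, src, count * sizeof *src);
}

static void dma_wait(void)
{
}

/* Placeholder renderer: fills the chunk with a recognizable pattern. */
static void render_chunk(uint16_t *out, int first_line)
{
    for (int i = 0; i < CHUNK_PIXELS; i++) {
        out[i] = (uint16_t)(first_line * LINE_W + i);
    }
}

static void draw_frame(void)
{
    int active = 0;
    for (int line = 0; line < FRAME_LINES; line += LINES_PER_CHUNK) {
        render_chunk(chunk[active], line);  /* CPU fills one buffer while the
                                               other may still be in flight  */
        dma_wait();                         /* previous transfer must finish */
        dma_start(chunk[active], &lcd[line * LINE_W], CHUNK_PIXELS);
        active ^= 1;                        /* render into the other buffer  */
    }
    dma_wait();
}
```

The buffer being written is never the one the DMA engine is reading, because the transfer that last used it was already waited on one iteration earlier.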

While brainstorming about the subject a few years ago, I wondered if it would be possible to use a high-level emulation approach where the Z80 code is JIT-recompiled into SH4. I don't think anyone ever wrote a Gameboy emulator that uses JIT, because on common hardware (read: 150 MHz+ CPUs) it's completely overkill to do that when emulating a Z80 core at less than 5 MHz. But it could be useful on the Prizm. Having to maintain such tight interrupt timing probably undermines the idea, though.
Through plenty of experimentation, DMA0 to the LCD Controller doesn't work from all RAM, but definitely does from the VRAM range and on chip memory. I'm not sure why. There are definitely shenanigans. Writes to VRAM *are definitely* slower than the heap and static area for add-ins.

Obviously everything I'm doing is double buffered, it's the only real way to do it.

I'm happy to overclock the correct way. Is there documentation on this, or is digging through the Ptune source the only way? There's a lot of outdated documentation here. I only have to go up one "notch" currently to get to 100% speed, so I don't know if I'm doing anything particularly dangerous or unpredictable compared to the people that were OC'ing to 200 MHz and similar.

As an FYI, the OS appears to never use TMU. I'm using TMU2 at the highest frequency to make an accurate cycle counter. This is how I've been profiling different components and techniques.
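
For reference, the arithmetic behind that kind of cycle counter is tiny. This is a sketch only: it assumes a 32-bit timer that counts down and reloads from 0xFFFFFFFF on underflow, it takes the two counter readings as plain arguments instead of touching any real register, and the names (`ticks_elapsed`, `prof_section`, `prof_add`) are invented for the example.

```c
#include <stdint.h>

/* The counter decrements, so elapsed = start - end. Unsigned arithmetic
   keeps the result correct across a single underflow/reload. */
static inline uint32_t ticks_elapsed(uint32_t start, uint32_t end)
{
    return start - end;
}

/* Running totals for one profiled section (CPU core, blit, ...). */
typedef struct {
    uint32_t total_ticks;
    uint32_t calls;
} prof_section;

static void prof_add(prof_section *s, uint32_t start, uint32_t end)
{
    s->total_ticks += ticks_elapsed(start, end);
    s->calls++;
}
```

Usage would be to read the counter before and after a section and hand both values to `prof_add`. A section that runs longer than one full counter period cannot be measured this way.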
Whoa, those VRAM findings are definitely interesting.
Just to get one thing out of the way: during your testing, you always used the same base address, right? RAM at 0xA0000000 ... 0xA1FFFFFF is not cached, 0x80000000 ... 0x81FFFFFF is cached. You can access the VRAM at 0xA0000000 or 0x80000000, but obviously only the latter one will be cached. If you switch between those two randomly of course results won't be consistent, but I'm sure you took care of that.
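
To keep the two views straight, something like the following helper can be used. It is a sketch based on the generic SH-4 layout, where P1 (0x80000000 to 0x9FFFFFFF, normally cached) and P2 (0xA0000000 to 0xBFFFFFFF, uncached) are aliases of the same 29-bit physical address. The macro and function names are made up, and the addresses in the usage note are arbitrary examples rather than documented Prizm locations.

```c
#include <stdbool.h>
#include <stdint.h>

#define SH4_P1_BASE    0x80000000u  /* cacheable alias     */
#define SH4_P2_BASE    0xA0000000u  /* non-cacheable alias */
#define SH4_PHYS_MASK  0x1FFFFFFFu  /* low 29 bits select the physical address */
#define SH4_AREA_MASK  0xE0000000u  /* top 3 bits select the segment */

/* Same physical location, seen through the cached window. */
static inline uint32_t to_cached(uint32_t addr)
{
    return (addr & SH4_PHYS_MASK) | SH4_P1_BASE;
}

/* Same physical location, seen through the uncached window. */
static inline uint32_t to_uncached(uint32_t addr)
{
    return (addr & SH4_PHYS_MASK) | SH4_P2_BASE;
}

static inline bool is_cached_alias(uint32_t addr)
{
    return (addr & SH4_AREA_MASK) == SH4_P1_BASE;
}
```

For example, `to_uncached(0x88000000)` gives `0xA8000000`. Picking one alias per buffer and converting explicitly keeps timing comparisons consistent.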

AHelper looked into the MMU, but his calc died in the process. DMA only working from lower RAM addresses could be explained with the DMA engine having a smaller number of address lines or something like that.
The bootloader does some address writes that are unexplained and could have to do with this, or not, who knows.

The slowness could be due to a different cache policy, or it could be because while the add-in stack is virtualized, the VRAM is not (but then I would expect the VRAM writes to be faster due to not having to be address-translated, not the other way around...).

I think that for overclocking you'd need to dive in the Ptune2 source code, yes. Nobody ever got around to documenting that nicely. This page: http://prizm.cemetech.net/index.php/CPU_Clocks documents the various clocks (at least some of them?) but doesn't describe how to change them, and the page it links to, Clock Pulse Generator, is incomplete.
Ideally you'd also save the previous clocks and restore them on exit (so that if the calculator was already overclocked, or underclocked, the previous state is restored).
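
The save-and-restore part can be sketched generically, without committing to any particular register. In this example the registers are just pointers passed in by the caller, all names are hypothetical, and the sequencing a real clock change needs (waiting for the clocks to settle, the order in which registers must be written) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_SAVED_REGS = 8 };

typedef struct {
    volatile uint32_t *reg[MAX_SAVED_REGS];   /* which registers were saved */
    uint32_t           value[MAX_SAVED_REGS]; /* what they held on entry    */
    size_t             count;
} saved_regs;

/* Remember a register's current value. Returns 0 on success, -1 if full. */
static int save_reg(saved_regs *s, volatile uint32_t *reg)
{
    assert(s != NULL && reg != NULL);
    if (s->count >= MAX_SAVED_REGS) {
        return -1;
    }
    s->reg[s->count]   = reg;
    s->value[s->count] = *reg;
    s->count++;
    return 0;
}

/* Write the remembered values back, in reverse order of saving. */
static void restore_regs(saved_regs *s)
{
    assert(s != NULL);
    while (s->count > 0) {
        s->count--;
        *s->reg[s->count] = s->value[s->count];
    }
}
```

The idea is to call `save_reg` for every register the add-in is about to change and `restore_regs` on every exit path, so a calculator that was already overclocked or underclocked ends up in the state the user left it in.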

About the TMU, the OS uses it in at least one syscall: http://prizm.cemetech.net/index.php/OS_InnerWait_ms
I'd be careful about assuming the OS never uses something. Here are some of the things people sometimes forget to take into account, and which can't be tested in the emulator:
- USB connection (including the mere act of plugging in a USB cable while the add-in is executing, plus everything that can happen from there, including entering mass storage mode and the other less used modes). The popup might appear only when GetKey is called, but I'm not sure.
- 3-pin connection (I believe that the Prizm can be configured to automatically accept file transfers and the like, which also cuts off add-in execution rather haphazardly). Again, maybe this is only when GetKey is used.
- Going into standby (including taking into account that the auto-power-off timer exists and might run only when GetKey is called, but better be safe than sorry). This is interesting because it involves writing a lot of data to flash (and you definitely don't want to mess with the stack of those routines while they run, or the RS memory in general, that's how I bricked my first Prizm main board) and the X, Y and IL memory contents are lost (and RS is used to hold code for resuming from standby).

Lots of things in the OS appear to revolve around GetKey ( http://prizm.cemetech.net/index.php/GetKey ), which means you can more or less get the OS out of your way if you don't call it. But other syscalls, timers and interrupts are still a thing. It's kind of hard to be sure one has full control over the program execution at any point.

Sigh, things could be so much easier and exciting if the bootloader was in a mask ROM or at least in a write-protected sector. We would be able to write and ship less-than-bug-free code and even if it somehow ended up damaging the OS, it would always be possible to recover...
Yeah I checked both different address locations. Even if non-cached had worked, it would have required a cache flush after modifying the buffer. But it didn't anyway, so no real use. The bus the DMA uses (or the flag that allows it to communicate with the LCD) prevents it from accessing the rest of the RAM is my guess.

Re: GetKey: yeah, it's super slow and I don't use it during main emulation. In fact I'm avoiding syscalls entirely at this point, with a clean startup/shutdown around the advanced usage. I don't know if I accounted for standby, though. I may be able to claim the interrupt for myself while running the main emulator and hand it back later. I know this sounds pretty dicey :P I think it'll be good to have a reference codebase that does all of the really fast stuff and keeps the OS in check to avoid damaging anything. Maybe we can wrap it into a "HighPerformanceModeOn/Off" library or similar.

I looked into Sarento's ptune docs further and he has some specific mem-wait tuning registers he uses to achieve remarkably better performance without changing the CPU clock or killing battery (max power draw changed from 34mA to 40mA). So I'm going to give that a shot when I get home from work today.
Yes, with regards to syscalls I think that's the way to go. Either use them and embrace the whole OS thing, which includes calling GetKey regularly (and GetKey is blocking, unless some tricks are used, and single-key - not of much use on an emulator), or try to avoid the OS and "suspend" its execution as much as possible, preferably making sure that it can get back to working order without a reboot, but ensuring that it won't begin executing unexpectedly (e.g. for a low-battery or timeout power off, or to handle the USB cable being connected). I can't stress this last part enough, you really don't want the OS to begin doing its thing while you have DMA, clocks and what not hijacked by your program.

Avoiding syscalls only becomes harder/impossible if you wish to do filesystem operations, which can be avoided by reading all the necessary files to RAM and getting off "HighPerformanceMode" before writing anything back. It would also be interesting to mmap ROM files into memory, although for the original Gameboy I think everything fits in the add-in stack and that's the safest way to go, for now.
I also have 344k of VRAM that's perfect for the currently unswapped ROM data :) I'll just store checksums on them to make sure the OS hasn't made a change to that area while emulating.
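
A minimal sketch of that kind of integrity check. The choice of FNV-1a is an assumption made for the example (any cheap checksum would do), and `fnv1a32` and `region_intact` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over a byte range. */
static uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t hash = 2166136261u;        /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 16777619u;              /* FNV prime */
    }
    return hash;
}

/* True if the region still matches the checksum taken when it was filled. */
static bool region_intact(const uint8_t *data, size_t len, uint32_t expected)
{
    return fnv1a32(data, len) == expected;
}
```

The checksum would be computed once after copying a ROM bank into the region and verified before the bank is next used, reloading it from storage if the check fails.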
  