I've been browsing the SH7724 hardware manual with an eye towards possible ways to accelerate games, and I've found at least a few on-chip peripherals that could be useful (provided they're present in our CPU). Some notes on what I've found (and a few miscellaneous other thoughts):
DMAC - not a lot to say, it's a DMA controller. It could be useful for large copy operations, or could be abused to implement something like memset(), which would be handy for clearing the screen. Some care would be needed to ensure that the CPU only draws to regions that have already been erased, but that's fairly easy to do by polling the controller's registers to see how far the transfer has progressed.
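Here's a rough sketch of the bookkeeping for the "only touch what's already been erased" idea. It assumes the controller exposes a remaining-transfer count that decrements as the fill proceeds, and that the fill runs top to bottom through VRAM. The function names are made up for illustration, and reading the actual register is left out since I haven't verified any of that on our hardware.

```c
#include <stdint.h>

/* Hypothetical helper: how many complete screen rows have been cleared,
 * given the total number of transfer units programmed, the number the
 * controller reports as still pending, and the units needed per row. */
static unsigned rows_cleared(uint32_t total_units, uint32_t remaining_units,
                             uint32_t units_per_row)
{
    if (units_per_row == 0 || remaining_units > total_units)
        return 0;
    return (unsigned)((total_units - remaining_units) / units_per_row);
}

/* A row is safe to draw into once the fill has fully passed it. */
static int row_is_safe(unsigned row, uint32_t total_units,
                       uint32_t remaining_units, uint32_t units_per_row)
{
    return row < rows_cleared(total_units, remaining_units, units_per_row);
}
```

With a 384x216 16-bit screen cleared in 4-byte units (both assumptions), that's 192 units per row and 41472 units in total, so the CPU can start drawing the top rows long before the fill finishes.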
Bdisp_PutDisp_DD - I've been assuming this syscall uses DMA, but haven't seen any proof either way. It's certainly possible, but it may instead eat CPU time while copying to the display. This could be tested quite easily by calling the syscall in a tight loop and checking the wall-clock time to see what framerate can be achieved.
If it does indeed block while copying, that's an optimization opportunity, since the CPU could be doing useful work while VRAM is blitted to the screen. The one thing to keep in mind is that VRAM must not be modified until the blit is complete, otherwise you'll see partial updates. This could greatly improve performance in programs that do heavy computation every frame.
If the syscall doesn't block while copying, that probably means it's internally double-buffered. That leaves less to gain in speed, but the secondary buffer could be reclaimed as additional RAM if the double buffering isn't needed.
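The timing test could look something like the sketch below. It takes the blit and the tick source as function pointers so the same routine can be checked off-device; on the calculator I'd expect to pass in the display syscall and a tick counter such as libfxcg's RTC_GetTicks (which I believe counts at 128 Hz, but treat that rate as an assumption). The fake_* functions are host-side stand-ins only.

```c
#include <stdint.h>

typedef void (*blit_fn)(void);
typedef int (*ticks_fn)(void);

/* Calls blit() `frames` times back to back and returns the achieved
 * frame rate multiplied by 100 (fixed point, to avoid floating point).
 * Returns 0 if the tick counter did not advance. */
static unsigned measure_fps_x100(blit_fn blit, ticks_fn ticks,
                                 unsigned frames, unsigned ticks_per_second)
{
    int start = ticks();
    for (unsigned i = 0; i < frames; i++)
        blit();
    int elapsed = ticks() - start;
    if (elapsed <= 0)
        return 0;
    return (unsigned)(((uint64_t)frames * ticks_per_second * 100u)
                      / (unsigned)elapsed);
}

/* Host-side stand-ins: each fake blit "takes" 4 ticks. */
static int fake_now;
static void fake_blit(void) { fake_now += 4; }
static int fake_ticks(void) { return fake_now; }
```

The number only tells you how long a call takes. To learn whether the copy is asynchronous, you'd also want to compare it against the same loop with some CPU work added between calls.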
IL memory - 16 KB of on-chip memory (base address 0xE5200000), so it's very fast. AHelper confirmed some time ago, via a bit of experimentation, that it's available for use in our own programs. It's a rather small amount of memory, but still large enough to hold useful data. You might want to put a performance-critical data structure in this memory, or just copy (decompress?) your graphics data here for slightly faster blitting.
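One simple way to hand out pieces of that region is a bump allocator, sketched below. The base address and size are the ones quoted above; TARGET_PRIZM is a hypothetical build flag, and without it the code falls back to an ordinary array so it can be tried on a PC. Nothing here checks whether the OS itself uses any of that memory, which is something to verify before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

#define ILRAM_SIZE (16u * 1024u)

#ifdef TARGET_PRIZM /* hypothetical flag for an on-calculator build */
#define ILRAM_BASE ((uint8_t *)0xE5200000)
#else
/* Host fallback so the allocator can be exercised off-device. */
static uint32_t host_ilram_words[ILRAM_SIZE / 4u];
#define ILRAM_BASE ((uint8_t *)host_ilram_words)
#endif

static size_t ilram_used; /* bytes handed out so far */

/* Returns a 4-byte-aligned block of at least `size` bytes from IL memory,
 * or NULL if it does not fit. Blocks are never freed individually. */
static void *ilram_alloc(size_t size)
{
    size_t aligned = (size + 3u) & ~(size_t)3u;
    if (aligned < size || aligned > ILRAM_SIZE - ilram_used)
        return NULL;
    void *block = ILRAM_BASE + ilram_used;
    ilram_used += aligned;
    return block;
}
```

Resetting ilram_used to 0 releases everything at once, which fits a per-level or per-screen allocation pattern.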
2D-DMAC - this peripheral would be perfect for blitting sprites to VRAM.
Simply set a few pointers and size parameters, and it can blit a rectangle from a spritesheet onto your display buffer without eating CPU time: point it into the sprite sheet and it handles the row-by-row addressing itself. It can also do inline rotation, inversion, and color conversion, which could be useful for certain programs. This would be wonderful for spriting if it's actually in our CPU.
The performance benefit may not be worthwhile for small sprites, though, since setting up each transfer has its own cost. It would take experimentation to see whether it's worth the additional code complexity to use this module.
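For reference, this is the operation the 2D-DMAC would be taking off the CPU's hands, written as a plain software blit. It's a model of the copy itself (two pointers, two strides, a width and a height), not of the peripheral's register interface, and the function name is my own. It also makes a handy baseline to benchmark against.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies a w x h rectangle of 16-bit pixels from `src` to `dst`.
 * Strides are given in pixels per row for each buffer, so `src` can
 * point into the middle of a sprite sheet and `dst` into VRAM. */
static void blit_rect(uint16_t *dst, size_t dst_stride,
                      const uint16_t *src, size_t src_stride,
                      size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, w * sizeof *dst);
}
```

For a small sprite the loop above is only a handful of short memcpy calls, which is why the hardware setup cost may not pay off there.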
MERAM - 128 KB of on-chip memory (base address 0xE8080000), so the basic use case is like IL memory, but it's probably a little bit slower. More interestingly, it can function as a cache for several of the more specialized peripherals. Of those which can connect to the MERAM, though, only one appears to be of use to us on this hardware:
BEU - this module composites up to three different images into a single display. Most interestingly to me, it can do full alpha blending. The obvious option here is to display a HUD in games, but it may be possible to use this module just for blitting sprites with alpha channels (similar to the 2D-DMAC). Another compelling option is to split your game's graphics into fore- and background layers to avoid redrawing the entire display every frame (assuming the background may update less often).
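To give an idea of the per-pixel work alpha blending costs when done in software, here's a minimal sketch for a single pixel. It assumes 16-bit RGB565 pixels and an 8-bit alpha value; how the BEU actually stores and applies alpha is something I haven't checked, so this only models the arithmetic.

```c
#include <stdint.h>

/* Blends `src` over `dst` (both RGB565). alpha = 0 keeps dst,
 * alpha = 255 gives src; results are rounded to nearest. */
static uint16_t blend565(uint16_t dst, uint16_t src, uint8_t alpha)
{
    unsigned sr = (src >> 11) & 0x1Fu, sg = (src >> 5) & 0x3Fu, sb = src & 0x1Fu;
    unsigned dr = (dst >> 11) & 0x1Fu, dg = (dst >> 5) & 0x3Fu, db = dst & 0x1Fu;
    unsigned a = alpha, ia = 255u - alpha;

    unsigned r = (sr * a + dr * ia + 127u) / 255u;
    unsigned g = (sg * a + dg * ia + 127u) / 255u;
    unsigned b = (sb * a + db * ia + 127u) / 255u;

    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

That's three multiplies and three divides per channel set for every pixel, which is exactly the kind of work that would be nice to push onto dedicated hardware.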
The BEU also appears to be capable of decoding palettes, which would simplify (and probably accelerate) the use of lower-bit-depth images (in order to save memory). I could see some programs rendering to VRAM in a palettized color space, allowing the system VRAM buffer to actually contain two reduced-depth buffers.
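A software version of the palette decode, with the buffer arithmetic spelled out. This assumes a 384x216 screen at 16 bits per pixel and 8-bit palette indices, in which case exactly two indexed buffers fit in the space of one full-color VRAM buffer. The names are mine, and the BEU's actual palette format is not something I've verified.

```c
#include <stddef.h>
#include <stdint.h>

#define SCREEN_W   384u
#define SCREEN_H   216u
#define VRAM_BYTES (SCREEN_W * SCREEN_H * 2u) /* 16-bit pixels  */
#define PAL8_BYTES (SCREEN_W * SCREEN_H)      /* 8-bit indices  */

/* Expands `pixels` 8-bit palette indices from `src` into 16-bit colors
 * in `dst`, looking each index up in a 256-entry palette. */
static void expand_pal8(uint16_t *dst, const uint8_t *src, size_t pixels,
                        const uint16_t *palette)
{
    for (size_t i = 0; i < pixels; i++)
        dst[i] = palette[src[i]];
}
```

Done on the CPU, this is one table lookup per pixel on top of the normal copy, so having the hardware do it during the blit would make the memory savings close to free.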
In conjunction with MERAM, it may be possible to do efficient composition and buffering for screen rendering. The MERAM has a frame buffer mode which may be useful in its own right, but it can also take input from the BEU. By having the BEU composite the frame into MERAM and then doing a normal screen blit from there (probably via DMA), we could achieve very good performance in fairly complex scenes (particularly when alpha blending is involved).
Concluding
There's a lot of speculation here. We know the Prizm CPU has functional IL memory, so that's something easy to use. I assume DMA is functional, but haven't seen any confirmation of that either way. The remaining modules may or may not be present; I have no idea which.
If somebody wants to do some experimentation to try to determine what's available, that would be awesome. As I continue work on pLemmings, I'll probably be attempting to use some of these techniques (if the hardware is available), so there's a good chance that support for whatever works ends up in libfxcg.
So. Thoughts? Anybody want to test whether these things are possible?