Thank you! Several WADs might have a use if we split episodes; Critor suggested that and I think it would be useful too. 10 MB is really large for anyone playing other games at the same time. ^^

Not mapping the file to memory is indeed very slow; this is exactly why the mapping was designed. The option is mostly there as a tool to find bugs in the mapping.

The load/save function is indeed overwritten explicitly with the new game menu in the code; the options menu was like that too, I just recently reenabled it. Most of the bugs you mention are likely to be explicit too, I believe it will be on the easy side of the task list.

Also you were right on the money about composite textures: the 8 kiB limit is still there. Bumping it or removing it makes the textures work, but it also takes up more memory, even though some WADs are currently memory-limited. I'll keep an eye on that as I still have some tricks up my sleeve in that regard. For now I'll remove the limit and see how far we can go.

Quote:
No not really. I just saw that this fx-CG50 emulator had the same CPU speed as the CG20 and all the programs that didn't ran on the CG50 ran on this emulator and vice versa. So I think they just took their CG20 emulator put the CG50 firmware on it, changed a few graphics and called it a day.

Ah, I see! This should be because of the RAM address using the physical map of the CG-20. Programs hardcoding the VRAM address is usually the only incompatibility, so all of these would work on the emulator. It makes sense that they kept most of the old emulation code!
Great news! All the levels of the shareware Doom now load, and all the levels of Ultimate Doom except level 6 load as well. Level 6 runs out of memory.

I've made a few changes to the GUI: I added the brightness/gamma setting on the FRAC key, fixed menus that didn't work, added the space/return key for confirm screens (EXE/log on the current keymap), and added a main screen setting to warp to a level when starting the game. (Only the load/save feature is still disabled.)

More importantly, I've solved the crash issue; this was due to CGDoom attempting to return direct Flash addresses without allocating RAM when accessing lumps that are stored as a single fragment in Flash. This is a very useful optimization, but it breaks if the lump data is unaligned in the Flash. I simply made sure that unaligned lumps are copied and now it works! o/

If you have any opportunity to play-test again, this would be very welcome. I only learnt the first three levels and I don't know most secrets, so I can't easily make sure that the support is complete, even though I believe we're getting close.

* The shareware WAD from Planète Casio (I don't know the version) should work 100%, apart from that one glitched floor texture which appears white with stripes (first in E1M2)
* The stripped Ultimate Doom WAD should work too, except on level 6 which runs out of memory
* I assume the unmodified 1.9 shareware WAD you had should still work


Progress is coming along nicely, I believe a release will be in order soon. ^^
Wow, that's a lot of updates in a short time! Smile Doom looks really good now with all textures fixed. The shareware WAD from Planète Casio (finally I'm writing it correctly) and my 1.9 WAD both work. Also the brightness feature is very useful.
Here are a few issues I found:
- Blue keycard isn't shown, Yellow keycard is shown as two yellow keycards, Red keycard works fine
- Start at map x: really useful for testing, but you can't select the episode (it will always use Episode 1), the user shouldn't be able to enter negative numbers
- Pressing Quit Game sometimes crashes the game
- some switches don't show the pressed texture if you press them

Here are some of the results of testing the Ultimate Doom WAD:
Game crashes at:
E1M6 (E1M9 works, but crashes at the end)
E2M2 :/
E3M3
E4M6

I also just noticed that you released a new version! Surprised
I really like that you removed the white border, which looked really bad. Also there's now even more heap available.
So keep in mind that all testing I did is on the old version, but it would really help if you could make the change to select also the episode in the main screen so I can test more quickly and give you results for the new version.
Nice that you added the FPS counter but I think it always shows higher FPS (for example: 117MHz - Ultimate Doom WAD - Start of E1M2 (the stairs) with no enemies: 22FPS (but it felt more like a slideshow, 3FPS?)) Also the FPS counter blocks the messages, so could you move it to the top right side of the screen?
Excellent! I added the episode warp and pushed a new G3A. As far as I could test, E1M6, E2M2, E3M3 and E4M6 all load properly now (probably thanks to the slightly larger heap). If you have an error, it's useful to distinguish a System ERROR/reboot/freeze from a Z_Malloc/CGDMalloc failure (normally I can reproduce but if not then that distinction would be useful).

I'd noticed the blue keycard, I'll investigate in a moment. Do you have a particular switch to show me for the texture bug?

I checked the FPS counter but it seemed correct to me. Starting in E1M2, which is indeed quite laggy near the entrance, the counter showed 14 FPS instead of the usual ~30, which seemed about right. I was way above 3 FPS at least, that's for sure (and fortunate!).

The FPS counter simply counts the number of calls to I_StartTic(). I believe it's always once per frame unless of course some frames are simulated but not rendered (like when you press VARS). For now I don't know how to put text in the right-hand corner so I intentionally used the message system to print the counter, I'll see later how to deal with that.

Anyway, we're getting there, that's for sure! Your help is very much appreciated, testing all 36 levels even just once would take me hours. x)
Quote:
I believe it's always once per frame unless of course some frames are simulated but not rendered (like when you press VARS)

Oh yes! That's probably the case. Even if you don't press VARS it still skips every second frame! If you press VARS for fast mode it skips every fourth frame. (MPoupe wrote this on Omnimaga: https://www.omnimaga.org/casio-prizm-projects/cgdoom/30/ ) Maybe you should take this into account.
Also do we still need this frame skip option? This was probably made to get the game running on the CG10/20 at better speeds. Maybe you can add an option on the VARS key to not skip any frames.

Quote:
Do you have a particular switch to show me for the texture bug?

Here are a few links that show you the textures for the switches that don't work properly:
https://bghq.com/textures/doom/396.png
https://bghq.com/textures/doom/398.png
https://bghq.com/textures/doom/549.png
https://bghq.com/textures/doom/548.png
Missed that one (does work on some maps, on some not??)
https://bghq.com/textures/doom/402.png

Here are my testing results for the newest version (all WADs are from TI-Planet under Casio Graph 90+E/fx-CG - Jeux Doom and tested on a 213MHz overclocked calculator):

The Ultimate Doom WAD:
E1: should work, but same error as in E3 with additional System ERROR (secret level works)
E2M7: Z_Malloc Failure, game continues after that but it's very glitchy (in E2M3 there's a visual glitch if you go through the door at the start, secret level works)
E3: should work but pressing Read this! in the menu after completing the episode gives Z_ChangeTag: and W_GetNumName: errors (closing Read this! continues the game but very glitchy, can't get into secret level?)
E4: should work, but same error as in E3 (secret level E4M9: Z_Malloc Failure, game continues after that but it's very glitchy)
Something else I noticed is that there are massive slowdowns in areas with a lot of geometry but also in some with lesser geometry (these were not so slow in the shareware version, example E1M3: the outdoor area with a slime pit in the middle and at the sides). Overall the entire WAD feels pretty stuttery to play (some levels are better some are worse)

Doom 2 WAD:
Level 8: visual glitch like in E2M3
Level 15: Z_Malloc failure and freezes the calculator
Secret Level 31 & 32: working (they are Wolfenstein 3D themed, why is there no Wolfenstein 3D port for Casio Prizm? But someone has made one for the Casio FX-9860G)
In Doom 2 I didn't notice any major slowdowns.

Heretic WAD:
Level 2: Z_Malloc Failure
Plays pretty smooth.

Pinochestein 3D WAD:
Level 8: Z_Malloc failure, a few other errors afterwards, last one System Error
Is a bit laggy sometimes but not too bad.

Also I often hit the MENU key while playing by accident which is really annoying. So would be cool if you need to confirm that you want to exit like in Quit Game.
Also after I removed the audio of my 1.9 shareware WAD the performance got worse (51 FPS instead of 58, took these values after E1M1 loaded, I didn't move, 213MHz with Ptune2). It became stuttery, much like the Ultimate Doom WAD.
So looks like the removal of the audio in Doom 1 (for some reason not in Doom 2) makes the game slow and stuttery.
I'm back after a short break. And wow, you really tested all the games! Thank you so much, things are moving several times faster from your help alone. <3

I'd seen the frameskip but somehow failed to realize, that's now fixed. Since the FPS counter now takes it into account, pressing VARS won't have a direct effect on the counter. I also managed to move the FPS counter to the top right of the screen (although not perfectly right-aligned) by adding a new widget in the HUD.

The blue key showing up yellow was due to the code initializing keys number 0/1/2 with sprites for keys number 1/1/2. I have no idea how this happened, but it's fixed.

I'll have a more detailed look at all these bugs and your other requests. Some of these Z_Malloc failures may not have a solution, as I'm exhausting an increasing number of options for heap extensions. But I'll be pretty happy if we can support a couple of games all the way through at least! Smile
I believe I found the quit menu crash. In the Linux Doom port where I found the list of messages, there were commas missing between groups, causing some pairs of messages to concatenate, which produced both buffer overflows and extra NULL pointers in the array.
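
A minimal sketch of how a missing comma produces both symptoms in C. The table below is hypothetical and is not the actual message list from the Linux Doom port; only the mechanism (adjacent string literals being concatenated, and a fixed-size array being padded with NULL) is standard C behaviour.

```c
#include <stddef.h>

/* Hypothetical 4-slot message table. The comma missing after "first" makes
   the compiler merge "first" and "second" into a single, longer literal,
   so only 3 initializers remain and the last slot is implicitly NULL. */
static const char *broken_msgs[4] = {
    "first"   /* <- missing comma */
    "second",
    "third",
    "fourth"
};

/* Same table with the comma restored: 4 distinct messages, no NULL slot. */
static const char *fixed_msgs[4] = {
    "first",
    "second",
    "third",
    "fourth"
};

/* Counts the non-NULL entries of a 4-slot table. */
static int count_messages(const char *const *table)
{
    int n = 0;
    for (int i = 0; i < 4; i++)
        if (table[i] != NULL)
            n++;
    return n;
}
```

Code that copies a message into a buffer sized for the longest expected string can overflow on the merged literal, and code that picks a random slot can end up dereferencing the NULL entry.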
I finally found the texture glitch in E2M3!!

This was a tricky one. I used Eureka DOOM Editor to find the missing texture (STARGR2), then its texture ID (256), then instrumented the code to observe the lumps it comes from, the composite attributes, and I ended up even watching the rendering of a particular side of a particular line to see what happened.

So what was the problem? Pretty deep inside the rendering engine, the check that determines whether a line is textured used this formula:


Code:
segtextured = (boolean)(midtexture | toptexture | bottomtexture | maskedtexture);

The or-ing is fine, but the cast to boolean (a single byte) yields 0 if by chance all textures on that line have IDs which are multiples of 256. I just changed that to "!= 0" and that glitch should definitely be fixed. This will be shipped with the next G3A update once I make some more progress.
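
The truncation can be reproduced in isolation. This sketch assumes that boolean is a one-byte unsigned type, as the post describes; the two function names are made up for the comparison.

```c
#include <stdint.h>

/* Assumption: boolean is a one-byte type, as described in the post. */
typedef uint8_t boolean;

/* Original-style check: the cast keeps only the low 8 bits of the OR. */
static int segtextured_cast(int mid, int top, int bottom, int masked)
{
    return (boolean)(mid | top | bottom | masked);
}

/* Fixed check: compare against zero, so any non-zero ID counts. */
static int segtextured_fixed(int mid, int top, int bottom, int masked)
{
    return (mid | top | bottom | masked) != 0;
}
```

With texture ID 256 alone on the line, the cast version returns 0 while the fixed version returns 1. Note that a C99 `_Bool` would not have this problem, since conversion to `_Bool` yields 1 for any non-zero value.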
Very nice! Smile
Also I want to make a quick addition to what I said about the run key. The run key is actually required for some levels in Doom (for example Doom 2, level 2 to get the red keycard). I completely forgot that because I cheated most of the time. Very Happy
Anyways I'm really looking forward to the next G3A update! Great work so far!
Thank you! Critor asked about it the other day, but as a complete noob myself I wouldn't know. xD

I identified an oddity related to the loading of switches which suggests that all the highlighted names in the image below may not be treated as switches. It seems to roughly match the list that you supplied to me, but as an added check does it match your memories?

If my lead is correct, the fact that one of them seemed to work on some maps would be because the background is different, and only SW1CMT has a problem.

Edit: Never mind I was too happy and I didn't realize the solution was right under my nose: the switch models for registered and retail episodes aren't loaded because the game is identified as shareware. Bypassing that made it work instantly, plus some extra checks to avoid loading errors on switches that aren't in the WAD.

New update with a new G3A! Very Happy

PRAM heap

I've now extended the heap to truly extravagant places, by adding some PRAM0. This is an area, slightly slower than RAM but with a healthy 140 kiB. The trick is that it only supports 32-bit access so only arrays of pointers, int and fixed_t can go there (oh and Z_Malloc won't work so I added a custom allocator).
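
As an illustration of what a custom allocator for a 32-bit-only area could look like, here is a minimal bump allocator sketch. It is not CGDoom's allocator: the names are hypothetical, an ordinary array stands in for the PRAM0 area so the code runs anywhere, and there is no way to free memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the PRAM0 area (140 kiB, per the post). On the calculator
   this would be a fixed hardware address rather than a normal array. */
#define PRAM_WORDS (140 * 1024 / 4)
static uint32_t pram_area[PRAM_WORDS];
static size_t pram_used_words = 0;

/* Hypothetical bump allocator. Sizes are rounded up to whole 32-bit words
   and pointers are typed uint32_t* so callers only make 32-bit accesses.
   Returns NULL when the area is exhausted. */
static uint32_t *pram_alloc(size_t bytes)
{
    size_t words = (bytes + 3) / 4;
    if (words > PRAM_WORDS - pram_used_words)
        return NULL;
    uint32_t *ptr = &pram_area[pram_used_words];
    pram_used_words += words;
    return ptr;
}
```

A bump allocator fits arrays that are allocated once at startup and kept for the whole session, which matches the static index arrays mentioned here.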

I moved a number of static arrays there (indexes for textures, flats, lumps, stuff like that), for a total size that I estimate at 17 kiB for the Ultimate Doom WAD (it's actually pretty hard to find suitable arrays).

This has a small overall performance cost, and moving some very intensively used arrays would drop FPS significantly. This is because PRAM isn't cached. On one hand this makes each access slower; on the other hand it relieves pressure on the cache for other intensively-used areas of RAM. With the arrays I've moved it should be pretty even.

Errors on Ultimate Doom E2M7 and E4M9

Both E2M7 and E4M9 work fine on my calculator, with 189 kB and 177 kB free memory when entering respectively. After you first mentioned it I played through the entirety of E2M7 (which took me a good half hour lol xD) without issue. With this amount of leeway the whole levels should be playable, I'm slightly worried if you have this much of a difference on your calculator.

The only thing I can see which would impact this number severely is the number of fragments in the file. Can you tell me how many you have in your Ultimate Doom WAD by starting it with developer information? I'm not sure how many I need to account for on average, and as you can guess more fragments make everything slightly more difficult.

I have 200-500 on my calculator and the number is limited to 1024 so even that should have only limited impact, but I don't see anything else yet.

Performance


Quote:
Something else I noticed is that there are massive slowdowns in areas with a lot of geometry but also in some with lesser geometry (these were not so slow in the shareware version, example E1M3: the outdoor area with a slime pit in the middle and at the sides). Overall the entire WAD feels pretty stuttery to play (some levels are better some are worse)

I'm suspecting we won't be able to do a lot about that, unless I introduced the problems myself and can somehow bisect them. There might be memory access optimizations still dormant, but I think that's about it. You should have ~15 FPS in Hangar without overclock (117 MHz), if you have something very different please tell me.

To be honest I'm ready to compromise on requiring overclock for a fluid experience if it's needed for complete support, as I think semi-random errors on large levels are more of a hindrance to the player. This is not required yet, but I'm anticipating a little bit.

Quote:
Also after I removed the audio of my 1.9 shareware WAD the performance got worse (51 FPS instead of 58, took these values after E1M1 loaded, I didn't move, 213MHz with Ptune2). It became stuttery, much like the Ultimate Doom WAD.
So looks like the removal of the audio in Doom 1 (for some reason not in Doom 2) makes the game slow and stuttery.

Does the removed audio affect the position of other lumps? In particular, would it happen to unalign a lot of lumps by accident? Some insight into the format and/or editing tool might help for that. I asked Critor, but he wasn't aware of the details (and neither am I!).

Keymap changes

* Escape (in-game menu) is now bound to the [MENU] key.
* Immediate quit (previously bound to [MENU]) is now unbound, use "Quit Game".
* Interactions to open doors are now bound to [x²] instead of [EXE].
* Validation in menus is now bound to [EXE] instead of [log].

Once picked up I think these bindings will make a lot more sense!

New G3A and unresolved problems

I've pushed a new G3A; if possible I'd like you to confirm the allocation problems on E2M7 and E4M9. And for bookkeeping, the following problems are still unresolved:

* Doom I episode 3 : "pressing Read this! in the menu after completing the episode gives Z_ChangeTag: and W_GetNumName: errors (closing Read this! continues the game but very glitchy, can't get into secret level?)"
* Errors in particular levels, mentioned in this message
* Removing audio affecting performance
Yay, Update! I really like that the menu is now on the MENU key and this also solves the problem of hitting the MENU key by accident and closing the game. The new key layout is nice but I'm still used to the old key layout and keep hitting EXE to open doors Very Happy. Really nice that the FPS counter is now on the top right and it works perfectly.

Quote:
Both E2M7 and E4M9 work fine on my calculator, with 189 kB and 177 kB free memory when entering respectively. After you first mentioned it I played through the entirety of E2M7 (which took me a good half hour lol xD) without issue. With this amount of leeway the whole levels should be playable, I'm slightly worried if you have this much of a difference on your calculator.

The only thing I can see which would impact this number severely is the number of fragments in the file. Can you tell me how many you have in your Ultimate Doom WAD by starting it with developer information? I'm not sure how many I need to account for on average, and as you can guess more fragments make everything slightly more difficult.

I have 200-500 on my calculator and the number is limited to 1024 so even that should have only limited impact, but I don't see anything else yet.

I have 314 fragments when starting the WAD. In E2M7 I have 145 KB free, in E4M9 133 KB (for some reason always ~44KB less than you :/). I can play E2M7 but I still get a Z_Malloc failure and a Z_ChangeTag error. E4M9 gave me shortly a Z_Malloc failure but I could continue playing after that.
What Ultimate Doom WAD do you use? I use this one.

Quote:
You should have ~15 FPS in Hangar without overclock (117 MHz), if you have something very different please tell me.

This is mostly true but sometimes the FPS drops into the single digits, like when I open the first door in Hangar: the FPS goes down to 5 but goes up again after a short while. This is what I meant with the stutters. Every now and then the FPS drops into the single digits. Also the Ultimate Doom WAD always seems to be slower. In E1M3 I get 8 FPS with no enemies on screen and in the shareware WAD 15 FPS even though it doesn't need to render more. Something really seems to affect the FPS in the Ultimate Doom WAD.

Quote:
Does the removed audio affect the position of other lumps? In particular, would it happen to unalign a lot of lumps by accident? Some insight into the format and/or editing tool might help for that. I asked Critor, but he wasn't aware of the details (and neither am I!).

For removing the audio I selected in SLADE all the lumps under the section audio, deleted it and saved the file. To save the file I first had to change the file to PWAD because it wouldn't allow me to save the file as IWAD. Seems to be that it's affecting the position of other lumps. In the shareware WAD audio is from index 109 to 231. If these are getting removed every lump that came after the audio before has a new index and therefore a new position. Not sure how this affects the performance but it's around 5 FPS slower.

Also something still seems to be wrong with the keys. The blue skull key doesn't show up after picking it up.
I'm back again for another ride, hopefully I can finish this (offering decent support for all the WADs) quickly!

This post and update result from a lot of performance investigation, especially between WADs. There is news, and it's good. This time, I've decided to leave more details, for your enjoyment and mostly in case someone looks at this code after me. So it's a pretty long post!

Changes in this update

Let's start with this. First, I've fixed odd structure issues with the repository and build system, and changed the final file name to CGDoom.g3a. Please make sure to delete the old CG_Doom.g3a file before using the new one to avoid any confusion.

MPoupe's version of CGDoom had an "emulator" which is a Windows target for the game, which I believe he used to debug the file mapping mechanism. I could never build it so it deteriorated with my changes, and it would be unusable now. I've started to rewrite it using the SDL2 API, which would help running CGDoom back on PC (and mine in particular x3). There is no longer a need to debug the file mapping mechanism, but it could be useful for statistics on the WAD since DOOM itself can act as a programmable WAD inspector. It doesn't work yet, but a large chunk of the port is done.

As a side note, I've tried using SLADE, but it keeps crashing for some really inconsequential reasons like not finding files about the UI layout. I've tried to troubleshoot it but I keep getting fresh errors, so I guess I'll leave it aside for now.

I've also added more developer stuff:

• A "Trust unaligned lumps" option, the purpose of which is explained below.
• Another developer screen when leaving, which reports unaligned lumps.
• The free memory key now reports memory by region and indicates total size.
• A profiler key on [)] to report detailed performance measurements (see below).

The full story about unaligned lumps

This post mentions unaligned lumps quite frequently since these were one of my specific targets. I'm not sure how familiar you are with these low-level concepts so I'll explain what these are in detail, maybe I can clear up some uncertainty in the process.

When accessing memory, the CPU can request either 1, 2, or 4 consecutive bytes. This is commonly used to load variables that occupy 1 byte (chars and bytes), 2 bytes (shorts) or 4 bytes (ints, fixed, pointers). However, a 2-byte access at an address can only be performed if the address is a multiple of 2, and a 4-byte access can only be performed if it's a multiple of 4. For instance, at 0x8c000002 you can access 1 byte or 2 bytes but not 4 bytes.

This is called an alignment requirement. We say that 0x8c000002 is 2-aligned but not 4-aligned, which is often referred to as 2-aligned for short (implying this is the largest alignment for that address).
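
The rule can be written down as two small helpers. This is only a sketch of the arithmetic; the function names are made up for the example.

```c
#include <stdint.h>

/* Largest alignment (capped at 4) that an address satisfies. */
static unsigned max_alignment(uint32_t addr)
{
    if (addr % 4 == 0) return 4;
    if (addr % 2 == 0) return 2;
    return 1;
}

/* Whether an access of `size` bytes (1, 2 or 4) is allowed at `addr`:
   the address must be a multiple of the access size. */
static int access_ok(uint32_t addr, unsigned size)
{
    return addr % size == 0;
}
```

For 0x8c000002, `max_alignment` returns 2, and `access_ok` accepts sizes 1 and 2 but rejects size 4.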

CGDoom, like any program, uses different access sizes to load variables from lumps. In some cases, like textures, CGDoom either only uses 1-byte accesses or it uses functions that don't care about alignment (like memcpy), so the lump can be at any address, even unaligned.

However, some lumps like line definitions or nodes contain shorts or ints and CGDoom uses 2-byte or 4-byte accesses when using them. This requires that every single short or int in the lump is properly aligned in memory or a System ERROR occurs. Fortunately, the C language is well-designed, and this requirement will automatically be met as long as the lump itself starts on a suitably aligned address (which means 4-aligned if any 4-byte accesses are used in the lump, or 2-aligned otherwise).

The original version of DOOM loads lumps to RAM with Z_Malloc. malloc-type functions are also well-designed and therefore always allocate at addresses with maximum alignment (4-alignment in our case), so loading any lump with this method guarantees that accesses will succeed.

However, CGDoom contains an optimization to not load lumps to RAM when they are stored in the filesystem in a single fragment of the WAD file. This is because on the fx-CG, the filesystem can be accessed with a pointer like any other memory, so unless the lump is fragmented by the filesystem it will use less RAM to address it directly in ROM. In this situation, the alignment of the lump in memory is determined by its position within the file. For the lump to be 2-aligned, its position within the file must be a multiple of 2. And for it to be 4-aligned, its position in the file has to be a multiple of 4. Otherwise, you get a System ERROR when performing the access.

You might remember that at some point E1M4 would crash upon loading; this was why. Currently I load unaligned lumps to RAM with Z_Malloc, which takes both a bit of memory and a bit of time, but avoids this issue.
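
The policy described above (use the data in place when it is aligned, copy it otherwise) can be sketched as follows. This is a hypothetical helper rather than CGDoom's code: it only covers a lump stored as a single fragment, and `malloc` stands in for Z_Malloc.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. `data` points to a lump stored as one contiguous
   fragment. Returns a pointer that is safe for 4-byte accesses and sets
   *copied to 1 if the caller now owns a heap copy, 0 if the returned
   pointer is the original data. */
static const void *lump_for_access(const uint8_t *data, size_t size, int *copied)
{
    if ((uintptr_t)data % 4 == 0) {
        *copied = 0;
        return data;              /* aligned: use the data in place */
    }
    void *copy = malloc(size);    /* malloc returns suitably aligned memory */
    if (copy != NULL)
        memcpy(copy, data, size); /* memcpy works at any alignment */
    *copied = (copy != NULL);
    return copy;
}
```

The copy costs both heap space and time, which is the trade-off mentioned in the paragraph above.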

Porting libprof to CGDoom

When optimizing my programs for performance I usually use a small library of mine called libprof, which is incredibly handy. It measures execution time with precision below 1 µs and can also be used as profiler. So far I'd used the RTC to measure file mapping time (which is in the seconds) and FPS, but that wouldn't cut it for true performance analysis.

So I added my libprof code to the repository. With libprof I can determine how much time is spent allocating memory, loading lumps, rendering frames, sending frames to the display... I added counters for these very things. Pressing the [)] key runs the profiler for 40 frames (frameskip included) then shows the results as a player message like this:

DA:53ms GR:1537ms DI:459ms LL:111ms ULL:0ms

DA is Dynamic Allocation (time spent in Z_Malloc and Z_Free)
GR is Graphics Rendering (rendering the 3D view)
DI is Display Interface (sending the rendered frame to the display)
LL is Lump Loading (copies from ROM, non-copied lumps take virtually no time)
ULL is Unaligned Lump Loading (subset of LL for unaligned lumps)

Barring other yet-unknown sources of performance drops that I can add later, if we now observe suspicious performance we can run the profiler and see if there is a culprit. As you will see in a moment, this has already helped me find discrepancies in Hangar.

Consistent differences in heap consumption

Quote:
I have 314 fragments when starting the WAD. In E2M7 I have 145 KB free, in E4M9 133 KB (for some reason always ~44KB less than you :/). I can play E2M7 but I still get a Z_Malloc failure and a Z_ChangeTag error. E4M9 gave me shortly a Z_Malloc failure but I could continue playing after that.

Hmm, this is strange. One would think you have unlucky fragmentation cutting into large lumps and forcing them to be loaded to RAM. But sometimes I have more than 1000 fragments (!!), and I still have exactly 44 more kB of free RAM than you (give or take 1 kB). There must be something subtler. I've changed the "Free" message to show consumption per heap region, just in case the heap happens to be smaller on your side. In Ultimate Doom, pausing straight after loading into E2M7 gives me the following (with developer information enabled):

Quote:
Fragments: 1143
Free: 1/422 kB, 20/249 kB, 161/162 kB
Unaligned lumps: 0 (0 B)


In Ultimate Doom E4M9, with the same setup:

Quote:
Fragments: 1143
Free: 2/422 kB, 7/249 kB, 161/162 kB
Unaligned lumps: 2 (12708 B)


Could you please try this setup to compare?

Performance bottlenecks: saturated regions destroying the heap

Quote:
This is mostly true but sometimes the FPS drops into the single digits, like when I open the first door in Hangar: the FPS goes down to 5 but goes up again after a short while. This is what I meant with the stutters. Every now and then the FPS drops into the single digits. Also the Ultimate Doom WAD always seems to be slower. In E1M3 I get 8 FPS with no enemies on screen and in the shareware WAD 15 FPS even though it doesn't need to render more. Something really seems to affect the FPS in the Ultimate Doom WAD.

Ok so the matter here is pretty complicated. First, Hangar. There is a lot of geometry lying after that door, that's why it drops. It does not "go up again after a short while", but stays consistent in the 4-6 FPS realm as long as I look at that geometry. There was a difference between the two versions at hand though, shareware had 6 FPS while Ultimate Doom dropped to 4 FPS.

E1M2 was even clearer: I had 9 FPS in the shareware, while only 4 in Ultimate Doom. I noticed that Ultimate Doom had exhausted the first heap region. This prompted suspicion because the design of the allocator (called "next fit") meant that the whole first region had to be traversed and fail for every single allocation, which is a huge cost. (This is what prompted me to port libprof.)

You can see for yourself the difference in profiling, see in particular DA.

Shareware Hangar: 6 FPS
DA:27ms GR:2424ms DI:459ms LL:280ms ULL:0ms
Shareware Nuclear Plant: 8 FPS
DA:53ms GR:1537ms DI:459ms LL:111ms ULL:0ms

Ultimate Doom Hangar: 4 FPS
DA:1970ms GR:3917ms DI:459ms LL:2276ms ULL:2276ms
Ultimate Doom Nuclear Plant: 4 FPS
DA:1605ms GR:3536ms DI:459ms LL:1975ms ULL:1975ms

That, and as you can see unaligned lumps being loaded left and right resulted in a lot of overhead (they probably caused most of the DA calls). Note that lumps are loaded during rendering so it's normal that GR went up that much. The developer statistics when closing the game show that anywhere from 20 MB to 200 MB of unaligned lumps are loaded depending on how long you play, which we know is not needed in Hangar and Nuclear Plant because they didn't crash before I fixed E1M4.

So I started by improving the dynamic allocator to avoid spending so long on finding free blocks (basically extending the next fit paradigm over zones). The new results were as follows:
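
To illustrate why a saturated first region is so costly, and one possible reading of "extending next fit over zones", here is a toy model. It is not CGDoom's Z_Malloc: zones are rings of equal-sized slots, all names are hypothetical, and the `probes` counter is only a proxy for time spent searching.

```c
#include <stddef.h>

#define ZONES 2
#define SLOTS 64

/* Toy zone: a ring of equal-sized slots with a rover, in the spirit of
   next fit. */
struct zone { unsigned char used[SLOTS]; size_t rover; };

static struct zone zones[ZONES];
static size_t last_zone = 0;      /* zone where the last allocation succeeded */
static unsigned long probes = 0;  /* slots inspected so far */

/* Next fit inside one zone: scan from the rover, wrapping around once. */
static int zone_alloc(struct zone *z)
{
    for (size_t i = 0; i < SLOTS; i++) {
        size_t s = (z->rover + i) % SLOTS;
        probes++;
        if (!z->used[s]) {
            z->used[s] = 1;
            z->rover = (s + 1) % SLOTS;
            return (int)s;
        }
    }
    return -1;  /* zone is full: every slot was inspected */
}

/* start_at_last == 0: always try zone 0 first, so a saturated zone 0 is
   scanned in full on every call. start_at_last == 1: resume from the zone
   that served the previous allocation. Returns a global slot index. */
static int heap_alloc(int start_at_last)
{
    size_t first = start_at_last ? last_zone : 0;
    for (size_t k = 0; k < ZONES; k++) {
        size_t zi = (first + k) % ZONES;
        int slot = zone_alloc(&zones[zi]);
        if (slot >= 0) {
            last_zone = zi;
            return (int)(zi * SLOTS) + slot;
        }
    }
    return -1;
}
```

With zone 0 completely full, each call to `heap_alloc(0)` inspects all 64 slots of zone 0 before finding a free slot in zone 1, whereas `heap_alloc(1)` goes straight to zone 1 and finds a slot on the first probe.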

Ultimate Doom Hangar: 7 FPS
DA:11ms GR:1865ms DI:459ms LL:486ms ULL:485ms
Ultimate Doom Nuclear Plant: 7 FPS
DA:10ms GR:1954ms DI:459ms LL:526ms ULL:526ms

So I think that solves a nice bit of the discrepancy while noticeably improving performance in levels with lots of data loaded.

Note that if I "trust unaligned lumps" (detailed below) I get up to 10 FPS due to the reduced cost on ULL. This will be an objective.

Performance bottlenecks: uselessly loading unaligned lumps

Quote:
Seems to be that it's affecting the position of other lumps. In the shareware WAD audio is from index 109 to 231. If these are getting removed every lump that came after the audio before has a new index and therefore a new position. Not sure how this affects the performance but it's around 5 FPS slower.

Thank you. I meant the exact position in bytes, specifically its alignment. I hope the explanation at the top of this post clears that up. The concern is about moving lumps by 1 to 3 bytes within the file.
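
For reference, a lump's byte position is stored in the WAD directory, so alignment can be checked without any editing tool. The sketch below assumes the standard WAD layout (a 12-byte header with a 4-byte identifier, the lump count and the directory offset, followed somewhere by 16-byte directory entries holding file position, size and an 8-character name, all little-endian). The function names are made up, and this is not code from CGDoom.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read, one byte at a time. */
static uint32_t le32(const uint8_t *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Counts non-empty lumps whose file position is not a multiple of 4.
   `wad` points to the whole file in memory and `len` is its size.
   Returns -1 if the header or directory does not fit in the buffer. */
static long count_unaligned_lumps(const uint8_t *wad, size_t len)
{
    if (len < 12)
        return -1;
    uint32_t numlumps = le32(wad + 4);  /* header: id[4], numlumps, dir offset */
    uint32_t dir_ofs  = le32(wad + 8);
    if (dir_ofs > len || numlumps > (len - dir_ofs) / 16)
        return -1;

    long unaligned = 0;
    for (uint32_t i = 0; i < numlumps; i++) {
        /* directory entry: filepos, size, name[8] */
        const uint8_t *entry = wad + dir_ofs + (size_t)i * 16;
        uint32_t filepos = le32(entry);
        uint32_t size    = le32(entry + 4);
        if (size > 0 && filepos % 4 != 0)  /* skip empty marker lumps */
            unaligned++;
    }
    return unaligned;
}
```

Running this on a WAD before and after removing the audio lumps would show whether the edit shifted the remaining lumps off 4-byte boundaries.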

As you've seen before, WADs don't require lumps to be aligned, since the game mostly works even with unaligned lumps. I suspect the WAD editing software of mistakenly breaking alignment as a side effect of removing audio files or splitting episodes, but in any case for complete compatibility it's best if WADs with unaligned lumps are fully supported.

However, as you've seen, unaligned lumps have a certain cost that is hard to deal with because it's difficult to know when unaligned lumps will work in the program. In Hangar, they're textures, so we don't care that they're unaligned. But in Command Control some are part of the level definition and need to be aligned. We can't detect that, so we're doomed to either load all of them or fail on some of them.

There is one alternative though. MPoupe had taken steps to break down multi-byte accesses into single-byte accesses to lift the alignment requirement. If this approach can be completed, it would give maximum compatibility without needing to delve deep into the WAD format. However, it means that we need "complete coverage" of the code in order to ensure that all multi-byte accesses into lumps are broken down. I have a few tricks to help us get there with less work (unaligned C structures), we'll see if it's enough.
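
Breaking a multi-byte access into single-byte accesses can look like the helpers below. This is a generic sketch, not MPoupe's or CGDoom's actual code; the function names are hypothetical. It relies on WAD data being stored little-endian.

```c
#include <stdint.h>

/* Reads a little-endian 16-bit value from any address, aligned or not,
   using only 1-byte accesses. */
static int16_t read_le16(const void *p)
{
    const uint8_t *b = p;
    return (int16_t)(uint16_t)(b[0] | (b[1] << 8));
}

/* Same for a little-endian 32-bit value. */
static int32_t read_le32(const void *p)
{
    const uint8_t *b = p;
    uint32_t v = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                 ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return (int32_t)v;
}
```

Every place that reads a short or an int directly out of a lump would have to go through helpers like these (or an equivalent mechanism), which is why complete coverage of the code matters.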

I'd like to work towards that goal, so I've added an option called "Trust unaligned lumps" in the main menu which will skip the loading of non-fragmented unaligned lumps, so that we can play around and look for System ERRORs. If you have opportunities to keep testing, I'd like you to use this option and report the System ERRORs so that I can attempt to fix them one at a time.

I have fixed the bug that prevented E1M4 from loading its line definitions by splitting accesses as mentioned previously (which is much less costly than loading the lump). I also fixed one that caused issues in Ultimate Doom E1M9 and was related to nodes. As of now I've checked that every level of both shareware and Ultimate Doom could be entered (barring the chance that some unaligned lump that would cause problems was actually loaded to RAM while testing because it was fragmented).

Incorrect skull keys

Quote:
Also something still seems to be wrong with the keys. The blue skull key doesn't show up after picking it up.

I didn't know about these keys, thanks. I tracked it down to some commented-out code that was probably left there before my time. I've fixed it; note that since the cheat key gives you all the keys, you now get all three skull keys when you cheat even if the level doesn't require them.
Wow, that got technical really quick! But still very interesting to read.
This update is absolutely awesome! The performance in the Ultimate Doom WAD is so much better thanks to the better allocator and also the option to trust the unaligned lumps. Also libprof and the new free message are really useful.
Quote:
Could you please try this setup to compare?

Sure, here are my results:

E2M7:
Quote:
Fragments: 320
Free: 57/422 KB, 46/249 KB, 36/162 KB
Unaligned lumps: 4492 (19927600 B)

E4M9:
Quote:
Fragments: 320
Free: 18/422 KB, 6/249 KB, 103/162 KB
Unaligned lumps: 5213 (17281616 B)

Looks very different than your results but luckily my heap is not smaller. Don't trust the unaligned lumps values too much. They were always different when I did the same setup again.

I'll test the "Trust unaligned lumps" option a bit more tomorrow.
Excellent! I'm happy it made such a big difference in performance; I did not expect such a small bug (in the end) to have such a major impact. ^^

I won't go too deep with technical matters for a while, unless there is really a need.

Your results are different because the fix for the allocator performance makes it so that allocation is more spread between regions, and the data I had in my post was from before that fix. But in any case the heap is of the correct size so there must really be something more being allocated. I need to figure out what and then track it.

The variations in unaligned lump figures are normal, because some are textures, so if you play longer they will be loaded several times and inflate the statistics. It's mainly useful when playing a single level to determine if there are any, or have an idea of the overall volume, thus impact on performance (but the profiler came after that with more detailed info).
So what can I say about the option "Trust unaligned lumps"? Well, it's awesome! Framerates are higher and more stable because it no longer keeps loading so much stuff in, which gives a way better game experience when playing the Ultimate Doom WAD.

Sadly I get Z_Malloc failures / Z_ChangeTag errors already in E1M6 and it sometimes also freezes the calculator. But looking at the free memory there's still plenty of free space?? (~227 KB) This also happens when not using the option to trust unaligned lumps. Not sure what's wrong here. Maybe the new allocator has some bugs or the Z_Malloc message shows up even if there's no Z_Malloc failure?
Here are the other episodes:
E2M5: Z_Malloc failure, Z_ChangeTag error
E3: should work
E4M6: Z_Malloc failure

Sometimes there's also a Z_Malloc error between the missions on the intermission screen.

Here are a few other issues:
- After a Z_Malloc error the game can continue sometimes but then some textures can get corrupted
- a small one: Picking up a Medikit (not Stimpack) with a health of 100% or more still shows the message that you picked one up but actually you didn't
- Starting the game with a WAD file whose filename is longer than 11 characters gives a System ERROR; also, why can I open the catalog in the main screen?

With this option framerates are already a lot better but they are still not quite as good as in the Shareware version of Doom so there are surely some performance bottlenecks left...

Edit: I'm probably wrong about that because I used a different calculator to test the shareware WAD. The calculator I used to test the shareware WAD had a higher battery voltage than the one with the Ultimate Doom WAD and could therefore push a few more frames. Running the shareware WAD on the same calculator as the Ultimate Doom WAD results in pretty much the same performance (only 1-2 FPS difference with overclocking). Also, my modified shareware WAD without music no longer loses any performance when I activate "Trust unaligned lumps".
So I tested a bit more and always took a look with your profiler and I think I found one last performance bottleneck in the Ultimate Doom WAD and shareware WAD.
First enable "Trust unaligned lumps" and play up to the start of E1M2 (don't move after the level loaded). Also don't warp to E1M2 or you'll not see it! Something else I noticed is that I can't get this one consistently when playing to E1M2. So try again if it doesn't work.
These are my results (with overclock - 213MHz):
Quote:
Ultimate Doom WAD with playing to E1M2:
18 FPS
DA:3ms GR:346ms DI:103ms LL:47ms ULL:0ms

Ultimate Doom WAD with warping to E1M2:
26 FPS
DA:0ms GR:216ms DI:103ms LL:0ms ULL:0ms

Quote:
Shareware WAD with playing to E1M2:
17-18 FPS
DA:8ms GR:371ms DI:103ms LL:7ms ULL:0ms

Shareware WAD with warping to E1M2:
25 FPS
DA:0ms GR:225ms DI:103ms LL:0ms ULL:0ms


See how it's permanently allocating memory and loading lumps in? For some reason this does not happen if you warp to E1M2. This also happens in a few other areas, like in E2M5.

It's really hard to get this bug working; most of the time I couldn't get it to work. I also think it has something to do with overclocking because I could never get it to work on stock speeds. You can actually always see the lower framerate briefly when you enter E1M2 because it needs to load something. But after that the framerate is back to normal. What I think is happening is that for some reason, when overclocking, it loads the same stuff over and over again, which explains the permanently lower framerate.

Yeah, I just tested it. It really has something to do with overclocking. Overclocking my calc to an unsafe value, but still safe enough that it wouldn't crash, I got the bug working pretty much every time! So, yeah. I think we can ignore this.
Good, very good! There is progress. I'm looking to wrap up this port now, I think we should be close. As usual I pushed a new G3A with the changes below.

First, let me recall that Z_Malloc failures occur when the program runs out of memory. By this very premise the program is unable to proceed, so allocation failures are, by nature, non-recoverable. Therefore, being able to play after an allocation failure is merely a random chance and a level should be considered buggy whenever such an error can be reproduced consistently.

Heap fragmentation

I've reproduced the error in E2M6. This is a fragmentation issue; basically as allocations are made and released the free space in the heap gets broken down in smaller pieces, mainly because allocations can be small, large, short-lived, long-lived, and all of them are mixed together. We end up with 150 kB of memory but all in small pieces and unable to satisfy a 32 kB request that pops up suddenly.

It comes up now because the fix to Z_Malloc performance means all regions are used equally so they all get fragmented (and rather quickly at that, due to the design of Z_Malloc). It's truly a difficult problem, which I worked around by putting one region apart and using it only as a last resort. This way it is used less, thus gets less fragmented, thereby mitigating the issue.
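The effect of fragmentation and of the last-resort region can be sketched with a toy model in C. This is not Z_Malloc: each region only tracks the sizes of its free blocks, allocation is plain first-fit, and all names (`region_t`, `region_alloc`, `heap_alloc`) are hypothetical. The reserved region is assumed to be the last one in the array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FREE 8

/* Toy model of a heap region: only the sizes of its free blocks are tracked. */
typedef struct {
    size_t free_blocks[MAX_FREE];
    int count;
} region_t;

/* Total free bytes in a region (what a "Free: x kB" counter would show). */
static size_t region_total_free(const region_t *r)
{
    size_t total = 0;
    for (int i = 0; i < r->count; i++)
        total += r->free_blocks[i];
    return total;
}

/* First-fit inside one region; returns 1 and shrinks the block on success. */
static int region_alloc(region_t *r, size_t size)
{
    for (int i = 0; i < r->count; i++) {
        if (r->free_blocks[i] >= size) {
            r->free_blocks[i] -= size;
            return 1;
        }
    }
    return 0;
}

/* Try regions in order; the reserved region sits last in the array, so it is
   only touched when every ordinary region has failed.
   Returns the index of the region used, or -1 if no block is large enough. */
static int heap_alloc(region_t *regions, int n, size_t size)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (region_alloc(&regions[i], size))
            return i;
    return -1;
}
```

With a first region holding 150000 free bytes split into blocks of at most 30000 bytes, a 32768-byte request cannot be served there even though the total looks plentiful, and it falls through to the reserved region. Smaller requests keep landing in the first region, so the reserved one stays mostly intact.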

More heap overall

I scraped some more memory to be added to the heap:
* I took 26 kB off the system stack, leaving 64 kB of stack (instead of 90 kB previously).
* I took 8 kB off the user stack, leaving 32 kB (instead of 40 kB previously).
* I saved 27 kB by declaring as const a pretty huge state array that was non-const but never modified.

That's 61 kB total. I had a look at the rendering structures which cost the most (up to 180 kB!) but some of my attempts at reducing them crashed in open areas with a lot of geometry, so I rolled them back.

The combination of both heap changes should solve the errors you had and even extend support.

Miscellaneous issues

I've added developer statistics on the end screen that indicate how many lump accesses resulted in loading (or not loading) lumps to RAM. How many lumps can be referenced (= not loaded) varies widely depending on the WAD and level, ranging from 25% loaded to 80% loaded. However, I also track the total volume of memory allocations, and loaded lumps are consistently the heaviest Z_Malloc user, so they still have to be the primary target for optimization.

Please report these statistics after a quick warp to E1M1, pause, show memory, and exit in the Ultimate Doom WAD, both with and without trusting unaligned lumps. I would like to evidence the 44 kB difference that was mentioned earlier. Here are my results:

Quote:
Without trusting unaligned lumps:
Fragments: 842
Free: 12/448 kB, 275/275 kB, 161/162 kB
Memory allocated: 5861 kB
Lumps loaded: 1364 (5500 kB)
... of which unaligned: 1025 (2765 kB)
Lumps referenced: 6 (21 kB)

With trusting unaligned lumps:
Fragments: 842
Free: 59/448 kB, 275/275 kB, 161/162 kB
Memory allocated: 3014 kB
Lumps loaded: 332 (2652 kB)
... of which unaligned: 0 (0 kB)
Lumps referenced: 973 (2582 kB)


Quote:
- a small one: Picking up a Medikit (not Stimpack) with a health of 100% or more still shows the message that you picked one up but actually you didn't

This is now fixed.

Quote:
- Starting the game with a WAD file whose filename is longer than 11 characters gives a System ERROR; also, why can I open the catalog in the main screen?

This is also somewhat fixed, in the sense that I pushed the limit to like 27. I don't want to bother too much with dynamically allocating long file names to be honest.

You can open the catalog because keyboard input in the menu uses the GetKey() function, which handles a crap ton of OS features. If you play around you will see that you can open the catalog in many unexpected places in many add-ins because of this ^^

Loading at the start of a level

Quote:
Yeah, I just tested it. It really has something to do with overclocking. Overclocking my calc to an unsafe value, but still safe enough that it wouldn't crash, I got the bug working pretty much every time! But this doesn't mean that this bug doesn't happen on safe overclock values, just way less often.

Alright, so I don't know what to make of this presently. I attempted to reproduce the problem with about a dozen runs on the shareware WAD but could not obtain the behavior you described. I also don't see how the program could be affected by overclock, which suggests environmental/hardware causes might be involved. I'll think it over and see if I can come up with reasonable leads to investigate the difference.

Also, excuse me, what are these numbers?!

Quote:
Shareware WAD with warping to E1M2:
25 FPS
DA:0ms GR:225ms DI:103ms LL:0ms ULL:0ms

With the CPU at 235 MHz (the fastest Ptune3 default) I was at 802±2 ms of rendering (GR). Near the fastest Pϕ setting I was at 350-ish ms of display interface (DI). Just what kind of black magic did you use to overclock the hardware that much?! xD
Very nice! I tested the entire Ultimate Doom WAD and couldn't get any Z_Malloc errors anymore. (only once when I pressed Read this!) I thought I could get Doom II a bit further with the extra heap but it still gives me a Z_Malloc error in level 15.

Quote:
Just what kind of black magic did you use to overclock the hardware that much?! xD

There is absolutely zero black magic involved with this.
I just used a different program, namely Ptune2 1.24 (Download: https://pm.matrix.jp/ftune2e.html). Ptune3 for some reason doesn't get the same performance with maximum overclock as Ptune2 does.
To get my performance open Ptune2 1.24, press F5 for 191.69MHz and then press 3 times right on PLL to get 213.81MHz. That value is stable on my calculator but I also tested it on a different calculator and I could only get 206.44MHz to work stable (213.81MHz crashed it always when playing CGDoom). So you might need to test a bit around with this to get it working stable.

Also you were right that overclocking doesn't affect it. If you still can't get the bug working for yourself, I managed to record the bug on the shareware WAD from Planète Casio with stock speed and on 213.81MHz:
117.96MHz (stock speed): https://youtu.be/58vajYOSr_8
213.81MHz (max overclock with Ptune2): https://youtu.be/wHOdQlEcEJ8
I also got the same bug working in E1M3 in the Ultimate Doom WAD:
https://youtu.be/KYe_FpVOoa0

This bug seems to be temporary because when I move away far enough and return back to the starting position the bug is gone. Couldn't get it to work in E1M3 for some reason.
As said before, I could only get it to work without warping to E1M2. But I got it working on E1M3 with warping for some reason.

Also I found the performance in the Ultimate Doom WAD or Shareware WAD is sometimes around 5-6 FPS slower. With maximum overclock I can get 30FPS in E1M1 right after it loaded but sometimes only 24-25FPS with the same overclock.
I think this is because of file fragmentation because after formatting the flash I can reach these higher framerates again.

Quote:
Please report these statistics after a quick warp to E1M1, pause, show memory, and exit in the Ultimate Doom WAD, both with and without trusting unaligned lumps. I would like to evidence the 44 kB difference that was mentioned earlier.

Here are my results:

Quote:
Without trusting unaligned lumps:
Fragments: 237
Free: 20/448 KB, 275/275 KB, 161/162 KB
Memory allocated: 5827 KB
Lumps loaded: 1362 (5466 KB)
... of which unaligned: 1217 (3960 KB)
Lumps referenced: 8 (54 KB)

With trusting unaligned lumps:
Fragments: 237
Free: 83/448 KB, 275/275 KB, 161/162 KB
Memory allocated: 1796 KB
Lumps loaded: 141 (1434 KB)
... of which unaligned: 0 (0 KB)
Lumps referenced: 1148 (3649 KB)


Also there are still a few differences between the MS-DOS version of Doom and this Port:
- no effects for taking damage, picking up items or having a radiation suit (for example, the screen flashes red when taking damage (more damage means a longer red effect), there's a greenish screen when having a radiation suit, and the screen briefly flashes yellow when picking up an item)
- no melting screen effect when starting or finishing a level
- the gameplay starts right away, DOS Doom shows a title screen and then it plays demos of a few levels in Doom (also End Game under Options should bring you back to the title screen)
Thanks again! No update today (yet), but a couple of thoughts.

Quote:
I just used a different program, namely Ptune2 1.24 (Download: https://pm.matrix.jp/ftune2e.html). Ptune3 for some reason doesn't get the same performance with maximum overclock as Ptune2 does.

Oh, I see! It seems Ptune2 is more liberal on memory timings, the difference is extreme! Thanks for letting me know, it might have some use in the future.

I've realized there is a "problem" with your numbers.

Quote:
Ultimate Doom WAD with playing to E1M2:
18 FPS
DA:3ms GR:346ms DI:103ms LL:47ms ULL:0ms

Ultimate Doom WAD with warping to E1M2:
26 FPS
DA:0ms GR:216ms DI:103ms LL:0ms ULL:0ms

Profiling is on for 40 ticks after pressing the [)] key. Since there is frameskip by default, there is one frame every two ticks, therefore the FPS counters let us know that your game was running at approximately 36 and 52 ticks per second, assuming that remained somewhat constant for the duration of profiling.

In this setting, 40 ticks of profiling will have taken approximately 1.11 seconds and 0.77 seconds, respectively. However, the measurements only account for 499 ms (45%) and 319 ms (41%) of that duration. Therefore, the program spent most of its time not being profiled, which suggests that the actual bottleneck is somewhere else. At least, I'm confident that the lump loading (while possibly related) is not in itself the source of such an FPS drop.
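For anyone who wants to redo this arithmetic with their own numbers, here is a small sketch in C. The function names are hypothetical; the only inputs are the ones stated above (the FPS counter, two ticks per frame with default frameskip, a 40-tick profiling window, and the sum of the DA/GR/DI/LL/ULL figures).

```c
#include <assert.h>

/* Wall-clock duration (ms) of `ticks` game ticks, given the FPS counter
   and the number of ticks per rendered frame (2 with default frameskip). */
static double profiled_window_ms(double fps, int ticks_per_frame, int ticks)
{
    assert(fps > 0 && ticks_per_frame > 0);
    double ticks_per_second = fps * ticks_per_frame;
    return 1000.0 * ticks / ticks_per_second;
}

/* Percentage of the profiling window covered by the measured total. */
static double coverage_percent(double measured_ms, double window_ms)
{
    return 100.0 * measured_ms / window_ms;
}
```

For the 18 FPS run, `profiled_window_ms(18, 2, 40)` gives about 1111 ms and the measured 3 + 346 + 103 + 47 + 0 = 499 ms covers about 45% of it. For the 26 FPS run, the window is about 769 ms and the measured 319 ms covers about 41%.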

Quote:
Also I found the performance in the Ultimate Doom WAD or Shareware WAD is sometimes around 5-6 FPS slower. With maximum overclock I can get 30FPS in E1M1 right after it loaded but sometimes only 24-25FPS with the same overclock.
I think this is because of file fragmentation because after formatting the flash I can reach these higher framerates again.

Yeah, one thing's for sure: the need to load more lumps is definitely impacting performance negatively. There are multiple reasons for that, ranging from larger files being more densely fragmented, to more time being spent doing the actual loading from ROM to RAM, to more static lumps being accessed at startup, to stronger heap pressure causing both more time finding free memory and more fragmentation...

I fear that "reformat the storage memory and retransfer the files" is going to remain a magic "make-me-faster" button for CGDoom (unless some obscure trick like split lumps can be made to work somehow, which I'm not very confident about).

Quote:
Without trusting unaligned lumps:
Fragments: 237
Free: 20/448 KB, 275/275 KB, 161/162 KB
Memory allocated: 5827 KB
Lumps loaded: 1362 (5466 KB)
... of which unaligned: 1217 (3960 KB)
Lumps referenced: 8 (54 KB)

With trusting unaligned lumps:
Fragments: 237
Free: 83/448 KB, 275/275 KB, 161/162 KB
Memory allocated: 1796 KB
Lumps loaded: 141 (1434 KB)
... of which unaligned: 0 (0 KB)
Lumps referenced: 1148 (3649 KB)

Wow, don't even think about 44 kB, the differences are huge. I mean it's clear by this point that variations based on internal fragmentation are inevitable.

Also, see how pathetic the number of referenced lumps is when not trusting unaligned lumps? I'll bet this is because the removed audio changed the alignment of basically everyone. That would explain the difference in performance caused by removing audio fairly easily.

Quote:
Also there are still a few differences between the MS-DOS version of Doom and this Port:
- no effects for taking damage, picking up items or having a radiation suit (for example, the screen flashes red when taking damage (more damage means a longer red effect), there's a greenish screen when having a radiation suit, and the screen briefly flashes yellow when picking up an item)
- no melting screen effect when starting or finishing a level
- the gameplay starts right away, DOS Doom shows a title screen and then it plays demos of a few levels in Doom (also End Game under Options should bring you back to the title screen)

That and Doom II will probably be the next targets. Main menu is probably very easy. Screen effects are somewhat computationally intensive but I can try to enable them and add an option to let players disable them if it's too slow to play. The melting/wipe effect requires more memory, I'll look into it but if I enable it I'll at least make sure to skip it if memory is running low to avoid Z_Malloc failures.
  