So I found something interesting by just malloc'ing chunks of data and counting the total number of bytes I could malloc.
Chunk size : bytes malloc'able
1: 16384
2: 32768
4: 65536
8: 86016
16: 102400
32: 114688
64: 114688
128: 114688
256: 98304
512: 65536
1024: 65536
65536: 65536
114688: 114688
115000: 115000
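For reference, the counting loop that produced the table above boils down to something like this (a sketch, not the exact test program):
Code:
#include <stdio.h>
#include <stdlib.h>

/* Keep calling malloc(chunk_size) until it fails and report the total
   number of bytes handed out. The blocks are deliberately never freed. */
static unsigned long count_mallocable(unsigned long chunk_size)
{
    unsigned long total = 0;
    while (malloc(chunk_size) != NULL)
        total += chunk_size;
    return total;
}

int main(void)
{
    printf("64: %lu\n", count_mallocable(64));
    return 0;
}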
So, I've seen a few people say stuff about headers in malloc, and I understand that, but wouldn't this contradict that?
Well, the 1-byte case makes sense: 16K table entries for 1-byte malloc buffers is a hell of a lot of overhead. Other than that, though, this is totally weird.
Here's what I think is going on:
Your program is limited to 128KB (131072 bytes) on the heap. The one-byte allocations consume 8 bytes each. Maybe 4 bytes are used for a linked list of allocations, and 3 bytes are to maintain alignment. The 8-byte allocations follow this model closely too: 8 bytes plus 4 bytes overhead (no alignment needed).
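If that guess is right, each allocation would be laid out something like this (pure speculation from the numbers above, not Casio's actual structures):
Code:
/* Speculative layout: a 4-byte link followed by the payload, with the payload
   rounded up to a multiple of 4. A 1-byte malloc would then cost 4 + 4 = 8
   bytes, and 131072 / 8 = 16384 allocations matches the 16384 bytes of 1-byte
   chunks in the table above.                                                 */
struct guessed_header {
    struct guessed_header *next;  /* 4-byte link in a list of allocations */
    /* payload follows, padded to a multiple of 4 bytes */
};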
I don't know what's happening above 32 bytes, though. Logically the total size should increase, but this system could have a sub-optimal allocator.
I can't program this myself, but would it be possible to do something like this:
Start with a malloc chunk size of 4, and allocate chunks until you can't allocate any more, like we're currently doing. The difference is that you store the address of the previous chunk in the first 4 bytes; that way you can free the entire chain and then increase the chunk size. You'd have to write all your data to a file, though, which might mess up the results. Let me test this on 4 and get back to you.
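In rough C, the chained-allocation idea would be something like this (just a sketch, assuming 4-byte pointers, so the chunk size must be at least 4):
Code:
#include <stdlib.h>

/* Allocate chunk_size-byte blocks until malloc fails, chaining each block to
   the previous one through its first 4 bytes, then walk the chain and free
   everything so the next chunk size starts from a clean heap.               */
static unsigned long probe_and_release(unsigned long chunk_size)
{
    void *prev = NULL;
    unsigned long total = 0;

    for (;;) {
        void *p = malloc(chunk_size);
        if (p == NULL)
            break;
        *(void **)p = prev;        /* store the previous chunk's address */
        prev = p;
        total += chunk_size;
    }

    while (prev != NULL) {         /* free the whole chain in reverse */
        void *next = *(void **)prev;
        free(prev);
        prev = next;
    }
    return total;
}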
Let's see:
1: 16384
4: 65536
With 1, let's assume 3 bytes padding and a 4-byte link, as Chris said. That means 16K*8 = 128KB. With 4, then we should be able to fit (128KB)/(8B) since no padding, which would be 16K chunks, but we get 64K instead, or 512KB's worth. WTF.
Edit: and now 32KB for 2-byte chunks, which should be either 64K or 16K, not neither. RARGH. Nonsensical results are nonsensical.
Edit #2: This part is particularly interesting:
32: 114688
64: 114688
128: 114688
That would imply a full 14680064 bytes of heap space available, which seems extremely unlikely. Can you print out the distance between addresses, or take an average, or something? Even dump them to a file. I want to get to the bottom of this; I suspect it's going to turn out to be something obvious.
How is it 14680064 bytes of heap space? The second column is just total number of bytes malloc'ed.
graphmastur wrote:
How is it 14680064 bytes of heap space? The second column is just total number of bytes malloc'ed.
Ohhh, I was reading the second column as the number of chunks allocated. This is why I need to not work on this sort of thing so late at night; I got myself quite confused there. That means 1 byte, 2 bytes, and 4 bytes are all consistent: some fixed amount of overhead, 16384 chunks allocated, and 4 bytes of contents per chunk, with or without padding. From 8 up through 128, things sort of make sense; beyond that, I believe things have gone squiffy.
So I did a little more research, and when allocating 4 bytes, the difference between pointer locations is always 4 bytes. The same thing seems to happen with 1 and 2.
When I tried 128, though, it surprised me: there were 132 bytes from pointer to pointer, the 128 bytes allocated plus 4 more. The same thing happens with 8, 16, 256, etc.
For chunks of 512, the difference in pointers is 1024. For 1024, it's 2048. I just don't know anymore...
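The spacing measurement amounts to something like this (a sketch of the idea, not the exact code used):
Code:
#include <stdio.h>
#include <stdlib.h>

/* Allocate a few chunks of the given size and print the gap between
   consecutive returned pointers, to see what each allocation really costs. */
static void print_spacing(unsigned long chunk_size, int count)
{
    char *prev = NULL;
    int i;
    for (i = 0; i < count; i++) {
        char *p = malloc(chunk_size);
        if (p == NULL)
            break;
        if (prev != NULL)
            printf("%lu\n", (unsigned long)p - (unsigned long)prev);
        prev = p;
    }
}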
I did an experiment where I started allocating in chunks of 1024, and then I halved the chunk size each time malloc returned 0. It could get a total of 129280 bytes of memory, which doesn't seem right at all; that's 1792 bytes short of 2^7 KB (131072).
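Roughly, that halving strategy looks like this (a sketch under the assumption that we simply halve on failure and stop at zero):
Code:
#include <stdlib.h>

/* Start with 1024-byte requests; every time malloc returns NULL, halve the
   request size and keep going. Returns the total number of bytes obtained.
   Allocations are intentionally leaked for the measurement.               */
static unsigned long greedy_halving(void)
{
    unsigned long size = 1024;
    unsigned long total = 0;

    while (size > 0) {
        if (malloc(size) != NULL)
            total += size;   /* got this size, try it again */
        else
            size /= 2;       /* no block of this size left, go smaller */
    }
    return total;
}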
I am starting to wonder if the 1024/2048 spacing is a page issue: 1024 bytes for the data, and a 4-byte header rounded up to another 1024 bytes. Can you try 2048-4 = 2044 (and maybe 2040) bytes and see what you get?
With 2044, I get 2048 as expected. With 2040, I get 2044, but total memory allocated is 130560 bytes.
Also, looking at the addresses, either Casio somehow implemented some form of GC, or this is probably paging. Any other tests?
I've already made a test for that. (Download it)
This program expects a number N and allocates N buffers, each as large as possible.
Fun facts:
Ask for 1 buffer first: it can't make it larger than 65528 B.
Ask for 100 buffers, then re-ask for 1 buffer, and now it can allocate 131056 B.
On the Prizm you can allocate 131060 bytes max.
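His actual code isn't reproduced here, but the idea can be sketched roughly like this (a hypothetical reconstruction, not PierrotLL's program):
Code:
#include <stdio.h>
#include <stdlib.h>

/* For each of n buffers, find the largest size malloc will still grant by
   starting at 128 KB and shrinking the request until it succeeds. A binary
   search would be faster; the linear step-down keeps the sketch simple.   */
static void largest_buffers(int n)
{
    int i;
    for (i = 0; i < n; i++) {
        unsigned long size = 131072;
        void *p = NULL;
        while (size > 0 && (p = malloc(size)) == NULL)
            size -= 4;
        if (p == NULL)
            break;                 /* heap exhausted */
        printf("buffer %d: %lu bytes\n", i + 1, size);
    }
}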
PierrotLL: Yeah, that seems to be what Graphmastur discovered as well. I'm quite baffled by the algorithm that Casio is using, and I'd love to see a disassembly of it. Ironically, in the undergrad CS class I TA, the students need to implement malloc, and I think all their implementations are better than this.
KermMartian wrote:
PierrotLL: Yeah, that seems to be what Graphmastur discovered as well. I'm quite baffled by the algorithm that Casio is using, and I'd love to see a disassembly of it. Ironically, in the undergrad CS class I TA, the students need to implement malloc, and I think all their implementations are better than this.
Maybe someone should email them and ask about it?
So I wanted to post this yesterday, but couldn't. This is what I discovered when I tested with 2040-byte chunks. This is the listing of addresses for each malloc call:
881E000B
881E1807
881E100B
881E2807
881E200B
881E3807
881E300B
881E4807
<Removed a few>
881FF00B
As you can see, it follows an interesting pattern. I didn't know what the pattern was, though, so I double-checked. Starting at 881E000B, it jumps forward 6140 (0x17FC) bytes, and then it jumps back 2044 (0x7FC) bytes.
So that's definitely peculiar. I find it interesting that the last one is 881FF00B. It leads me to believe that we have the space from 881E0000 to 881FFFFF to use, a total of 131,071 bytes. Given PierrotLL's max size of 131,060, that's an 11-byte difference, which makes sense starting at 881E000B.
So I guess that's our limit: 0x1FFF4 bytes is the max we can allocate, without paging or something else.
EDIT: Now, we can probably figure out their algorithm from this info.
graphmastur wrote:
So I wanted to post this yesterday, but couldn't. This is what I discovered when I tested with 2040-byte chunks. This is the listing of addresses for each malloc call:
881E000B
881E1807
881E100B
881E2807
881E200B
881E3807
881E300B
881E4807
<Removed a few>
881FF00B
Does the SH3 processor (and/or the RAM controller in the Prizm) have any particular word-alignment restrictions when accessing, e.g., two- and four-byte values? If so, then that malloc is broken (all of the addresses are odd). AFAIU, malloc is supposed to return a pointer that is suitably aligned for any data object, usually at least even alignment if not a bigger alignment boundary. If the SH3 *can* access words at odd addresses, then it's probably not an issue.
The Prizm does indeed crash if you try to dereference a pointer to an integer that is not aligned on a 4-byte boundary, for instance, so I agree that this implementation seems a bit broken...
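For illustration, this is the sort of access that trips it (a hypothetical snippet, and undefined behaviour in portable C, but it shows the hardware constraint):
Code:
#include <stdint.h>

/* On the Prizm's SH3, reading a 4-byte value through a pointer that is not
   4-byte aligned raises an address error (crash).                          */
static uint32_t words[2];                    /* 4-byte aligned storage */

uint32_t read_unaligned(void)
{
    uint8_t *base = (uint8_t *)words;
    uint32_t *p = (uint32_t *)(base + 1);    /* deliberately misaligned */
    return *p;                               /* crashes on SH3 */
}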
KermMartian wrote:
The Prizm does indeed crash if you try to dereference a pointer to an integer that is not aligned on a 4-byte boundary, for instance, so I agree that this implementation seems a bit broken...
Trying to allocate a size of 131060 gave me a pointer to 0x881E000C, which means it runs right up to 0x88200000. Trying to allocate 0x1FFF5 or 131061 bytes gave me... absolutely nothing. So yes, 0x1FFF4 is the largest size possible.
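Quick sanity check on those numbers (plain arithmetic, nothing Prizm-specific):
Code:
#include <assert.h>

/* 0x1FFF4 is 131060, and a 131060-byte block starting at 0x881E000C ends
   exactly at 0x88200000, the top of the 0x881E0000-0x88200000 region.    */
int main(void)
{
    assert(0x1FFF4 == 131060);
    assert(0x881E000CUL + 0x1FFF4UL == 0x88200000UL);
    return 0;
}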
Wait, I'm an idiot... I forgot to add one. Yep, definitely an idiot. Here's the corrected list of addresses:
881E000C
881E1808
881E100C
881E2808
881E200C
881E3808
881E300C
881E4808
<Removed a few>
881FF00C
Whew, that makes a lot more sense, thanks for the correction. The allocation algorithm is still baffling, though.