I recently announced that I added support for Casio Prizm pictures (.g3p files) to Cemetech's SourceCoder 3 online calculator programming IDE. The hardest part of creating that new feature was not the code that implements it in SourceCoder, but the reverse-engineering work necessary to understand how to read .g3p files and then generate new .g3p files that the Casio fx-CG10 and fx-CG20 will both accept. At the request of several Cemetech members, I have decided to write a short tutorial showing how I reverse-engineered the .g3p format, which I hope will help you with any new file or data format that you might want to try to understand. The tutorial is roughly divided into sections explaining what you should have to successfully understand a new format, what existing information will accelerate the process, and how to actually peer into the unknown format.
What You Need
Any good tutorial should tell you the prerequisites before you dive in. In this case, you need examples to work from and tools to help you examine the examples and test your hypotheses. For this particular project, I used these tools and data sources:
- As many examples of the file format you're examining as possible. One of these should be a "minimum" instance, in this case a completely blank white image captured on my Prizm and copied to my computer. I also collected a number of 3-bit and 16-bit .g3p images from the community, as well as a collection of different 16-bit .g3p files provided by Casio. Having this variety ensures that you can adequately pick out the constant bytes in the header, body, and footer of the format (if it has those components), as well as discover the source of any unknown data.
- A scripting language like Python that will let you quickly apply hypotheses to your existing files as well as generate new files. For the .g3p format, this let me first verify that the .g3* file header worked the way I believed, and later that the de-obfuscated and decompressed image data contained the expected pixels. It also let me modify the image data in existing .g3p files on the fly to see if the resulting image displayed on my Casio Prizm looked as I expected.
- The device or program that creates and uses the file format, so that once you get as far as making new files you can test that they are correctly formatted. Your new files must be recognized as valid by the device or program that opens them, and the data you include in the files must be decodable. For this project, the Prizm had to recognize the picture files as valid pictures and also display their contents properly. The former does not guarantee the latter: I discovered that if a certain checksum was incorrect, the calculator would recognize the file as valid but would display a blank image.
- A hex editor and a hexadecimal-capable calculator. For me, this was XVI32 and Windows Calculator, respectively. Other Cemetech members swear by the hex editor HxD.
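If you would rather script some of your hex inspection alongside a GUI editor, a few lines of Python can print dumps in the same layout as the Quote blocks in this tutorial. This is a minimal sketch; `hexdump` is a hypothetical helper name, not part of XVI32, HxD, or any library.

```python
def hexdump(data: bytes, start: int = 0, width: int = 16) -> str:
    """Format bytes as lines like "0x10: FF FF 52 1B ...", one line per `width` bytes.

    `start` is the file offset of data[0], so a slice taken from the middle of a
    file is labeled with its real offsets.
    """
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"0x{start + offset:02X}: {hex_bytes}")
    return "\n".join(lines)
```

For example, `print(hexdump(open("Beach.g3p", "rb").read(0x20)))` should print the two header lines shown in the next section, assuming the file matches the one examined here.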
What You Know
You usually need to have some idea of what you're looking at before you begin, and the more information you have, the easier the reverse-engineering process will be. Although not vital, you should generally start by knowing what the file type you're examining contains. If you have a way of viewing its contents elsewhere (in this case, by displaying the pictures on the Prizm), then you know what the file should decode to. You can also use information about similar files and the platform itself to give you additional clues. I worked from some of these clues:
- The Prizm's SH3/4 processor is a big-endian 32-bit CPU. Therefore, it was likely that size words in the file would be big-endian 32-bit integers. This was further supported by my previous explorations of the .g3m program format.
- Exploring the .g3m program format had given me some experience with what I believed was a common header format on all .g3* files, which turned out to be mostly an accurate belief. Had I investigated existing documentation on the .g1m program format used for the fx9750 and fx9860 calculators, I would have found that it provided similar clues about the 32-byte header on .g3p files.
- I knew that all of the .g3p files I was working from contained 3-bit or 16-bit pictures, all 384 pixels wide by 192 pixels tall.
- The smallest of the files was far smaller than (384*192)*(3/8) = 27,648 bytes, the smallest number of bytes in which an uncompressed 3-bit-color 384x192-pixel image could be stored. I also noticed that more complex images produced larger files. Therefore, I deduced that some form of compression was being used.
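The size argument in the last bullet is easy to check mechanically. The sketch below assumes tight bit packing with no header overhead, which gives the most generous possible lower bound; the helper names are hypothetical.

```python
WIDTH, HEIGHT = 384, 192  # Prizm picture dimensions, from the notes above


def min_uncompressed_bytes(bits_per_pixel: int) -> int:
    """Smallest size in bytes that could hold the raw pixels, tightly packed."""
    total_bits = WIDTH * HEIGHT * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole byte


def looks_compressed(file_size: int, bits_per_pixel: int) -> bool:
    """True if the whole file is too small to contain the raw pixels."""
    return file_size < min_uncompressed_bytes(bits_per_pixel)
```

For a 16-bit picture the bound is 147,456 bytes, so even the roughly 45KB Beach.g3p file discussed below could not be storing its pixels uncompressed.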
Reverse-Engineering the .g3p Format
As with any puzzle, reverse-engineering a file format is a process of using the clues and pieces in front of you to build up a progressively larger picture of the data you're examining. For me, this meant first understanding the header, then finding where the image data was stored in the file, then understanding how the data was stored and signed. As with most puzzles, I found myself following red herring clues to incorrect conclusions. I made compromises: in writing code to generate new images, I chose one of the variations on the .g3p format that loads on all fx-CG10 and fx-CG20 calculators but does not require generating a file footer. Without further ado, let me walk you through the process of decoding the header, footer, and body of .g3p picture files. For the sake of brevity, I will be eliding some of the false starts and tedious computation I performed, but I would be happy to clarify any step in the attached discussion thread.
1. Understanding the .g3p Header
I began decoding the .g3p header in a vacuum as an exercise, later cross-referencing my discoveries against my exploration of the .g3m program header to verify its correctness. For your own examination, here are two .g3p headers and one .g3m header:
Quote:
Beach.g3p:
0x00: AA AC BD AF 90 88 9A 8D 82 FF EF FF EF FF DA FE
0x10: FF FF 52 1B 63 00 00 00 00 00 00 00 08 A0 00 00
Pic04.g3p:
0x00: AA AC BD AF 90 88 9A 8D 82 FF EF FF EF FF 28 FE
0x10: FF FF FE 69 B1 00 00 00 00 00 00 00 02 EE 00 00
RPG1.g3m:
0x00: AA AC BD AF 90 88 9A 8D 8A FF EF FF EF FF 42 FE
0x10: FF FF F8 83 CB 01 00 00 00 00 00 00 1E 08 FF FE
It seems that every Prizm file begins with the same 8-byte sequence 0xAA, 0xAC, 0xBD, 0xAF, 0x90, 0x88, 0x9A, 0x8D. In examining other Prizm files, this pattern held true. Next, the byte at offset 0x08 seems to give some indication of the file type. Indeed, in investigating other files, all .g3m programs had 0x8A at that offset, and all .g3p pictures had 0x82 there. Incidentally, .g1m program and picture files happen to have the same 8 header bytes followed by 0xCE.
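A quick way to apply these observations to a pile of files is a tiny classifier. This sketch only knows the two type bytes observed above (0x82 and 0x8A), so treat the mapping as incomplete; the function names are hypothetical. As a side note, bit-inverting the eight magic bytes yields the ASCII text "USBPower", the same inversion trick used for the size field at 0x10.

```python
from typing import Optional

# Eight bytes observed at the start of every Prizm file examined in this tutorial.
G3_MAGIC = bytes([0xAA, 0xAC, 0xBD, 0xAF, 0x90, 0x88, 0x9A, 0x8D])

# Type byte at offset 0x08, as observed in the sample files (not a complete list).
G3_TYPES = {0x82: ".g3p picture", 0x8A: ".g3m program"}


def identify_g3_file(data: bytes) -> Optional[str]:
    """Return a description of the file type, or None if the magic bytes don't match."""
    if len(data) < 9 or data[:8] != G3_MAGIC:
        return None
    return G3_TYPES.get(data[8], f"unknown type byte 0x{data[8]:02X}")


def magic_as_text() -> str:
    """Invert each magic byte; the result happens to be printable ASCII."""
    return bytes(~b & 0xFF for b in G3_MAGIC).decode("ascii")
```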
Next we have a set of unusual bytes that change from file to file. Comparing the three dumps above, they fall into a few groups: the byte at 0x0E, the four bytes at 0x10-0x13, the byte at 0x14, and the two bytes at 0x1C-0x1D.
I started out with the assumption that these bytes, all of which vary from file to file, were "security" bytes based on the size of the file. This turned out to be correct, but if it had been wrong, other possibilities would have been a checksum over the entire file contents, the size of the data portion of the file, or a checksum over the data portion. The 4-byte value at offset 0x10 in every file supported this assumption. The sizes of these three files happen to be 44516 bytes (0xADE4), 406 bytes (0x196), and 1916 bytes (0x77C), respectively. If you represent each of those sizes as a 32-bit big-endian integer and invert every bit, you get 0xFFFF521B, 0xFFFFFE69, and 0xFFFFF883, exactly the values stored at offset 0x10. With that big hint that other fields in the header are related to the file size, let's build tables comparing them to the full inverted size int, the lower two bytes of the inverted size (the least-significant word, LSW), and the lowest byte of the inverted size (the least-significant byte, LSB). I have added a few additional files for further comparison (note that all values are hex; 0x is omitted for brevity):
Code:
+-------------+----------+-----+------+-------------+------+-------------+--------+-------------+
| Size Int    | Size LSW | LSB | 0x0E | *0E-LSB%100 | 0x14 | *14-LSB%100 | 1C, 1D | *1D-LSB%100 |
+-------------+----------+-----+------+-------------+------+-------------+--------+-------------+
| FF FF 52 1B | 52 1B    | 1B  | DA   | DA-1B = BF  | 63   | 63-1B = 48  | 08 A0  | A0-1B = 85  |
| FF FF FE 69 | FE 69    | 69  | 28   | 28-69 = BF  | B1   | B1-69 = 48  | 02 EE  | EE-69 = 85  |
| FF FF F8 83 | F8 83    | 83  | 42   | 42-83 = BF  | CB   | CB-83 = 48  | 1E 08  | 08-83 = 85  |
| FF FF F8 78 | F8 78    | 78  | 37   | 37-78 = BF  | C0   | C0-78 = 48  | 0B FD  | FD-78 = 85  |
| FF FF 8C B2 | 8C B2    | B2  | 71   | 71-B2 = BF  | FA   | FA-B2 = 48  | D9 37  | 37-B2 = 85  |
+-------------+----------+-----+------+-------------+------+-------------+--------+-------------+
+-------------+----------+-----+--------+-------------+---------------+-----------+--------------+
| Size Int    | Size LSW | LSB | 1C, 1D | *1D-LSB%100 | *1C-LSW%10000 | LSW sum   | *1C-LSWS%100 |
+-------------+----------+-----+--------+-------------+---------------+-----------+--------------+
| FF FF 52 1B | 52 1B    | 1B  | 08 A0  | A0-1B = 85  | 08-521B=ADED  | 52+1B=06D | 08-06D = 9B  |
| FF FF FE 69 | FE 69    | 69  | 02 EE  | EE-69 = 85  | 02-FE69=0199  | FE+69=167 | 02-167 = 9B  |
| FF FF F8 83 | F8 83    | 83  | 1E 08  | 08-83 = 85  | 1E-F883=079B  | F8+83=17B | 1E-17B = A3  |
| FF FF F8 78 | F8 78    | 78  | 0B FD  | FD-78 = 85  | 0B-F878=0793  | F8+78=170 | 0B-170 = 9B  |
| FF FF 8C B2 | 8C B2    | B2  | D9 37  | 37-B2 = 85  | D9-8CB2=7427  | 8C+B2=13E | D9-13E = 9B  |
+-------------+----------+-----+--------+-------------+---------------+-----------+--------------+
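Yes, those worked out very neatly! The actual reverse-engineering process was much more trial-and-error. I omitted all the attempts I made with incorrect combinations of bytes, like using the LSW or the LSB elsewhere, trying operations other than addition and subtraction, or not using mod-0x100 (mod-256) math. I also omitted my attempts to treat the two bytes at 0x1C as a single word rather than two separate bytes. I left in my mistaken attempt to use the least-significant word (LSW) as a whole word to find the byte at 0x1C, so you can see how that dead end led me to consider summing the two bytes of the LSW and combining that sum with the byte at 0x1C. (Note that the third row, RPG1.g3m, gives 0xA3 rather than 0x9B in the last column; the formulas below are for .g3p files.) So what do these tables tell us? By flipping around the operations that yield the same constants, we now know how to compute the security bytes at offsets 0x0E, 0x14, 0x1C, and 0x1D, where *0xNN denotes the byte at offset 0xNN:
- *0x0E = ((LSB of inverted size) + 0xBF) % 0x100 = (*0x13 + 0xBF) % 0x100
- *0x14 = ((LSB of inverted size) + 0x48) % 0x100 = (*0x13 + 0x48) % 0x100
- *0x1C = ((sum of LSW bytes of inverted size) + 0x9B) % 0x100 = (*0x12 + *0x13 + 0x9B) % 0x100
- *0x1D = ((LSB of inverted size) + 0x85) % 0x100 = (*0x13 + 0x85) % 0x100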
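Taken together, these relationships are enough to generate a complete .g3p header from the final file size alone. In the sketch below, the security-byte constants come from the tables above (0xBF for 0x0E, 0x48 for 0x14, 0x9B for 0x1C, 0x85 for 0x1D). The bytes at 0x09-0x0D and 0x0F, and the zeroes at 0x15-0x1B and 0x1E-0x1F, are assumed from the two .g3p dumps; the RPG1.g3m dump shows they can differ for other file types. `g3p_header` is a hypothetical helper name.

```python
# Constant bytes taken from the two .g3p header dumps above.
G3_MAGIC = bytes([0xAA, 0xAC, 0xBD, 0xAF, 0x90, 0x88, 0x9A, 0x8D])
G3P_TYPE_BYTE = 0x82


def g3p_header(file_size: int) -> bytes:
    """Build the 32-byte header of a .g3p file whose total length is file_size."""
    inv = (~file_size & 0xFFFFFFFF).to_bytes(4, "big")  # stored at 0x10-0x13
    lsb, lsw_hi = inv[3], inv[2]                          # *0x13 and *0x12
    header = bytearray(0x20)                              # unspecified bytes stay 0x00
    header[0x00:0x08] = G3_MAGIC
    header[0x08] = G3P_TYPE_BYTE
    header[0x09:0x0E] = bytes([0xFF, 0xEF, 0xFF, 0xEF, 0xFF])
    header[0x0E] = (lsb + 0xBF) % 0x100
    header[0x0F] = 0xFE
    header[0x10:0x14] = inv
    header[0x14] = (lsb + 0x48) % 0x100
    header[0x1C] = (lsw_hi + lsb + 0x9B) % 0x100
    header[0x1D] = (lsb + 0x85) % 0x100
    return bytes(header)
```

Calling `g3p_header(44516)` should reproduce the Beach.g3p header byte for byte, and `g3p_header(406)` the Pic04.g3p header.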
2. Data and Metadata
Next, we need to find where in the file the image data starts, and what metadata surrounds it. This information will enable us to reliably decode any .g3p file we encounter, as well as generate new .g3p files that the Prizm will reliably accept. As mentioned earlier, we will be looking at one specific "Casio Provided" variation of the .g3p format. It's a good one to examine because it has no special footer fields and is accepted by both the fx-CG10 and fx-CG20 Prizm variants. There are more complicated CP0100 and CAPTURE formats, but I'll omit those from this reverse-engineering walkthrough. Let's take a look at the next 192 (0xC0) bytes of the first file we've been looking at, the one that is about 45KB long. It contains a few scattered bytes, a long section of empty space, then what looks like some metadata followed by image data. I'm getting ahead of myself; here's the data:
Quote:
0x20: 43 50 00 01 00 00 00 00 00 00 00 00 00 00 00 00
0x30: 00 00 AD C4 00 00 00 01 00 00 AD 2C 00 00 00 00
0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x50-0xA0: ... 00 ...
0xB0: 00 00 00 00 00 00 00 00 00 01 00 00 00 00 AD 18
0xC0: 00 00 01 80 00 C0 00 10 01 00 00 00 00 00 AD 14
0xD0: 3C 1B 98 1C A7 45 27 C0 32 8A 8E 3F 5E 3E 8B 56
Further investigation into other variants of the .g3p format revealed that all that empty space in the middle is related to the extra footers that can be tacked onto the end of some variants. Since we won't be discussing those types, we can ignore that blank space and assume those bytes are always zeroes. Examining several files revealed several constants between offsets 0x20 and 0xBC:
- 0x20: Always contains the six-byte sequence 0x43, 0x50, 0x00, 0x01, 0x00, 0x00.
- 0x34: Always contains the 32-bit big-endian integer 0x00000001.
- 0xB8: Always contains the four-byte sequence 0x00, 0x01, 0x00, 0x00.
There are also four 32-bit big-endian size fields, each counting the bytes from a fixed offset to the end of the file:
- 0x30: File size - 0x20 (size of the file after the initial header)
- 0x38: File size - 0xB8 (0xB8 = 184 bytes; this value is the size of the image data plus the 0x18 bytes of metadata/header at offset 0xB8)
- 0xBC: File size - 0xCC (the size of the image data plus its 4-byte size field)
- 0xCC: File size - 0xD0 (the image data starting at 0xD0, including its trailing 4-byte checksum; part of the image data header)
The image header fields between 0xC0 and 0xD1 are:
- 0xC0: (Word) Always 0x00, 0x00. Function unknown.
- 0xC2: (Word) 0x01, 0x80 (decimal 384); width of image in pixels.
- 0xC4: (Word) 0x00, 0xC0 (decimal 192); height of image in pixels.
- 0xC6: (Word) 0x00, 0x10 (decimal 16); color depth in bits per pixel. Can be 3 or 16.
- 0xC8: (Word) Always 0x01, 0x00. Function unknown.
- 0xCA: (Word) Always 0x00, 0x00. Function unknown.
- 0xCC: (Int) Size of image data, as mentioned above.
- 0xD0: (Word) Part of image data; 2-byte ID. Always 0x3C1B for "Casio Provided" images with no footers.
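These checks are the kind of layer-peeling validation described in the next section. The sketch below encodes the constants and size fields listed above for the footer-less "Casio Provided" variant only, and raises an error on the first mismatch. `parse_g3p_metadata` and the dictionary keys are hypothetical names, and the returned `payload` is still obfuscated and compressed.

```python
import struct

# Constant byte runs observed in the footer-less "Casio Provided" variant.
CONSTANTS = {
    0x20: bytes([0x43, 0x50, 0x00, 0x01, 0x00, 0x00]),
    0x34: bytes([0x00, 0x00, 0x00, 0x01]),
    0xB8: bytes([0x00, 0x01, 0x00, 0x00]),
}

# Each size field equals (file size - offset where the counted region starts).
SIZE_FIELDS = {0x30: 0x20, 0x38: 0xB8, 0xBC: 0xCC, 0xCC: 0xD0}


def parse_g3p_metadata(data: bytes) -> dict:
    """Validate the metadata described above and return the picture info."""
    if len(data) < 0xD2:
        raise ValueError("file too short")
    for offset, expected in CONSTANTS.items():
        actual = data[offset:offset + len(expected)]
        if actual != expected:
            raise ValueError(f"constant at 0x{offset:02X} is {actual.hex()}, expected {expected.hex()}")
    for offset, start in SIZE_FIELDS.items():
        (value,) = struct.unpack_from(">I", data, offset)  # 32-bit big-endian
        if value != len(data) - start:
            raise ValueError(f"size field at 0x{offset:02X} is 0x{value:X}, expected 0x{len(data) - start:X}")
    width, height, bits = struct.unpack_from(">HHH", data, 0xC2)  # 0xC2, 0xC4, 0xC6
    (image_id,) = struct.unpack_from(">H", data, 0xD0)
    return {
        "width": width,
        "height": height,
        "bits_per_pixel": bits,
        "image_id": image_id,
        "payload": data[0xD2:],  # obfuscated compressed data + 4-byte checksum
    }
```

For the Beach.g3p dump above, this should report 384x192 at 16 bits per pixel with ID 0x3C1B.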
3. Decompressing the Data
Early in this tutorial, I mentioned that I was using Python to prototype a script to pull apart the .g3p files I was examining and later put them back together. In fact, as I went through the process I have described so far, I was continually expanding that program to display the values of each of the fields I have mentioned, and to emit errors if any of the values, including security and size fields, were not what I expected. Although it may not be what Casio originally intended, I started to view the format as something like an onion, with layers of size ints followed by data nested inside each other. As my Python program grew, it pulled apart each successive layer, ending with the nugget of what I assumed was compressed image data at the end. Once I got here, though, I was stuck. What could I do next? The obvious solution would be to figure out what decompression algorithm was in use. My biggest fear was that a proprietary compression algorithm was in use, that the compressed data was also wrapped in a layer of encryption (perhaps unlocked by a secret hidden deep in the Prizm firmware), or both. Unfortunately, preliminary inspection revealed no clues that would help me crack this nut.
A bit of non-hexadecimal sleuthing provided the next clue. I found notes from other intrepid reverse-engineering explorers, who had gotten nearly as far as I had in understanding the headers and metadata at the beginning of the .g3p format. They uncovered a clue that proved vital: a copyright line in the OS documentation referring to the DEFLATE algorithm, developed in the 1990s and commonly used as a lightweight but effective compression algorithm. Since the Prizm OS didn't appear to compress any other data, it seemed logical to assume that DEFLATE was there to compress the image data inside .g3p files. Unfortunately, feeding the data starting at 0xD2 to INFLATE, the decompression algorithm that complements DEFLATE, failed. I wrote an O(n log n) program that tried cutting off bytes at the beginning and end of the data, to no avail; INFLATE still refused to recognize the data as valid. After a few hours of experimentation, I grudgingly accepted that some layer of obfuscation must have been applied. I first tried the obvious bit inversion (flipping every bit) that was used for the size integer at offset 0x10; this proved equally fruitless. I then tried inverting only some bits, and later flipping bytes, rotating bytes, or mixing bytes by exchanging groups of bits. Through perseverance, and with my Python program performing exhaustive permutations of the bit mixing and inversion, I succeeded in discovering the key: cut off the last four bytes of the data (presumably some checksum?) and perform the following steps:
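Quote:
%76543210 === decode ==> ~%21076543
~%43210765 <== encode === %76543210
In other words, to decode each obfuscated byte, use bits 7-3 as bits 4-0 and bits 2-0 as bits 7-5 (a right rotation by three bits), then invert all bits. Encoding reverses this: invert all bits, then rotate left by three bits. Decoding yields a chunk of data that can be successfully decompressed by the INFLATE algorithm. The resulting decompressed data contains two pixels per byte for 3-bit-color images and two bytes per pixel for 16-bit-color images. However, remember those final four bytes we snipped off? In order to create new .g3p files of our own, we need to understand what that checksum actually is, or the Prizm will not display the images.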
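The byte transform and the decompression step fit in a few lines of Python. The sketch below uses the standard `zlib` module with `wbits=-15` (raw DEFLATE, no zlib header) to play the role of INFLATE. `find_transform` is a deliberately tiny, hypothetical version of the exhaustive search described above: it only tries the eight rotations with and without full inversion, while the real search also covered partial inversions and other bit permutations.

```python
import zlib
from typing import Optional, Tuple


def _rotr8(value: int, n: int) -> int:
    """Rotate an 8-bit value right by n bits."""
    return ((value >> n) | (value << (8 - n))) & 0xFF


def decode_byte(b: int) -> int:
    # Bits 7-3 become bits 4-0 and bits 2-0 become bits 7-5 (rotate right by 3),
    # then every bit is inverted.
    return ~_rotr8(b, 3) & 0xFF


def encode_byte(b: int) -> int:
    # Inverse of decode_byte: invert, then rotate left by 3 (same as right by 5).
    return _rotr8(~b & 0xFF, 5)


# Precomputed 256-entry tables so whole buffers can go through bytes.translate().
DECODE_TABLE = bytes(decode_byte(b) for b in range(256))
ENCODE_TABLE = bytes(encode_byte(b) for b in range(256))


def deobfuscate(data: bytes) -> bytes:
    return data.translate(DECODE_TABLE)


def obfuscate(data: bytes) -> bytes:
    return data.translate(ENCODE_TABLE)


def inflate_image(payload: bytes) -> bytes:
    """payload: file bytes from 0xD2 to the end (compressed data + 4-byte checksum)."""
    compressed = deobfuscate(payload[:-4])         # drop the trailing checksum
    return zlib.decompress(compressed, wbits=-15)  # negative wbits = raw DEFLATE


def find_transform(obfuscated: bytes, expected_len: int) -> Optional[Tuple[int, bool]]:
    """Toy brute-force search: try each rotation, with and without inversion, and
    keep the one whose raw INFLATE output has the expected length."""
    for rotation in range(8):
        for invert in (False, True):
            mask = 0xFF if invert else 0x00
            table = bytes(_rotr8(b, rotation) ^ mask for b in range(256))
            try:
                out = zlib.decompress(obfuscated.translate(table), wbits=-15)
            except zlib.error:
                continue
            if len(out) == expected_len:
                return rotation, invert
    return None
```

For a 16-bit picture, `inflate_image` should return 384 * 192 * 2 = 147,456 bytes. Building the 256-entry tables once and using `bytes.translate` avoids a slow per-byte Python loop over large files. As a side observation that was not part of the original investigation: running the 2-byte ID 0x3C1B at offset 0xD0 through `deobfuscate` gives 0x78 0x9C, the standard zlib stream header. That suggests everything from 0xD0 onward might be an obfuscated zlib stream, but this has not been verified against real files here.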
4. Cracking the Checksum
The checksum required more manual experimentation to understand, but in the end the solution was a very simple one. I started by extracting the stored checksum on the data bytes in each of the files I was examining and adding it to a table along with the data length and the data type ID. I then tried summing the data in other interesting ways: (1) the sum of the compressed but de-obfuscated bytes in the data section only, (2) the same sum including the metadata, and (3) the sum of the original inverted, obfuscated bytes. I also added a column with the stored checksum simply inverted, and another with the un-mixing and inversion process applied to the checksum, in case it was obfuscated along with the data. My table at the end of it looked like this:
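Code:
+------------+------+------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| Filename   | Bits | ID   | Data length | Checksum    | Inv CS      | Sum 1       | Sum 2       | Sum 3       | Unmixed CS  |
+------------+------+------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| Pict04.g3p | 3    | 3E93 | 00 00 00 3A | 04 DF 60 01 | FB 20 9F FE | 00 00 06 55 | 00 00 05 3E | 00 42 F0 00 | 7F 04 F3 DF |
| Pict01.g3p | 3    | 3E93 | 00 00 06 2B | 69 EA 96 FA | 96 15 69 05 | 00 03 4D 38 | 00 03 4C 38 | 00 3D 2A 0C | D2 A2 2D A0 |
| Pict02.g3p | 3    | 3E93 | 00 00 0E 29 | B8 4E 4C 50 | 47 B1 B3 AF | 00 07 96 4D | 00 07 95 33 | 00 3D 73 61 | E8 36 76 F5 |
| Books.g3p  | 16   | 3E93 | 00 01 70 F5 | E3 6C F4 79 | 1C 93 0B 86 | 00 BE 43 DF | 00 BE 42 C9 | 01 6C 4C 7B | 83 72 61 D0 |
| Bowl~.g3p  | 16   | 3E93 | 00 00 08 A2 | 15 83 0D B3 |             | 00 47 26 B6 | 00 47 26 A6 | 01 9F 46 37 | 5D 8F 5E 89 |
| Beach.g3p  | 16   | 3C1B | 00 00 AD 14 | 85 6A D4 D5 | 7A 95 2B 2A | 00 58 9D 59 | 00 58 9C 59 | 01 C3 4A D7 | 4F B2 65 45 |
| Brid~.g3p  | 16   | 3C1B | 00 01 56 1A | 31 E9 7F 0B |             | 00 B0 57 2A | 00 B0 55 AD | 00 E7 03 14 | D9 C2 10 9E |
+------------+------+------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
It might not be obvious at first glance, but nothing matched or was even close. I particularly noticed that all of the checksums I calculated were relatively small, especially for the short files, whereas the original checksums did not seem correlated with the size of the file. I presumed that a more cumulative sum was in use, one that, on each iteration, added the new byte to a running sum and then added that running sum to a second total. Searching for cumulative checksums led me to a checksum commonly paired with DEFLATE (it is the checksum used by the zlib format) called Adler-32. Adler-32 keeps a simple running sum of the bytes (A, starting at 1) and a running sum of those running sums (B), both modulo 65521, and concatenates them (B in the high 16 bits, A in the low 16 bits) to form the final checksum. For the .g3p format, this checksum is computed over the raw, uncompressed data, and it is appended before the obfuscation step is performed (and thus is itself obfuscated).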
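The Adler-32 checksum identified in this step is short enough to write out by hand. Here it is spelled out for clarity and cross-checked against Python's built-in `zlib.adler32`. The big-endian byte order used when storing it is an assumption, matching the zlib convention. The obfuscation step does check out against the table: re-encoding Beach.g3p's un-mixed checksum 4F B2 65 45 gives back the stored 85 6A D4 D5. `obfuscate_checksum` and `g3p_checksum_trailer` are hypothetical helper names.

```python
import zlib

MOD_ADLER = 65521  # largest prime smaller than 2**16


def adler32(data: bytes) -> int:
    """Adler-32: A is a running sum of the bytes (starting at 1), B a running sum of A."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a


def self_check(data: bytes) -> bool:
    """Cross-check the hand-written version against Python's built-in zlib.adler32."""
    return adler32(data) == zlib.adler32(data)


def _encode_byte(value: int) -> int:
    # Obfuscation step: invert all bits, then rotate left by 3 bits.
    inverted = ~value & 0xFF
    return ((inverted << 3) | (inverted >> 5)) & 0xFF


def obfuscate_checksum(checksum: int) -> bytes:
    # Big-endian byte order is an assumption (it matches the zlib convention).
    return bytes(_encode_byte(b) for b in checksum.to_bytes(4, "big"))


def g3p_checksum_trailer(pixels: bytes) -> bytes:
    """The four bytes that would follow the compressed data for these raw pixels."""
    return obfuscate_checksum(adler32(pixels))
```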
Conclusion
Reverse-engineering the .g3p format was time-consuming but fun, and I learned about a new compression algorithm and a new checksumming technique along the way. I will shortly be releasing the full, more technical description of the different .g3p file formats. In the meantime, I hope this tutorial helped you learn a bit more about the techniques, tools, and experimentation inherent in reverse-engineering a format. As always, questions or comments in the attached topic are encouraged.