I have now turned my attention to getting a better tradeoff in terms of storage, and I have a rough solution that I am getting ready to test; let me know if any of this sounds wacky. I have already written code to compress the files with bzip2 and then decompress them to find a hash; that has been tested on a small scale and worked. The next space improvement, which will only be used in the next generated set of files, comes in two parts, both of which happen before the compression (which I am leaving in). The first is a partial radix tree: since I have already built a tree down to the fourth character for other reasons, I cut the first 4 characters off each hash (I can easily recreate them from the name of the file and the folder hierarchy it is in). The second part is removing the last 8 characters of the MD5 hash and the last 12 of the SHA1 hash, storing only the part before them. The truncation may mean that there are multiple hits on a single hash where there should not be, but that is dealt with at cracking time: because the plaintext is stored, I can rehash the matching plaintexts, which are only a very small subset of the full set of plaintexts, and compare them to the hash being cracked to determine which stored plaintext is valid. This will mean very slightly longer hash sorting and cracking times, but my estimate puts it at about a 28% space reduction before compression, and I'd estimate roughly another 50% reduction from compressing that output. All in all I think this will be a very good tradeoff: a large decrease in storage needs without a significant decrease in performance. Let me know what you think of this idea.
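To make the truncate-and-verify scheme concrete, here is a minimal sketch of the two parts in Python. All names, the file layout, and the use of MD5 are my own assumptions for illustration, not the actual code:

```python
import hashlib

PREFIX_LEN = 4      # first 4 hex chars live in the folder/file names (radix tree)
MD5_TRUNC = 32 - 8  # keep only the first 24 of MD5's 32 hex chars

def table_entry(plaintext: str) -> tuple:
    """Split an MD5 hash into its radix-tree prefix and the truncated
    remainder that would actually be written to disk."""
    digest = hashlib.md5(plaintext.encode()).hexdigest()
    prefix = digest[:PREFIX_LEN]           # recoverable from the file path
    stored = digest[PREFIX_LEN:MD5_TRUNC]  # 20 chars stored alongside the plaintext
    return prefix, stored

def verify(candidates, target_hash):
    """Truncation allows false collisions, so rehash the (few) matching
    plaintexts against the full target hash to pick the real one."""
    for plaintext in candidates:
        if hashlib.md5(plaintext.encode()).hexdigest() == target_hash:
            return plaintext
    return None
```

Dropping 4 + 8 of MD5's 32 hex characters cuts the hash field by 37.5%, which lines up with the ~28% overall figure once the untouched plaintexts are counted in.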
Makes sense to me =)
elfprince13 wrote:
Makes sense to me =)


Good. I have now run some tests on cracking the compressed data, and for my smaller dataset of about 110,000,000 hashes (the larger one is still going through the initial compression) I could crack a hash in under 1 second. I have found that the compression takes a very long time, though: more than 24 hours on my larger dataset, and I estimate 48-72 hours for the 250 GB set still in progress, which covers every 0-8 character hash and ~20 percent of 9 and 10 character hashes. I am happy with the compression ratio, though, as I appear to be doing better than 50%: having compressed between half and 9/16ths of the data, I have gone from 25% utilization of the drive I dedicated to this down to ~12%. I think I am getting over a 75% compression ratio, probably more like 90%, and that is not counting the character cutting from the later version (it did not make it into the version generating the large dataset, so it is not affecting these sizes). This will drastically improve the number of hashes that can be stored in the same space with my system, at the cost of some speed. I will try to improve the compression speeds, though, since in the current version the compression time grows more than linearly; I have a few good ideas for fixing that.
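The compress-then-scan step might look roughly like this in Python's standard bz2 module, assuming one bzip2-compressed text file per radix-tree bucket with `truncated_hash:plaintext` lines (the filenames and line format are my guesses, not the actual implementation):

```python
import bz2

def compress_bucket(path: str) -> None:
    """Compress a finished table file with bzip2, as described above."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path + ".bz2", "wb") as f:
        f.write(bz2.compress(data, compresslevel=9))

def scan_bucket(path_bz2: str, truncated_hash: str) -> list:
    """Decompress a bucket on the fly and collect every plaintext whose
    truncated hash matches; collisions get resolved later by rehashing."""
    hits = []
    with bz2.open(path_bz2, "rt") as f:
        for line in f:
            stored, plaintext = line.rstrip("\n").split(":", 1)
            if stored == truncated_hash:
                hits.append(plaintext)
    return hits
```

Decompressing with `bz2.open` streams the file rather than inflating it all into memory, which matters once individual buckets get large.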

Thanks for the support and ideas everyone.

EDIT: With more tests I'm getting cracking speeds that vary quite a bit, up to 10 seconds against the same dataset, but I believe that is due to system and hard drive load more than anything else, as I have a lot more running now compared to my initial tests.

EDIT2: I have fixed the compression speed issue as far as I expected possible, but it may reduce the compression efficiency if I run compression with the new method too often.

EDIT3: I just realized I calculated the number of hashes in the set I am testing cracking speed against very incorrectly; I was mixing two incompatible sets of figures, and the result was off by more than an order of magnitude. It was more like 1,332,000,000 hashes (~1.3 billion) that I was testing against.
As this is for a science fair (well, partly; I had started the project before I decided to enter it in the science fair, and will likely use it for several other things, some of which are for school requirements), I am starting to put together a board for it and am working out which statistics and data to put there. At least 95% of the people there will have no idea what a hash even is, so I don't want to go overly technical, but I do want some tech content. So far I have come up with the following possible stats for the board (only a subset of which will be used):

rough number of hashes generated per second

rough time to crack a password from a certain number of hashes

real rough estimate of time to generate equivalent rainbow table

real rough estimate of time to crack 1 hash from equivalent rainbow table

rough time to generate a given table of mine

real rough time to generate equivalent rainbow table

rough storage for one of my tables

real rough storage for 99%+ crack certainty equivalent rainbow table

rough numbers of passwords in given character sets and lengths
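The last stat is simple arithmetic: a character set of size c over all lengths from 1 to n gives c^1 + c^2 + ... + c^n passwords. A quick sketch (function name is mine, just for illustration):

```python
def keyspace(charset_size: int, max_len: int, min_len: int = 1) -> int:
    """Total number of possible passwords over all lengths in [min_len, max_len]."""
    return sum(charset_size ** k for k in range(min_len, max_len + 1))

# e.g. all-lowercase (26 chars), lengths 1 through 8:
# keyspace(26, 8) -> 217180147158 (~217 billion)
```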

Any other ideas? Which of these do you guys think is most likely to work for a relatively low-tech crowd, and which are most useful in general?

Thanks,
Glenn
I feel like raw speed numbers are the most impressive to a non-technical crowd, such as "seconds per password" (or even "passwords per second!", though if that's less than one, it's not great to report). I feel like people passing by at a science fair might understand, at a rough glance, that there are two phases, rainbow-table generation and actual password cracking, but they're probably not going to care that much about the table generation. Just my two cents.
Last year I made some sweet posters for the programming I did for my research project, and my adviser suggested that for a non-technical crowd you can get more glamour points by using a picture of some vaguely technical-looking syntax-highlighted code instead of a multi-paragraph description of the actual mathematics involved. Depending on your audience you can take that under advisement, or ignore it as you see fit (this vs this if you care to take a peek).