I am working on generating a large password cracking database. I decided not to use rainbow tables because I would like to expand the database over time without having to recompute anything. I am using a different technique whose name I forget, but I am basically generating and storing every possible hash in a database, which I will later optimize with a script for quicker searching.

I just finished the first generation code today and started generating hashes. My system generates hashes in a series of steps, each one containing a batch of hashes (about 7k each), and it can be set to generate a whole series of steps at once. It is designed to run as multiple instances at once, but in my basic testing MySQL was the bottleneck, probably due to disk performance, so multiple instances did not help much. It works on MD5 and SHA-1 hashes at the same time and generates somewhere between 1k and 5k of each per second.

It pre-generates the steps with a separate script, and it generates the hashes for one password length (actually two) at a time; then I generate a new set of steps and start on the next longer set of passwords. It generated all 1 and 2 character passwords in my charset (about 85 characters) in very short order, and I am now working on all 3 and 4 character passwords. I expect those will be done within a day or two, depending mostly on how much of my computer's power I let it use.

In the 1-3 hours I have been generating hashes I have gotten about 2.6M of each hash type, totaling about 300MB, before I noticed an issue in the table that stores extra data about each password: it was causing that data not to be saved. It would not really have affected cracking, but I will be regenerating the data to keep it consistent and allow for better stats.


EDIT: I just timed roughly how long a step takes: about 3 seconds for the ~7k MD5 and ~7k SHA-1 hashes to be created and stored.
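
To give an idea of what a step does, here is a stripped-down sketch; the connection details and the table/column names are just placeholders, not my real schema, and the real code pulls its candidate passwords from the pre-generated step data.

Code:
<?php
// Rough sketch of one generation "step": hash a batch of candidate
// passwords and store each MD5/SHA-1 pair. Table and column names are
// placeholders, not the real schema.
$db = new mysqli('localhost', 'user', 'pass', 'hashdb');

function run_step(mysqli $db, array $candidates): void
{
    foreach ($candidates as $plain) {
        $md5  = md5($plain);
        $sha1 = sha1($plain);
        $p    = $db->real_escape_string($plain);
        // One INSERT per password, roughly 7k of these per step.
        $db->query("INSERT INTO hashes (md5, sha1, plaintext)
                    VALUES ('$md5', '$sha1', '$p')");
    }
}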
This sounds interesting. You could probably save space on storing plaintext by using a radix tree.
elfprince13 wrote:
This sounds interesting. You could probably save space on storing plaintext by using a radix tree.


I'm actually using a MySQL database to store this, so I'm not sure how well that would work. I had not heard of radix trees before, but they are similar to how I was planning to speed up searches: grouping all hashes that share the same first x characters into an encoded piece of data that would then be searched, or maybe having ~256 tables per hash type, one for each 2-character beginning of the hash, for faster searching. Thanks for the suggestion, though; I had only encountered the concept, not the name, and mostly for organization rather than storage efficiency.
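
For the ~256-tables idea, the routing would be something as simple as this; the table names here are only illustrative:

Code:
<?php
// Route a hash to one of 256 per-prefix lookup tables based on its first
// two hex characters, then search only that table. Names are illustrative.
function table_for_hash(string $type, string $hash): string
{
    return $type . '_' . substr($hash, 0, 2);   // e.g. md5_a3, sha1_0f
}

function lookup(mysqli $db, string $type, string $hash): ?string
{
    $table = table_for_hash($type, $hash);
    $h     = $db->real_escape_string($hash);
    $res   = $db->query("SELECT plaintext FROM $table WHERE hash = '$h' LIMIT 1");
    $row   = $res ? $res->fetch_row() : null;
    return $row[0] ?? null;
}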
MySQL has its own indexing strategy, but I'm not sure what type of data structure they use. Have you created an index on your plaintext column?
elfprince13 wrote:
MySQL has its own indexing strategy, but I'm not sure what type of data structure they use. Have you created an index on your plaintext column?


No, why? Actually, I just noticed I had temporarily taken out the indexes to figure out a MySQL error and never put them back. I was only going to have one on the hash column though as that is the one I will be searching with to find the associated plaintext.
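
When I put them back it will basically just be this (simplified table/column names):

Code:
<?php
// Re-add the lookup index on the hash column of each lookup table.
// Table and column names are simplified placeholders.
$db = new mysqli('localhost', 'user', 'pass', 'hashdb');
$db->query('ALTER TABLE md5_lookup  ADD INDEX idx_hash (hash)');
$db->query('ALTER TABLE sha1_lookup ADD INDEX idx_hash (hash)');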
Glenn wrote:
I was only going to have one on the hash column though as that is the one I will be searching with to find the associated plaintext.


Ah, yeah, that makes sense. I wasn't thinking when I said that. I'm not sure how you could mix the plaintext space-saving of a radix tree with your database method.
I found some issues in my code that were messing up my database: #1, I was only generating even-length passwords; #2, I was forgetting to escape certain input, causing some queries to fail. This was discovered as I sought to improve speed by doing multi-insert queries instead of one at a time; since a few of the value sets had issues, the whole query failed. I have since fixed it and started rebuilding the DB, and I found that this fairly simple change in operation made my generation process many times faster, which is very helpful for the timeframes we are looking at. I am still probably disk bound, and I know I am bound somewhere in MySQL, as my program is only taking about 25% of a core and I have free processor time. I now generate a step (~7k each of MD5 and SHA-1 hashes) in about 1/4 to 1/2 a second (as opposed to 3-5 seconds before).
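
The change amounted to building one escaped multi-row insert per step instead of ~7k single-row inserts, roughly like this (placeholder names again):

Code:
<?php
// One multi-row INSERT per step, with every plaintext escaped so unusual
// characters in the charset cannot break the query. Placeholder names.
function store_step(mysqli $db, array $candidates): void
{
    $rows = [];
    foreach ($candidates as $plain) {
        $p      = $db->real_escape_string($plain);
        $rows[] = "('" . md5($plain) . "','" . sha1($plain) . "','$p')";
    }
    // ~7k rows go in with a single query instead of ~7k separate queries.
    $db->query('INSERT INTO hashes (md5, sha1, plaintext) VALUES ' . implode(',', $rows));
}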
I have already handily passed my previous level of password generation in a tiny fraction of the time (<2 hours as opposed to more than 14). I currently have about 86M (now up to 100M as I write this) passwords in the database, and I am looking for someone who can give me tips on MySQL clustering, as that is the slow point once I start adding threads to compute hashes. I've done what query optimization I know how to do, so I'm moving on to looking at clustering the MySQL database. I am currently doing ~7k inserts per query and I don't really want to go higher, as that would require changing much of my code. I'm currently maxing out my MySQL server at somewhere between 1 and 2 simultaneous compute processes.

I don't really see an effective way to improve disk access, and I no longer believe that is the bottleneck, as the MySQL process has very little disk I/O wait time according to iotop. The database is on a RAID 5 array, which seems to be doing well with the new multi-inserts; I believe I'm generating 5-10MB of data per second, and each password takes up about 200-250 bytes.

I could also use tips on optimizing MySQL to run faster on a given machine. I'm just using the distro (openSUSE) package right now, but I'm willing to play around with compiling it if that will make it faster. MySQL is just about fully taking up a core on my 2-core machine. If anyone has ideas to speed this up I would appreciate it.
How big is the database? Why not use RAM tables?
KermMartian wrote:
How big is the database? Why not use RAM tables?


It is already 20 or 40GB depending on how you count it; I record each hash twice to keep very small lookup tables (one per hash type) along with one table that keeps more metadata and both hashes. I have an 85-character charset for this, so these tables will be getting very large. I think the bigger issue is the MySQL daemon having to process these very large SQL queries into the right format for storage. I'm currently doing about 28k passwords a second, though, so I don't think I'm doing too poorly in terms of speed. Will MySQL use more than one core effectively? I could try loading up more compute processes, but MySQL is basically using a full core right now. I'm currently guessing I'm about 0.033% done with 5 and 6 character passwords, and I'm done with all of them up to 4 characters.
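
For reference, the table layout is roughly this; the names, types, and extra metadata columns shown are simplified examples rather than my actual schema:

Code:
<?php
// Rough shape of the tables: two slim lookup tables (one per hash type)
// plus a wider metadata table holding both hashes. Simplified example,
// not the actual schema.
$db = new mysqli('localhost', 'user', 'pass', 'hashdb');
$db->query('CREATE TABLE md5_lookup  (hash CHAR(32), plaintext VARCHAR(16))');
$db->query('CREATE TABLE sha1_lookup (hash CHAR(40), plaintext VARCHAR(16))');
$db->query('CREATE TABLE meta (
    md5       CHAR(32),
    sha1      CHAR(40),
    plaintext VARCHAR(16),
    length    TINYINT
)');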
Here's an idea that I just came up with for fun, as an alternative to using MySQL. Buy a good-sized SSD and format it with ZFS, or something else that can deal with a very large number of files, set to use a very small block size. Create a directory tree simulating a radix tree structure, and have two top-level directories called "md5" and "sha1" containing symlinks whose file names are the hashes, each pointing to the directory whose path (stripped of slashes) corresponds to the text used to generate the hash. You could add a .n file extension to the links for the nth collision with a particular hash. Obviously it would be worth doing some size calculations first, but my intuition says that could take up significantly less space than your database, and still be pretty darn fast.
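
A minimal sketch of what I mean, with the paths and the plaintext-to-directory mapping purely illustrative:

Code:
<?php
// Sketch of the filesystem idea: one directory level per plaintext
// character, plus flat md5/ and sha1/ directories of symlinks named
// after the hashes, each pointing back at the plaintext's directory.
function add_entry(string $root, string $plain): void
{
    $dir = $root . '/plain/' . implode('/', str_split($plain));   // "abc" -> plain/a/b/c
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    foreach (['md5' => md5($plain), 'sha1' => sha1($plain)] as $type => $hash) {
        if (!is_dir("$root/$type")) {
            mkdir("$root/$type", 0755, true);
        }
        $link = "$root/$type/$hash";
        if (!is_link($link)) {
            symlink($dir, $link);   // a .n suffix would handle the nth collision
        }
    }
}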
elfprince13 wrote:
Here's an idea that I just came up with for fun, as an alternative to using MySQL. Buy a good-sized SSD and format it with ZFS, or something else that can deal with a very large number of files, set to use a very small block size. Create a directory tree simulating a radix tree structure, and have two top-level directories called "md5" and "sha1" containing symlinks whose file names are the hashes, each pointing to the directory whose path (stripped of slashes) corresponds to the text used to generate the hash. You could add a .n file extension to the links for the nth collision with a particular hash. Obviously it would be worth doing some size calculations first, but my intuition says that could take up significantly less space than your database, and still be pretty darn fast.


I find that idea interesting, but my rough calculations say it would take up more space. The average space used for directories across the hashes, plus the size of the symlink, would have to be <125 bytes (or better, <70 bytes), since if you include my metadata table there are about 250 bytes of data per set of one MD5 and one SHA-1 hash, of which half is for the metadata table, which isn't really needed except to give me more information on how my system is working. My understanding is that a symlink has to take up at least one sector of the hard drive, or 512 bytes, which is much more than I use this way.

I might be able to save space, though, by creating files 3 levels deep, each one containing all the hashes and their passwords starting with those 3 characters, and having each one be bzip2 compressed. For that number of files to be fast, I could store the hashes first in the MySQL database, then pull all of the ones destined for the first file and store/compress them, delete them from the DB, then continue on to the second file, and so on. That, along with getting rid of the metadata, would likely result in a 50-90% reduction in space usage, but it would take a lot more effort to generate and could easily be slower at retrieval, since I would have to decompress and read through the entire file to find which line contained the needed hash (though I could stop early if I found it early).

I have a day or two to figure out the initial things to do, as there is at least 500GB of storage left, and my first step will probably be regular pruning of the metadata table. My other first step will be to clean the drive of the hundreds of GB of data I generated testing other things. (This is on a software RAID 5 array of three 1.8TB partitions, so it can hold a LOT of data.) The next thing I might do is remove the >1TB of temporary data I was using to try to speed up my online backup that keeps messing up, and just back up directly to the internet instead of caching the backup on the drive and streaming it up later. In short, I have a fair bit of time until I really need to worry about space. At the moment I have ~350M passwords in the DB totaling ~80GB, and I'm about 0.07% finished with my current set of passwords (5 and 6 character) as of 1-2 hours ago.
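
Retrieval from those compressed bucket files would then look roughly like this; the path layout and the "hash, password" line format are only my current thinking, not anything settled:

Code:
<?php
// Look a hash up in a bzip2-compressed bucket file chosen by its first
// three hex characters. Path layout and line format are assumptions.
function lookup_in_bucket(string $root, string $type, string $hash): ?string
{
    $bucket = sprintf('%s/%s/%s/%s/%s.bz2',
        $root, $type, $hash[0], $hash[1], substr($hash, 0, 3));
    $fh = @fopen('compress.bzip2://' . $bucket, 'r');
    if ($fh === false) {
        return null;                       // bucket not generated yet
    }
    while (($line = fgets($fh)) !== false) {
        $parts = explode(', ', rtrim($line, "\n"), 2);
        if (count($parts) === 2 && $parts[0] === $hash) {
            fclose($fh);                   // stop early once found
            return $parts[1];
        }
    }
    fclose($fh);
    return null;
}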
I'm sure the individual entries aren't taking up that much space, but I'm imagining that any indexes on your sha1 and md5 columns are taking up significantly more space than that, unless you've already taken them into consideration. It could also be possible to implement something similar to what I describe but with variations to make more efficient use of sectors. Implementing radix trees for both the hashed text and the plaintext, then storing the trees across multiple files to prevent having to load the whole structure at once.
elfprince13 wrote:
I'm sure the individual entries aren't taking up that much space, but I'm imagining that any indexes on your sha1 and md5 columns are taking up significantly more space than that, unless you've already taken them into consideration. It could also be possible to implement something similar to what I describe but with variations to make more efficient use of sectors. Implementing radix trees for both the hashed text and the plaintext, then storing the trees across multiple files to prevent having to load the whole structure at once.


Yeah, I can see that idea; the idea I had was basically what you are describing, and I don't have indexes yet, so I don't know how much space they will take. I like my version of the radix-style tree (though I would not be using it for space saving), but I am having a hard time figuring out how many levels deep I should go before switching to a file: the more levels there are, the longer it takes to move the hashes from the initial DB location into the files, but the faster it is to search the resulting set of files. Does anyone know a way to figure that out? I have a calculator for the number of entries I will have, but I don't have a good idea of how many levels deep to go. I'm thinking somewhere between 4 and 8, leaning towards 4: three levels of folders, each representing one character of the hash, and one level of files for the 4th character. Does that sound really off to anyone?

I could then compress each file to save space, and it would be fairly quick to search since each file would hold only a small subset of the hashes; my concern is that this would be too many files to write out and too few hashes per file, limiting compression. I'm thinking I can leave the hash creation process the same, as that is far faster than using the files, and have another process move the hashes into the files after they are created and delete the source rows.

One big concern I have is how to structure the file to keep it safe from misreads but still very quick to read. I'm inclined to use a comma and a space to separate the hash (first) from the password (second), and a newline to separate hash/password records. I have plenty of inodes free (238,826,293 as of writing, to be exact). Thoughts??
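
The mover process would then do roughly this with each batch pulled out of the DB; the 4-deep layout and naming are just my current leaning:

Code:
<?php
// Sketch of the mover: append "hash, password" lines to bucket files
// chosen by the first 4 characters of the hash (3 folder levels plus a
// file for the 4th character). Compression would happen once a bucket
// is complete. Layout and naming are only my current leaning.
function move_rows(string $root, string $type, array $rows): void
{
    foreach ($rows as [$hash, $plain]) {
        $dir = sprintf('%s/%s/%s/%s/%s', $root, $type, $hash[0], $hash[1], $hash[2]);
        if (!is_dir($dir)) {
            mkdir($dir, 0755, true);
        }
        file_put_contents("$dir/{$hash[3]}", "$hash, $plain\n", FILE_APPEND);
    }
}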

Thanks,
Glenn
Glenn wrote:
Yeah, I can see that idea; the idea I had was basically what you are describing, and I don't have indexes yet, so I don't know how much space they will take. I like my version of the radix-style tree (though I would not be using it for space saving), but I am having a hard time figuring out how many levels deep I should go before switching to a file: the more levels there are, the longer it takes to move the hashes from the initial DB location into the files, but the faster it is to search the resulting set of files. Does anyone know a way to figure that out?

It's a multivariable optimization problem depending on your hardware configuration. I did something like it in O/S last semester when calculating the optimum number of bits to use in a paging scheme or something like that.

Quote:
One big concern I have is how to structure the file to keep it safe from misreads but still very quick to read. I'm inclined to use a comma and a space to separate the hash (first) from the password (second), and a newline to separate hash/password records.

The hashes are fixed length, so it should be easy to do that.
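
Since an MD5 is always 32 hex characters and a SHA-1 is always 40, you can split on position rather than on the delimiter, so a password that happens to contain ", " can't cause a misread. Quick sketch:

Code:
<?php
// Split a "hash, password" line by position: the hash length is fixed
// (32 hex chars for MD5, 40 for SHA-1), so a ", " inside the password
// cannot cause a misread.
function parse_line(string $line, string $type): array
{
    $hashLen = ($type === 'md5') ? 32 : 40;
    $hash    = substr($line, 0, $hashLen);
    $plain   = substr($line, $hashLen + 2);        // skip the ", " separator
    return [$hash, rtrim($plain, "\n")];
}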
elfprince13 wrote:
Glenn wrote:
Yeah, I can see that idea; the idea I had was basically what you are describing, and I don't have indexes yet, so I don't know how much space they will take. I like my version of the radix-style tree (though I would not be using it for space saving), but I am having a hard time figuring out how many levels deep I should go before switching to a file: the more levels there are, the longer it takes to move the hashes from the initial DB location into the files, but the faster it is to search the resulting set of files. Does anyone know a way to figure that out?

It's a multivariable optimization problem depending on your hardware configuration. I did something like it in O/S last semester when calculating the optimum number of bits to use in a paging scheme or something like that.

Quote:
One big concern I have is how to structure the file to keep it safe from misreads but still very quick to read. I'm inclined to use a comma and a space to separate the hash (first) from the password (second), and a newline to separate hash/password records.

The hashes are fixed length, so it should be easy to do that.


I figured out a relatively fast method for generating the tables; how fast it is depends heavily on available RAM. I first create a ramdisk (tmpfs style), then generate a bunch of sets of hashes (on my ~1GB ramdisk I can do about 5000 sets, each of ~1k hashes of each type, with my current charset, which has been downsized due to time constraints for this version of the project) and store them as CSV, one file per set. I then run part 1 of the sorting program, which sorts them by the first 2 characters of the hash into 256 files per hash type (also stored on the ramdisk). I then run the second sorting part, which sorts each of those files into 256 more files that go onto a hard drive, with heavy in-program caching of the writes to limit the number of writes to the hard drive and improve speed.

This has really helped my speed and it is going fairly fast right now; while I don't have accurate times, I would guess about 3-10 minutes per iteration of this process, which is about 5M hashes per type. I'm probably going to move this today to a new machine I built this weekend, which has more RAM, and try a 2 or 3GB ramdisk, since the second part of the sorting takes most of the time and I think most of that is due to having to load the program 256 times to complete it. I will also likely make the program do more internally so it is loaded fewer times, which should hopefully speed things up a lot.

I went with 4-character-deep sorting due to the limitations of my sorting method, which made that far faster than anything deeper. I may try more levels in a later version, but I need this version done for a science fair next Wednesday, which is also why I downsized the character set, so I could generate more of the parts. My very rough estimate of cracking speed, based on no code actually written to do it yet, is about one second per 120k-600k hashes in the full dataset, though I will know better when that code is actually written.
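
The first sorting stage is essentially just bucketing by prefix; here is a stripped-down sketch of it, with the paths standing in for the tmpfs locations I actually use:

Code:
<?php
// Sort stage 1: read the per-set CSV files from the ramdisk and bucket
// each "hash,plaintext" line into one of 256 files based on the first
// two characters of the hash. Paths here are placeholders.
function sort_stage1(string $setDir, string $outDir, string $type): void
{
    $buckets = [];                                   // buffer lines per prefix
    foreach (glob("$setDir/*.csv") as $file) {
        foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            $buckets[substr($line, 0, 2)][] = $line;
        }
    }
    foreach ($buckets as $prefix => $lines) {        // one append per bucket file
        file_put_contents("$outDir/$type.$prefix", implode("\n", $lines) . "\n", FILE_APPEND);
    }
}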
Doing some testing on an even smaller character set due to further time restrictions, I calculated that combined hash generation plus both sorting passes works out to about 18k hashes per second, from nothing done to fully sorted. I do sets lasting about 20 minutes apiece to get that full efficiency, and I fully max out one core. This is using a different computer with a 3.5GB tmpfs cache for the parts that are still in progress.
If you're CPU bound rather than I/O bound, I recommend writing a multithreaded task dispatcher. Count the number of available cores, then set a worker thread per core, and hand out a range of a few hundred thousand at a time to each thread.

Or try out some CUDA/OpenCL =)
elfprince13 wrote:
If you're CPU bound rather than I/O bound, I recommend writing a multithreaded task dispatcher. Count the number of available cores, then set a worker thread per core, and hand out a range of a few hundred thousand at a time to each thread.

Or try out some CUDA/OpenCL =)
Oooh, I heartily second the GPU-based suggestion. I love GPGPU code. Smile
elfprince13 wrote:
If you're CPU bound rather than I/O bound, I recommend writing a multithreaded task dispatcher. Count the number of available cores, then set a worker thread per core, and hand out a range of a few hundred thousand at a time to each thread.

Or try out some CUDA/OpenCL =)


I believe I'm actually bound by both, though I can't be certain yet. For generating and the first sort of the hashes I am certainly CPU bound, but that is <10% of the time by my estimate; the second sort is most of the time, and I think that is at least partially I/O bound. My main issue with threading is that I did not really write my system to be thread-safe, due to time limitations, and I don't think I'll have the time for what I really need, which is a rewrite. I wrote the code in PHP originally because I was rewriting half of it several times a day while working out how to do this quickly and efficiently, and rewriting that much C code did not look like a fun or easy task; I have also never done hashing in C and still haven't gotten around to learning C++. I hope to write a version in C at some point, but that may be a little way off. I may try compiled PHP, though, as that may speed things up a bit.

Threading would be another option, but there is only one thread-safe portion of the code, and that is the part that takes the least time (probably <3%). It is already fairly fast: in <6 hours (probably more like 3-4) I generated all numeric passwords up to 8 characters long. I plan on finishing all numeric passwords up to 10 characters long and then working on lower alphanumeric passwords. The one other part of my code that is partially thread-safe, the second sort, is very disk intensive, so I intentionally didn't thread it to avoid too much seeking; it had speed issues when I threaded it compared to when I didn't. The second sort writes between 5 and 10 MB per second to disk unthreaded, so I think threading it would likely cause issues with the disk. I like the ideas, I just don't think I have time to implement them yet. I also think I'm at or near the limit of where more RAM improves performance.
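
If I do find time for it later, the dispatch part for the CPU-bound generation and first sort would presumably look something like this, forking worker processes rather than threads to sidestep most of the thread-safety issues; generate_range() and the core-count detection are placeholders:

Code:
<?php
// Sketch of a per-core dispatcher for the CPU-bound generation + first
// sort, using forked worker processes instead of threads. The
// generate_range() function and the core count are placeholders.
$cores     = (int) trim((string) shell_exec('nproc')) ?: 2;
$chunkSize = 200000;                        // candidate passwords per chunk
$total     = 10000000;                      // however many this run covers

$children = [];
for ($w = 0; $w < $cores; $w++) {
    $pid = pcntl_fork();
    if ($pid === 0) {                       // child: take every $cores-th chunk
        for ($start = $w * $chunkSize; $start < $total; $start += $cores * $chunkSize) {
            generate_range($start, min($start + $chunkSize, $total));
        }
        exit(0);
    }
    $children[] = $pid;
}
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);           // wait for every worker to finish
}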
  