Fallout Password Hacking Solving Aid made in Python 3.X - Cemetech | Forum

Just finished this little script that turned out to be an interesting enough project that I figured some members here might want to see.

For those unaware, Fallout 3, New Vegas, and 4 all features a mini-game/puzzle for "hacking" into computers whose password you haven't yet found or cannot be found. The puzzle is pretty simple, you've been given a set of possible words all with the same length, and must make guesses as to which is the correct password. You get 4 attempts at the correct password, and the game will tell you after each attempt how many characters in your guess matched the characters in the correct password. In this sense, it's similar to the game Mastermind except get the "correct letters with correct placement" number without the "correct letter with incorrect placement" number. If you make 4 incorrect guesses you're locked out of the machine.

You have as much time as you want to solve the puzzle, so generally the way most people solve it is by making a random choice, seeing how many letters match, then finding and trying all remaining passwords that have the same number of matching letters. I grew tired of manually going through each password and counting letters, and disliked how random and unguided this process was so I started making a script that would attempt to find the password for me with as few guesses as possible.

When looking at this problem, my first insight (from experience) was noting that not all passwords given were equal. The game at times seems to use words with certain prefixes and suffixes. 're-', 'un-', 'de-', etc. are frequent prefixes above a certain length, along with '-ing', '-tion', '-ed', etc. for word endings and suffixes. I have also noticed that at times it will choose a lot of words that share the same prefix and suffix. Generally speaking choosing one of these words as your first bit of information tends to be not very helpful as most of the time the actual password is one of these words, so it's going to have 3-6 letters in common with about half of your given words. While reducing the possibilities by 50% can be pretty good, because we're dealing with 0-13 letter matches here, I'd like to aim for a better reduction than 50%. While we can technically gain more information with each guess at a password, I think some of the choices are usually going to be much better than others, which begs the questions of "which choices are better than others?" and "how can that be determined?"

To attempt to figure this out, I started looking at a practical example. Below is a list of the words I had available as options in a Hard rated computer. You'll note that although they don't share many prefixes, several of them share the same suffixes and endings due to many of them being verbs that share the same tense.

Code:

DESTRUCTION


EXAMINATION


RECOGNITION


MISTRUSTING


RESIDENTIAL


INFORMATION


DISCOVERIES


TEMPORARILY


RADIOACTIVE


ACCOMPANIED


THREATENING


ELECTRONICS


RESEARCHING


CHALLENGING


CONTROLLING


OBSERVATION


PERSONALITY


NIGHTVISION


PROGRAMMING


INFLUENTIAL

The only information I can gleam from the game about any of these choices is how many letters any of them have in common with the actual password, which could be any of them. Due to this, my first thought is to try to try matching each word with one another and see if there's anything I can try with the resulting data. Below is a chart with how many letters match between each word. The Y axis is labeled with each word being used, the X axis uses the same words in the same order going from left to right as the Y axis uses going from top to bottom.

Code:

DESTRUCTION: 11,  4,  5,  6,  4,  5,  3,  2,  3,  1,  1,  1,  4,  1,  3,  6,  2,  3,  2,  2


EXAMINATION:  4, 11,  5,  2,  2,  5,  1,  2,  2,  2,  1,  2,  1,  2,  1,  5,  3,  3,  1,  2


RECOGNITION:  5,  5, 11,  2,  4,  5,  1,  2,  3,  3,  1,  1,  3,  1,  1,  4,  3,  4,  1,  2


MISTRUSTING:  6,  2,  2, 11,  3,  3,  3,  1,  2,  1,  3,  1,  4,  3,  5,  4,  1,  2,  4,  2


RESIDENTIAL:  4,  2,  4,  3, 11,  2,  2,  2,  4,  1,  1,  1,  4,  3,  1,  3,  2,  1,  1,  6


INFORMATION:  5,  5,  5,  3,  2, 11,  1,  2,  2,  3,  1,  1,  1,  1,  2,  6,  2,  3,  2,  5


DISCOVERIES:  3,  1,  1,  3,  2,  1, 11,  3,  2,  2,  2,  3,  2,  1,  1,  3,  2,  3,  1,  1


TEMPORARILY:  2,  2,  2,  1,  2,  2,  3, 11,  2,  2,  2,  2,  3,  1,  1,  2,  5,  1,  1,  1


RADIOACTIVE:  3,  2,  3,  2,  4,  2,  2,  2, 11,  1,  1,  1,  3,  1,  1,  2,  2,  1,  2,  2


ACCOMPANIED:  1,  2,  3,  1,  1,  3,  2,  2,  1, 11,  2,  2,  1,  1,  1,  2,  2,  1,  1,  1


THREATENING:  1,  1,  1,  3,  1,  1,  2,  2,  1,  2, 11,  2,  5,  4,  3,  2,  2,  1,  3,  1


ELECTRONICS:  1,  2,  1,  1,  1,  1,  3,  2,  1,  2,  2, 11,  2,  1,  1,  1,  1,  2,  1,  1


RESEARCHING:  4,  1,  3,  4,  4,  1,  2,  3,  3,  1,  5,  2, 11,  3,  3,  3,  2,  1,  3,  1


CHALLENGING:  1,  2,  1,  3,  3,  1,  1,  1,  1,  1,  4,  1,  3, 11,  4,  1,  1,  1,  3,  4


CONTROLLING:  3,  1,  1,  5,  1,  2,  1,  1,  1,  1,  3,  1,  3,  4, 11,  2,  2,  1,  4,  1


OBSERVATION:  6,  5,  4,  4,  3,  6,  3,  2,  2,  2,  2,  1,  3,  1,  2, 11,  2,  4,  2,  2


PERSONALITY:  2,  3,  3,  1,  2,  2,  2,  5,  2,  2,  2,  1,  2,  1,  2,  2, 11,  1,  2,  1


NIGHTVISION:  3,  3,  4,  2,  1,  3,  3,  1,  1,  1,  1,  2,  1,  1,  1,  4,  1, 11,  1,  1


PROGRAMMING:  2,  1,  1,  4,  1,  2,  1,  1,  2,  1,  3,  1,  3,  3,  4,  2,  2,  1, 11,  1


INFLUENTIAL:  2,  2,  2,  2,  6,  5,  1,  1,  2,  1,  1,  1,  1,  4,  1,  2,  1,  1,  1, 11

The code I used to present this data:
Code:

def count_matches(word1, word2):


    return sum([1 for l1, l2 in zip(word1, word2) if l1 == l2])





match_grid = [[count_matches(word1, word2) for word2 in words] for word1 in words]


print('\n'.join([ cur_word + ': ' + ', '.join([str(num).rjust(2, ' ') for num in match_row]) for cur_word, match_row in zip(words, match_grid) ]))

One thing I wanted to see when looking at this data was whether there was a significant difference in the potential information provided by each choice, which as the data currently stands is difficult to see, so I tried sorting each row by values to see some rows had a noticeably more even distribution of data than others. Admittedly this destroys some of the data, as it's now impossible to tell which value relates to which word, however as we're just looking at the metadata and we're just presenting it in a different way, I'm not super concerned with keeping that intact. This is also done purely so I can better interpret the data and isn't used in the resulting code at any point.

Code:

DESTRUCTION:  1,  1,  1,  1,  2,  2,  2,  2,  3,  3,  3,  3,  4,  4,  4,  5,  5,  6,  6, 11


EXAMINATION:  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,  4,  5,  5,  5, 11


RECOGNITION:  1,  1,  1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  3,  4,  4,  4,  5,  5,  5, 11


MISTRUSTING:  1,  1,  1,  1,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  4,  4,  4,  5,  6, 11


RESIDENTIAL:  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  3,  3,  3,  4,  4,  4,  4,  6, 11


INFORMATION:  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  3,  3,  3,  5,  5,  5,  5,  6, 11


DISCOVERIES:  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  3, 11


TEMPORARILY:  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,  5, 11


RADIOACTIVE:  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  4, 11


ACCOMPANIED:  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  3,  3, 11


THREATENING:  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  3,  3,  3,  4,  5, 11


ELECTRONICS:  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  3, 11


RESEARCHING:  1,  1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  3,  3,  3,  3,  4,  4,  4,  5, 11


CHALLENGING:  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  3,  3,  3,  3,  4,  4,  4, 11


CONTROLLING:  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  4,  4,  5, 11


OBSERVATION:  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  4,  4,  4,  5,  6,  6, 11


PERSONALITY:  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,  5, 11


NIGHTVISION:  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  3,  3,  3,  3,  4,  4, 11


PROGRAMMING:  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  3,  3,  3,  4,  4, 11


INFLUENTIAL:  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  4,  5,  6, 11

The code used for this is the last line of the previous code segment slightly altered to sort the numbers before joining them together with commas.
Code:

print('\n'.join([ cur_word + ': ' + ', '.join(sorted([str(num).rjust(2, ' ') for num in match_row])) for cur_word, match_row in zip(words, match_grid) ]))

One thing I noticed immediately is that there are several lines here that have more than 10 matches with only 1 letter in common. It seems while I was largely concerned with getting several words with 3-6 letters in common, the bigger issue (in this example, at least) is with words that are too unique from one another to get much information. Now this information is a bit easier to read, but I'm still (personally) struggling to get a better idea of which has the better distribution of values. In order to get a better idea of this, I decided to count the number of each value. Here's the resulting grid:

Code:

DESTRUCTION: 0: 0, 1: 4, 2: 4, 3: 4, 4: 3, 5: 2, 6: 2, 7: 0, 8: 0, 9: 0, 10: 0


EXAMINATION: 0: 0, 1: 5, 2: 8, 3: 2, 4: 1, 5: 3, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


RECOGNITION: 0: 0, 1: 6, 2: 3, 3: 4, 4: 3, 5: 3, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


MISTRUSTING: 0: 0, 1: 4, 2: 5, 3: 5, 4: 3, 5: 1, 6: 1, 7: 0, 8: 0, 9: 0, 10: 0


RESIDENTIAL: 0: 0, 1: 6, 2: 5, 3: 3, 4: 4, 5: 0, 6: 1, 7: 0, 8: 0, 9: 0, 10: 0


INFORMATION: 0: 0, 1: 5, 2: 6, 3: 3, 4: 0, 5: 4, 6: 1, 7: 0, 8: 0, 9: 0, 10: 0


DISCOVERIES: 0: 0, 1: 7, 2: 6, 3: 6, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


TEMPORARILY: 0: 0, 1: 6, 2:10, 3: 2, 4: 0, 5: 1, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


RADIOACTIVE: 0: 0, 1: 6, 2: 9, 3: 3, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


ACCOMPANIED: 0: 0, 1:10, 2: 7, 3: 2, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


THREATENING: 0: 0, 1: 8, 2: 6, 3: 3, 4: 1, 5: 1, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


ELECTRONICS: 0: 0, 1:12, 2: 6, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


RESEARCHING: 0: 0, 1: 5, 2: 3, 3: 7, 4: 3, 5: 1, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


CHALLENGING: 0: 0, 1:11, 2: 1, 3: 4, 4: 3, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


CONTROLLING: 0: 0, 1:10, 2: 3, 3: 3, 4: 2, 5: 1, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


OBSERVATION: 0: 0, 1: 2, 2: 8, 3: 3, 4: 3, 5: 1, 6: 2, 7: 0, 8: 0, 9: 0, 10: 0


PERSONALITY: 0: 0, 1: 5, 2:11, 3: 2, 4: 0, 5: 1, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


NIGHTVISION: 0: 0, 1:11, 2: 2, 3: 4, 4: 2, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


PROGRAMMING: 0: 0, 1: 9, 2: 5, 3: 3, 4: 2, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0


INFLUENTIAL: 0: 0, 1:10, 2: 6, 3: 0, 4: 1, 5: 1, 6: 1, 7: 0, 8: 0, 9: 0, 10: 0

And the code used to present this information:
Code:

print('\n'.join([word + ': ' + ', '.join([ str(i) + ':' + str(match_row.count(n)).rjust(2, ' ') for n in range(word_len)]) for word, match_row in zip(words, match_grid)]))

Side note: You may notice that despite the fact that each word is 11 characters in length, 11 is missing from this grid. Unless there are 2 words entered that are exactly the same, there will always be exactly 1 word in the list that is the same as a given word: itself. Therefore this last column can be safely omitted without losing any useful information.

The information here seems a lot easier to quantify than before. Around this time, I started thinking about ways to find which row had the best distribution of values, when I realized that the biggest issue with any of the rows is whichever number of shared letters is most common in a given row. Minimizing this number should be the best way to guarantee that no matter what the actual password is, the information received will narrow down where the actual password is the most consistently. As such, the resulting code looks like this:
Code:

nmatches = [[ match_row.count(i) for i in range(word_len)] for match_row in match_grid]


max_nmatches = [max(nmatch_row) for nmatch_row in nmatches]


minmax_index = max_nmatches.index(min(max_nmatches))

After this, we can attempt to reiterate the same process for the second most common values for any rows that share the minimum of the maximum of each row, but in practice one iteration of this process is more than enough to solve most any situation presented by the hacking mini-game in Fallout 3, New Vegas, or 4.

After this, all we need to do is get the result for the word the algorithm believes is the best option, choose that in-game, enter however many letters the game says that chosen word shares with the actual password, remove all the words that don't share that many letters as the chosen one, then reiterate until the actual password is discovered.

In practice I've found that this process tends to narrow the results by 80% each iteration, depending on the length of the words, generally requiring 2-3 attempts to find the actual password. I've yet to run into a situation where it has required a 4th attempt, but I still need to do some more testing to see how effective it is at different lengths and with different numbers of given passwords, and am looking to see if I can find the way the game sets this puzzle up in more precise details so I can simulate it much faster than manually typing words when I find new computers to hack into in-game.

Last but not least, the full source code for my resulting script:
Code:

def count_matches( word1, word2 ):


    return sum([ 1 for l1, l2 in zip(word1, word2) if l1 == l2 ])





while True:


    print("Please input all words, then an empty line once complete.")


    


    li = [' ']


    while li[-1]!='':


        li.append( input(": ").upper())


    


    words = li[1:-1]


    if max([ len(e) for e in words ]) != min ([ len(e) for e in words ]):


        print("All words need to be of the same length.")


        continue


    else:


        word_len = len( words[0] )


    


    cur_words = words


    while len(cur_words) > 1:


        # Attempts to find most evenly distributed values of matches by finding which word has the lowest maximum of n matches


        match_grid = [[ count_matches( word1, word2 ) for word2 in cur_words] for word1 in cur_words ] # Check how many letters each word has in common with every other word currently being considered


        nmatches = [[ match_row.count( n ) for n in range( word_len )] for match_row in match_grid ] # Counting how many other words have n-matching letters for each possible chosen word


        max_nmatches = [ max( nmatch_row ) for nmatch_row in nmatches ] # Maximum n-match across each row


        minmax_index = max_nmatches.index( min( max_nmatches )) # Index of the row with the minimum of these values


        


        print("'%s' seems to be the most informative choice." % cur_words[minmax_index])


        while True:


            print("\nHow many letters matched?")


            i_str = input(': ')


            try:


                letters_matched = int(i_str)


            except ValueError:


                if i_str == '': # If inputted int was just empty string, assume the previously suggested word was correct and reset


                    break


                continue


            


            cur_indexes = [ i for i, e in enumerate(match_grid[minmax_index]) if e == letters_matched]


            if len(cur_indexes) == 0:


                print("No given words appear to match the number entered.")


                continue


            


            cur_words = [ cur_words[i] for i in cur_indexes ]


            if len(cur_indexes) == 1:


                print("Exact match found: '%s'" % (cur_words[0]))


            else:


                print("%s words being considered: %s" % (len(cur_indexes), ', '.join(cur_words)))


            break


        


        print('\n')


        if i_str == '':


            break

Let me know how you'd approach this problem, how you liked my approach, or if you can find any specific details on how Fallout generates these puzzles so I can quickly test more situations and see how well this works in most scenarios.