Yeah, my source code isn't as well structured and organized as it could be. It started as just a private experiment that I hacked on a lot, and much of it wasn't written with the idea of eventually making the code public.
At the core of it, as you know, is a custom Markov implementation. It is fed by information in a database, which is generated by a separate script that parses my IRC logs. There is a separate table of Markov chain data for each person (“personality”), which is basically generated by picking out just that person's lines on IRC and processing them into Markov chains. The Markov-specific code is in markov.py, though there's a lot of PostgreSQL-specific stuff hardcoded into it. (In hindsight, it would have been better if I had abstracted the DB-backend-specific stuff into a separate layer of code so that the engine could be adapted more easily to use some other DB backend, or another storage method entirely.)
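To give a rough idea of what "a separate set of Markov data per person" means, here is a minimal sketch. It is not the actual markov.py code: it keeps everything in in-memory dicts instead of PostgreSQL tables, and the function names (`build_chain`, `build_personality_chains`, `generate`), the `(nick, text)` log format, and the chain order of 2 are all assumptions made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(lines, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        for i in range(len(words) - order):
            key = tuple(words[i:i + order])
            chain[key].append(words[i + order])
    return chain


def build_personality_chains(log, order=2):
    """Group (nick, text) pairs by nick and build one chain per nick."""
    by_nick = defaultdict(list)
    for nick, text in log:
        by_nick[nick].append(text)
    return {nick: build_chain(texts, order) for nick, texts in by_nick.items()}


def generate(chain, seed, max_words=20, rng=random):
    """Walk forward from `seed` (a tuple of `order` words) until a dead end."""
    out = list(seed)
    key = tuple(seed)
    while len(out) < max_words and key in chain:
        out.append(rng.choice(chain[key]))
        key = tuple(out[-len(seed):])
    return " ".join(out)


# Hypothetical log data, standing in for parsed IRC logs.
log = [
    ("alice", "the cat sat on the mat"),
    ("bob", "hello there world"),
    ("alice", "the cat ran away"),
]
chains = build_personality_chains(log)
print(generate(chains["bob"], ("hello", "there")))
```

The only thing that differs between personalities in this sketch is which lines feed `build_chain`; the generation code is shared.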
The heart of the engine at the top level, though, is the tables in the “patterns” directory. I tried to organize those into separate files by subject matter, though it still feels rather messy to me (especially since everything that should be used by all personalities is dumped into global.py anyway, while everything that should be used only by the Nikky/NikkyBot persona is in the other files). Each table is basically a large list of regular expressions that are matched against the line someone says to the bot. If it finds a match, it uses the commands in that table entry to tell it how to form a response. (If it doesn't, then I believe I have it just generate some sort of “generic”, randomized Markov response.) This response could be a hard-coded prewritten response, a randomly chosen prewritten response, a Markov chain generated from a template of seed words, or any combination of those. These tables, with the various functions I can use in them, are what allow me to “shape” the response depending on what someone has said to it. They are probably the most interesting part of the code to look at. If you know a bit of regex, you might be able to get a rough idea of how some of it works.
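The general shape of a pattern table can be sketched like this. This is only an illustration of the idea, not the real table format from the “patterns” directory: the `PATTERNS` layout, the `respond` and `generic_response` names, and the example regexes are all hypothetical, and the Markov fallback is replaced by a placeholder string.

```python
import random
import re

# Hypothetical table format: (regex, list of candidate responses).
# A response is either a literal string or a callable taking the match object.
PATTERNS = [
    (r"\bhello\b|\bhi\b", ["Hello yourself.", "Hi."]),
    (r"\bwhat is (?P<thing>\w+)",
     [lambda m: f"No idea what {m.group('thing')} is."]),
]


def generic_response():
    """Placeholder for the 'generic' randomized Markov fallback."""
    return "<generic Markov response>"


def respond(message, patterns=PATTERNS, rng=random):
    """Return a response for the first pattern that matches `message`."""
    for regex, responses in patterns:
        match = re.search(regex, message, re.IGNORECASE)
        if match:
            choice = rng.choice(responses)
            # Callables can build a reply from the captured groups.
            return choice(match) if callable(choice) else choice
    return generic_response()


print(respond("What is Python?"))
```

Letting a table entry be either a fixed string or a function is one simple way to mix prewritten replies with generated ones; in the real engine the generated kind would come from the Markov code seeded with chosen words.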
The process works more or less the same for the different “personalities”, the main difference merely being the source of the Markov data (that particular person's own past lines in chat).
nikkyai.py contains the code that ties all the pattern table and Markov stuff together. Some of it gets pretty crazy, and I'm not sure if I even remember exactly how all of it works anymore.
There are a lot of little features crammed into it, including the bot trying to avoid generating the exact same response too often, attempts to match its output to the context of the conversation (which I've never been able to tell whether it actually works all that well), etc., all of which add a fair bit of complexity.
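As an example of one of those features, avoiding repeated responses can be done roughly like the sketch below. This is an assumed approach, not necessarily how nikkyai.py does it: the `RepeatFilter` class, its `memory` and `max_tries` parameters, and the retry-then-give-up behaviour are all invented for illustration.

```python
from collections import deque


class RepeatFilter:
    """Remember recent responses and retry generation to avoid repeats."""

    def __init__(self, memory=10, max_tries=5):
        self.recent = deque(maxlen=memory)  # oldest entries fall off the end
        self.max_tries = max_tries

    def pick(self, generate):
        """Call generate() up to max_tries times, preferring unseen output."""
        candidate = generate()
        for _ in range(self.max_tries - 1):
            if candidate not in self.recent:
                break
            candidate = generate()
        # If every try was a repeat, the last candidate is used anyway.
        self.recent.append(candidate)
        return candidate


# Hypothetical generator that produces some duplicates in a row.
canned = iter(["a", "a", "b", "b", "c"])
repeat_filter = RepeatFilter()
results = [repeat_filter.pick(lambda: next(canned)) for _ in range(3)]
print(results)
```

Capping the number of retries means the bot still answers when the generator keeps producing the same thing, at the cost of an occasional repeat.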
Finally, nikkybot.py contains the code that interfaces the internal chat engine to IRC, making it an IRC bot. Essentially, it implements the NikkyBot IRC “client”.