Are there any stipulations on the use of file streams and/or other variables in Python between concurrent threads?

For example, if I open a file stream in a main thread to a log file:

Code:

# These are methods of Class "Server".
def __init__ (self):
                ....
                self.mlogf = open("logs/messages.txt","a+")
      self.elogf = open("logs/errors.txt","a+")
      self.ilogf = open("logs/log.txt","a+")

def log(self,*args,**kwargs):
      print(*args,**kwargs)
      for arg in args:
         self.ilogf.write(str(arg)+" ")
      self.ilogf.write("\n")
   
   def elog(self,*args,**kwargs):
      self.log(*args,**kwargs)
      for arg in args:
         self.elogf.write(str(arg)+" ")
      self.elogf.write("\n")


Can I call Server.log or Server.elog within another thread? I am having issues in the TI-Trek server, where log files aren't being written to and I'm wondering if this is the culprit.
You will have issues with interleaving if you write from multiple threads. However, if you use the logging module, it will be thread safe to do this.
Ok, so essentially, ya'll can correct me if I'm wrong here:

within `handle_connection`, loading and writing to files within the user's folder should be thread safe because each user gets their own thread which will only ever load files in that directory?

Is opening a file read-only thread safe?

Calls to `self.log` (which is an alias of `Server.log`) from threads other than the opening thread can fail?

Any other attempt to write to a file that may be used in another thread can fail for the same reason?

Any other file that is only used by one thread should be safe?


And one final question: https://github.com/acagliano/titrek-server/blob/master/server.py
Do we handle multithreading properly here? I mean as of now, the server IS 100% stable and working as we intend, but that doesn't mean it's done right.
No, operations will not fail. What might happen is that you'll see unexpected results.

For instance, two threads A and B each want to write "Hello world!" to the same file. All of these are possible results:
  • Hello world!Hello world!
  • HHeelloo wwoorrlldd!!
  • HelloHello world! world!
  • ...
You don't get any guarantees about atomicity of operations, but that's the only issue in play.
So then what is the safest way to do a write to a file that needs to be available to any thread, where the order (or at least the sanity of what's written) matters?

Also, if multi-writing a file won't cause the .write to outright fail, what is causing the log files to not write? Calls to `self.log` print their respective error, text, or exception to the Console just fine but do not write to the log file.
EDIT: I just took a look at the code you posted on GitHub, but unless I'm overlooking something, it looks like you're using the multiprocessing module to create entirely new processes, rather than Python threads? If so, that is a completely different kettle of fish, and what I wrote below about locks doesn't apply at all (though the flushing still does).



If you close the file, or call the .flush() method on it, does the expected output show up then? Depending on the system, the first bits of written data may not actually become visible in the file until the buffer fills up. You can call the file's flush() method any time you want the buffered writes to actually be flushed to the log early, for instance, after each line or chunk of lines written.

For thread-safe file writing in Python, I think threading.Lock would work in the simplest case. You'd have a lock for the file, and any thread that needs to write to it would grab the lock, write whatever lines that it doesn't want interrupted in the middle by another thread, then release the lock. In Python, it would look something like this:


Code:
# Somewhere in init code
import threading
log_file_lock = threading.Lock()

# Make sure every thread that needs to write to the file is passed the
# log_file_lock instance; in the example below we assume it's stored in
# self.log_file_lock

# “with” statement grabs the lock then releases it after the “with” part
# completes
with self.log_file_lock:
    log_file.write('blah blah\n')
    # Write any additional lines that should always appear together, if any
    log_file.flush()    # optional


If you're writing to more than one file in threads, you should use one lock object per file unless you're sure that every file will *always* be used together at the same time in every thread. However, every *thread* should share the same lock instances; don't create a new lock object for each thread.

The way it works is, if one thread tries to get the lock while another has it, the lock will make it wait until the first thread is done with it. Then the second thread will proceed, keeping written data from the threads from being written to the file in an intermingled fashion.
Travis wrote:
EDIT: I just took a look at the code you posted on GitHub, but unless I'm overlooking something, it looks like you're using the multiprocessing module to create entirely new processes, rather than Python threads? If so, that is a completely different kettle of fish, and what I wrote below about locks doesn't apply at all (though the flushing still does).

Is it wrong to use the multiprocessing module in that way? Will it matter that I'm using multiprocessing instead of threads? Should threads be used instead?
multiprocessing is not the same as threads, and that probably explains your issues- you're using independent processes, not threads and thus state is not shared in the same way.

Under the hood if you have threads sharing a file handle they all look the same to the OS. When multiple processes have a file open they have separate handles with their own seek locations and such- having a file open for writing from multiple processes is usually a mistake (unless you're being very careful), and many systems will outright prevent it. Opening a file in append-only mode will on some systems work as you expect, but not all- and asking Python to use mode a+ is probably not actually an append-only mode under the hood. Things start getting really weird when thinking about what Python is doing- for instance Linux guarantees that open files refer to the same handle on both sides of a fork(), which I think means that your files will be closed if any of the launched subprocesses close them and it's going to be difficult to reason about when that might happen. Suffice to say, you need more or better structure for this.

Using the logging module for this might sidestep such issues easily, but locking (with multiprocessing.Lock) as Travis described might be sufficient. You could also run a single logging thread/process/whatever that receives messages from other components and is the only one responsible for writing to log files.

Welcome to the world of concurrency, I guess.
Quote:

You could also run a single logging thread/process/whatever that receives messages from other components and is the only one responsible for writing to log files.

This seems simplest, imo. How would it be done?
I have another question.
In the `handle_connection` threads (per user) the entire thing is a loop of `while self.server.online` which is an alias of `Server.online`.
When the main thread exits (we issue a `stop` to the server), the code sets `Server.online` to false. Will this cause the other threads to all stop as a result?
ACagliano wrote:
In the `handle_connection` threads (per user) the entire thing is a loop of `while self.server.online` which is an alias of `Server.online`.
When the main thread exits (we issue a `stop` to the server), the code sets `Server.online` to false. Will this cause the other threads to all stop as a result?
As long as you're running threads and not processes this will be fine, since every thread has a reference to the same Server.

Quote:
Quote:
You could also run a single logging thread/process/whatever that receives messages from other components and is the only one responsible for writing to log files.

This seems simplest, imo. How would it be done?
I disagree. The logging module can give you a much easier-to-follow setup:

Code:
class Server:
    log = logging.getLogger('titrek.server')

    def doSomething(self):
        self.log.info('There are %s lights', 4)

if __name__ == '__main__':
    # Write errors to another file in addition to the main log
    error_handler = logging.FileHandler('logs/errors.txt')
    error_handler.addFilter(lambda record: record.levelno >= logging.ERROR)
    logging.basicConfig(handlers=(
        logging.FileHandler('logs/log.txt'),
        error_handler,
    ))

    Server().run()


In the most obvious way, the roll-your own alternative involves creating multiple Queues and pushing messages through them to another thread that then dequeues messages and writes to a file according to which queue it was.
Ok let me verify that I've done this properly:

In the class __init__ for Server

Code:

class Server:
   def __init__(self):
      for directory in ["logs","space/data","players","terrain","cache","missions","notes","bans"]:
         try:
            os.makedirs(directory)
         except:
            pass
      self.logger = logging.getLogger('titrek.server')
      self.loadbans()
      self.banlist = open("bans/userban.txt","a+")
      self.ipbanlist = open("bans/ipban.txt","a+")

      self.generator = Generator()
      self.space = Space(self.log)
      
      self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)         # Create a socket object
      self.sock.settimeout(None)
      self.port = PORT                # Reserve a port for your service.
      self.clients = {}
      self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
      self.sock.bind(('', self.port))                 # Now wait for client connection.

In the Server method `log`

Code:

def log(self,*args,**kwargs):
      print(*args,**kwargs)
      for arg in args:
         self.logger.info(str(arg)+" ", 4)

At the end of the script, where the Server is started

Code:

if __name__ == '__main__':
    # Write errors to another file in addition to the main log
    error_handler.addFilter(lambda record: record.levelno >= logging.ERROR)
    logging.basicConfig(handlers=(
        logging.FileHandler('logs/server.log'),
        error_handler,
    ))
server = Server()
server.run()

Did I do this correctly? Basically I want to pipe everything from debug messages to critical messages to both the console (terminal output) and to the log file.
Bump: https://github.com/acagliano/titrek-server/blob/master/server.py
Can someone verify that I implemented logging properly?
Look in Server.__init__, Server.log, and the very end of the script.
Seems like it works.
Quote:

Code:
if __name__ == '__main__':
    # Write errors to another file in addition to the main log
    logging.basicConfig(format='%(levelname)s:%(asctime)s %(message)s',filename='example.log',level=logging.DEBUG)
server = Server()
server.run()

The point of if __name__ == '__main__' is to avoid running code if you import the module rather than running it, so ideally you'd run the server in that block as well, as I did in the first place. And the changes you've made to the config make the comment incorrect.

Code:
if __name__ == '__main__':
    logging.basicConfig(
        format='%(levelname)s:%(asctime)s %(message)s',
        filename='example.log',
        level=logging.DEBUG
    )
    server = Server()
    server.run()
I edited as you indicated. What about Server.log(). Is there a better way to do that? Basically, i want to print anything we're logging, as well as writing it to a file.
Install two adapters on the root logger, one to write to the file and another to write to the console.
Code:
logging.basicConfig(format='...', level=..., handlers=[
    logging.StreamHandler(), # writes to stderr
    logging.FileHandler('somefile.log'),
])

logger.log() already supports formatting, so Server.log is best done just by passing parameters straight to the logger.

Code:
def log(self, *args, **kwargs):
    self.logger.log(*args, **kwargs)

# Logs "There are 4 lights"
self.log('There are %s lights', 4)

You should probably read the documentation for the logging module.
I have read it, but I didn't quite understand that particular part. Final question... Do I even need a
`def log()` function?

Can't I just use logger.info() or logger.debug() or logger.critical() or the built-in methods?
Do you imagine there would be any issue doing that?

Perhaps you should try it and report back with your findings.
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 1
» All times are GMT - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement