Gillius's Programming

Multithreaded Networking in a Game Engine

Prelude

The following is a post I made recently, explaining what I think is the biggest problem holding back adoption of GNE into a game. Initially, back when GNE started, I wanted to experiment with a new type of interface and with threading, but how did that choice turn out?

I invite any discussion of this on the forums.

The Problem

Threads typically sacrifice latency for throughput, so adding threads usually makes things less "real-time" in a purely theoretical sense. However, this view assumes that the program has nothing better to do than sit around waiting for an event, or that it can afford to constantly poll for something.

In that sense, preemptive threading is extremely useful for addressing latency issues. If you have a high-priority thread blocked on an event (like waiting for a packet), you can respond very quickly, because it preempts the other code. The other code can also become easier to write if the tasks are independent, because you don't have to insert hooks into different modules of the program, or keep every operation small so that everything gets attention in some loop.
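For example, here is a minimal sketch of that pattern, assuming a connected POSIX socket; this is illustrative only, not GNE's actual API:

    #include <sys/socket.h>
    #include <cstdio>

    // Hypothetical handler; a real engine would parse or queue the
    // packet rather than print it.
    void handlePacket(const char* data, ssize_t len) {
        (void)data;  // unused in this sketch
        std::printf("received %zd bytes\n", len);
    }

    void receiveLoop(int sock) {
        char buf[1500];
        while (true) {
            // Blocks here; the scheduler wakes this thread the moment
            // data is ready, preempting lower-priority work.
            ssize_t len = recv(sock, buf, sizeof(buf), 0);
            if (len <= 0)
                break;  // connection closed or error
            handlePacket(buf, len);
        }
    }

Run receiveLoop on its own high-priority thread, and the rest of the program never has to poll for incoming data.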

GNE uses threading very heavily to enable parallelism in game networking. However, there is a drawback here that I've never been able to fully address, and it threatens to make GNE somewhat pointless: concurrent game state updates.

If you receive a new position for a player while your game is in the logic loop, you probably don't want to update the player's info at that moment, because you are in the middle of calculating logic, and changing things might invalidate some results (collision detection, most likely). If you are in the middle of a render loop, you don't want things to move around either, or some of the objects will be drawn in the new position and some in the "old". And even if you somehow found a way to make concurrent updates "OK", locking granularity is going to be a problem: having a mutex for every game object will not scale or perform well at all, and would certainly end up worse than a single-threaded solution.
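To illustrate the hazard with hypothetical types (this is the pattern to avoid, not a recommendation): the logic thread reads a position in pieces, and a concurrent write from the network thread can land between those reads, so the logic sees a position that never actually existed:

    #include <cstdio>
    #include <thread>

    struct Player { float x = 0.0f, y = 0.0f; };

    Player player;  // shared with no synchronization: a data race

    void logicThread() {
        for (int frame = 0; frame < 100; ++frame) {
            // Collision math reads x and y separately; a write can
            // land between the two reads.
            float x = player.x;
            float y = player.y;
            std::printf("testing against (%.1f, %.1f)\n", x, y);
        }
    }

    void networkThread() {
        for (int i = 0; i < 100; ++i) {
            player.x = float(i);
            player.y = float(i);
        }
    }

    int main() {
        std::thread logic(logicThread);
        std::thread net(networkThread);
        logic.join();
        net.join();
    }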

This all means that the only reasonable time to actually process the network messages is between the render stage and the next logic stage. And if you only allow one point for processing, you might as well put all of the network code there, right?
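To make that hand-off point concrete, here is a minimal sketch using a mutex-guarded queue with a hypothetical message type; GNE's actual API may differ:

    #include <mutex>
    #include <queue>

    // Hypothetical message type; a real game would have many kinds.
    struct NetMessage { int playerId; float x, y; };

    class MessageQueue {
    public:
        // Called from the network thread whenever a message is decoded.
        void push(const NetMessage& m) {
            std::lock_guard<std::mutex> lock(mtx);
            msgs.push(m);
        }

        // Called from the game thread at the safe point. Swapping under
        // one short lock lets the game apply the updates without holding
        // the mutex while it touches game objects.
        std::queue<NetMessage> drain() {
            std::lock_guard<std::mutex> lock(mtx);
            std::queue<NetMessage> out;
            out.swap(msgs);
            return out;
        }

    private:
        std::mutex mtx;
        std::queue<NetMessage> msgs;
    };

The game loop then becomes logic, render, drain-and-apply, repeat, and no individual game object ever needs its own lock.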

Well, there is still a lot of work, like actually managing the sockets, copying and transferring data, and doing the low-level stuff, that can be done truly in parallel. But games are typically low-bandwidth, and the amount of CPU time the actual network part of the game takes is probably pretty low, so even a dual-core chip wouldn't offer too much. Maybe it would be more useful in a hyper-threading (HT) situation, but it would hardly exercise a dual-core.

There is still some benefit to having threads for the program's logic, especially being able to use blocking send and recv; that makes the code a lot clearer, more modular, and easier to write, but it doesn't really do much for performance.

I still haven't thought of a solution to this. The closest I've come is to work with a "snapshot mentality" and start processing the next frame while rendering occurs. Since the GPU is a separate entity from the CPU, the rendering stage just needs to do the minimum to keep the GPU busy, and while the CPU waits for the render to complete it could be calculating the next logic frame. But to do that, you have to either keep two copies of the game state, or have a separate "render-only" representation of the game. This is not utterly far-fetched, since we already need things like vertex arrays, shaders, and stored matrix info, all of which are completely unsuitable for game logic processing but are in a sense a copy of the game's state from the standpoint of rendering. The problem with snapshots is that the extra memory, and the time required to manage that memory, may exceed the benefit derived from threaded programming.
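A minimal sketch of that double-buffered arrangement, with hypothetical types: the renderer reads one copy of the state while logic writes the other, and the two swap roles once both stages finish the frame:

    #include <vector>

    // Hypothetical state; in practice, whatever the logic mutates.
    struct GameState { std::vector<float> positions; };

    class DoubleBufferedState {
    public:
        // Logic writes here while the renderer reads front().
        GameState& back() { return states[writeIdx]; }
        const GameState& front() const { return states[1 - writeIdx]; }

        // Called once per frame after both stages finish. Note that the
        // new back buffer then holds stale data and must be refreshed
        // from the front buffer; that copy is exactly the extra memory
        // and management cost described above.
        void swap() { writeIdx = 1 - writeIdx; }

    private:
        GameState states[2];
        int writeIdx = 0;
    };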