Gillius's Programming

Major GNE Bug in Endian Handling

A major bug in GNE affected the byte order used on the wire. With the NO_NET network type, GNE correctly put HawkNL in little endian mode. With any other type, such as NL_IP, GNE called nlInit after nlEnable(NL_LITTLE_ENDIAN_DATA), and nlInit erased the setting. Amazingly, this error went uncaught for all these years. I very seriously considered changing all the documentation and having GNE use big endian all the time, but decided to fix the bug and use little endian consistently.
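The ordering problem can be illustrated with a toy model. This is only a sketch: ToyNetLib and its methods are hypothetical names invented for illustration and are not HawkNL or GNE code. It assumes an init call that resets every option to its default, with big endian as the default.

```cpp
// Toy model of the ordering bug. ToyNetLib is hypothetical, NOT HawkNL code.
enum class ByteOrder { Big, Little };

class ToyNetLib {
public:
    // Like a library-wide init: resets every option to its default
    // (assumed here to be big endian).
    void init() { order_ = ByteOrder::Big; }

    // Like enabling a little endian data option.
    void enableLittleEndian() { order_ = ByteOrder::Little; }

    ByteOrder order() const { return order_; }

private:
    ByteOrder order_ = ByteOrder::Big;
};

// Buggy order: the option is set first, then init() wipes it out.
inline ByteOrder buggySetup() {
    ToyNetLib lib;
    lib.enableLittleEndian();
    lib.init();
    return lib.order();
}

// Fixed order: initialize first, then set the option so it sticks.
inline ByteOrder fixedSetup() {
    ToyNetLib lib;
    lib.init();
    lib.enableLittleEndian();
    return lib.order();
}
```

In this model buggySetup() ends up in big endian mode even though little endian was requested, while fixedSetup() keeps the requested setting.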

Thanks go out to Héber Costa Ferreira for finding this problem in GNE/HawkNL and providing suggestions, code, and testing effort to fix this issue.

This change has been committed to GNE SVN as of revision 680. If you are not using the SVN version of GNE, you should: it is much more stable than 0.70 and contains many critical fixes.

The reasons why I did this:

  1. The protocol specification, all documentation, and comments say little endian, and I don't want to have to change them all.
  2. NO_NET mode was already properly little endian.
  3. I expect the client and server sides will typically use the same version of GNE, because they are compiled together most of the time, so both would change endianness at the same time.
  4. I still agree with my original thinking from years ago that virtually all machines running GNE will likely be little endian. This is even more true today, now that Macs use x86 instead of PowerPC and typical ARM is little endian. Therefore we can save effort by not swapping bytes on these machines.

The counter arguments:

  1. Big endian is easier to read in a debugger or when dumping raw packets
  2. Big endian is "network byte order" and is typically assumed unless otherwise stated in Internet-based binary protocols.

I fully understand if someone prefers to use big endian over the wire. If you don't agree with this change, simply call nlEnable(NL_BIG_ENDIAN_DATA) immediately after initGNE. This workaround is safe for GNE versions both before and after this change. I will officially support bug reports if GNE doesn't work when you do this. Of course, at this time it clearly works, since that has been the mode of operation until now.

Since the build version in the GNE connection packet is 16 bits long, you can be sure that a big-endian GNE will not connect to a little-endian GNE: the connection fails with a "protocol version mismatch", even though both sides have the same version. In a future version of the protocol I may add a special field to the packet that detects byte order specifically, so that a better error message can be output.
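To see why a byte order mismatch shows up as a version mismatch, here is a minimal sketch of a 16-bit value written in one byte order and read in the other. The helper names and the version value 0x0102 are hypothetical and are not taken from the GNE protocol code.

```cpp
#include <array>
#include <cstdint>

// Two bytes as they would appear on the wire.
using Wire16 = std::array<std::uint8_t, 2>;

// Write a 16-bit value least significant byte first (little endian).
inline Wire16 writeLE16(std::uint16_t v) {
    return { static_cast<std::uint8_t>(v & 0xFF),
             static_cast<std::uint8_t>(v >> 8) };
}

// Read two wire bytes as little endian.
inline std::uint16_t readLE16(const Wire16& b) {
    return static_cast<std::uint16_t>(b[0] | (b[1] << 8));
}

// Read two wire bytes as big endian.
inline std::uint16_t readBE16(const Wire16& b) {
    return static_cast<std::uint16_t>((b[0] << 8) | b[1]);
}

// Hypothetical version check: both sides must see the same number.
inline bool versionsMatch(std::uint16_t local, std::uint16_t received) {
    return local == received;
}
```

With a hypothetical version of 0x0102, a little-endian sender puts the bytes 02 01 on the wire. A little-endian reader recovers 0x0102, but a big-endian reader sees 0x0201, so versionsMatch(0x0102, readBE16(writeLE16(0x0102))) is false. Note that in this sketch a value whose two bytes are equal, such as 0x0101, reads the same in either order.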

Note that the endian setting affects all Buffer operations (except strings and raw writing), which includes all GNE protocol code.
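The distinction can be sketched with a toy buffer. ToyBuffer is a hypothetical class written for illustration, not GNE's Buffer: its integer writes honor the byte order setting, while raw bytes and string characters are copied in order. The string framing here (no length prefix or terminator) is an assumption of the sketch only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy buffer, NOT GNE's Buffer class.
class ToyBuffer {
public:
    explicit ToyBuffer(bool littleEndian) : little_(littleEndian) {}

    // Multi-byte integers honor the byte order setting.
    void writeU32(std::uint32_t v) {
        for (int i = 0; i < 4; ++i) {
            const int shift = little_ ? 8 * i : 8 * (3 - i);
            bytes_.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
        }
    }

    // Raw bytes are copied exactly as given, regardless of the setting.
    void writeRaw(const std::uint8_t* data, std::size_t len) {
        bytes_.insert(bytes_.end(), data, data + len);
    }

    // String characters are single bytes, so they are written in order.
    void writeString(const std::string& s) {
        for (char c : s) {
            bytes_.push_back(static_cast<std::uint8_t>(c));
        }
    }

    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    bool little_;
    std::vector<std::uint8_t> bytes_;
};
```

Writing 0x01020304 produces the bytes 04 03 02 01 in little endian mode and 01 02 03 04 in big endian mode, while writeString("hi") and writeRaw produce identical bytes in both modes.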