Endless Locale Bugs

This isn’t the first time I’ve ranted about Locale related issues. It probaly won’t be the last, either.

Currently my biggest gripe is with convenience methods in Java which rely on platform defaults, and are thus platform-specific. For instance, the venerable System.out.println() will terminate your line with whatever your platform thinks a line ending should be. Printing dates or numbers will try to use your platform Locale. Writing a string to a file will default to platform encoding.

Some of these defaults can be controlled at run time, others require JVM startup options. This is all horribly wrong. This results in all kinds of unexpected behaviour. It’s error-prone. None of this should be allowed to happen. Any method that assumes platform-specific magic should be deprecated. Which is exactly what I’ll do, as soon as I can figure out how to write a Sonar plugin to detect all this nonsense.

Character sets, time zones and hashes

Character sets, time zones and password hashes are pretty much the bane of my life. Whenever something breaks in a particularly spectacular fashion, you can be sure that one of those three is, in some way, responsible. Apparently the average software developer Just Doesn’t Get It™. Granted, they are pretty complex topics. I’m not expecting anyone to care about the difference between ISO-8859-15 and ISO-8859-1, know about UTC‘s subtleties or be able to implement SHA-1 using a ball of twine.

What I do expect, is for sensible folk to follow these very simple guidelines. They will make your (and everyone else’s) life substantially easier.

Use UTF-8..

Always. No exceptions. Configure your text editors to default to UTF-8. Make sure everyone on your team does the same. And while you’re at it, configure the editor to use UNIX-style line-endings (newline, without useless carriage returns).

..or not

Make sure you document the cases where you can’t use UTF-8. Write down and remember which encoding you are using, and why. Remember that iconv is your friend.

Store dates with time zone information

Always. No exceptions. A date/time is entirely meaningless unless you know which time zone it’s in. Store the time zone. If you’re using some kind of retarded age-old RDBMS which doesn’t support date/time fields with TZ data, then you can either store your dates as a string, or store the TZ in an extra column. I repeat: a date is meaningless without a time zone.

While I’m on the subject: store dates in a format described by ISO 8601, ending with a Z to designate UTC (Zulu). No fancy pansy nonsense with the first 3 letters of the English name of the month. All you need is ISO 8601.

Bonus tip: always store dates in UTC. Make the conversion to the user time zone only when presenting times to a user.

Don’t rely on platform defaults

You want your code to be cross-platform, right? So don’t rely on platform defaults. Be explicit about which time zone/encoding/language/.. you’re using or expecting.

Use bcrypt

Don’t try to roll your own password hashing mechanism. It’ll suck and it’ll be broken beyond repair. Instead, use bcrypt or PBKDF2. They’re designed to be slow, which will make brute-force attacks less likely to be successful. Implementations are available for most sensible programming environments.

If you have some kind of roll-your-own fetish, then at least use an HMAC.

Problem be gone

Keeping these simple guidelines in mind will prevent entire ranges of bugs from being introduced into your code base. Total cost of implementation: zilch. Benefit: fewer headdesk incidents.