Character sets, time zones and hashes
Character sets, time zones and password hashes are pretty much the bane of my life. Whenever something breaks in a particularly spectacular fashion, you can be sure that one of those three is, in some way, responsible. Apparently the average software developer Just Doesn't Get It™. Granted, they are pretty complex topics. I'm not expecting anyone to care about the difference between ISO-8859-15 and ISO-8859-1, know about UTC's subtleties or be able to implement SHA-1 using a ball of twine.
What I do expect is for sensible folk to follow these very simple guidelines. They will make your (and everyone else's) life substantially easier.
Use UTF-8..
Always. No exceptions. Configure your text editors to default to UTF-8. Make sure everyone on your team does the same. And while you're at it, configure the editor to use UNIX-style line-endings (newline, without useless carriage returns).
..or not
Make sure you document the cases where you can't use UTF-8. Write down and remember which encoding you are using, and why. Remember that iconv is your friend.
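If iconv isn't at hand, the same conversion can be done in code. Here's a minimal sketch, assuming the legacy data is ISO-8859-15 (the class and method names are made up for illustration). The point is that both encodings are named explicitly, so nobody has to guess later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: convert legacy-encoded bytes to UTF-8, naming both encodings explicitly. */
final class LegacyToUtf8 {

    // Assumption for this example: the documented legacy encoding is ISO-8859-15.
    static final Charset LEGACY = Charset.forName("ISO-8859-15");

    /** Decodes bytes using the documented legacy charset. */
    static String decodeLegacy(byte[] legacyBytes) {
        return new String(legacyBytes, LEGACY);
    }

    /** Re-encodes legacy bytes as UTF-8, roughly what "iconv -f ISO-8859-15 -t UTF-8" does. */
    static byte[] toUtf8(byte[] legacyBytes) {
        return decodeLegacy(legacyBytes).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] legacy = {(byte) 0xA4}; // the euro sign in ISO-8859-15

        // Decoded with the right charset this is U+20AC (euro sign).
        System.out.println(Integer.toHexString(decodeLegacy(legacy).codePointAt(0))); // 20ac

        // Decoded as ISO-8859-1 the very same byte is U+00A4 (generic currency sign).
        String wrong = new String(legacy, StandardCharsets.ISO_8859_1);
        System.out.println(Integer.toHexString(wrong.codePointAt(0))); // a4

        // The euro sign takes three bytes in UTF-8.
        System.out.println(toUtf8(legacy).length); // 3
    }
}
```

That single byte is exactly the kind of difference between ISO-8859-15 and ISO-8859-1 that nobody wants to care about, which is why writing the encoding down matters.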
Store dates with time zone information
Always. No exceptions. A date/time is entirely meaningless unless you know which time zone it's in. Store the time zone. If you're using some kind of age-old RDBMS which doesn't support date/time fields with TZ data, then you can either store your dates as a string, or store the TZ in an extra column. I repeat: a date is meaningless without a time zone.
While I'm on the subject: store dates in a format described by ISO 8601, ending with a Z to designate UTC (Zulu). No fancy pansy nonsense with the first 3 letters of the English name of the month. All you need is ISO 8601.
Bonus tip: always store dates in UTC. Make the conversion to the user time zone only when presenting times to a user.
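To make that concrete, here's a small sketch using java.time. The class name, the method names and the example zone are my own picks for illustration. Timestamps are stored as ISO 8601 strings in UTC, and the user's time zone only comes into play when displaying them.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

/** Sketch: store instants as ISO 8601 in UTC, convert to a user zone only for display. */
final class UtcTimestamps {

    /** Parses a stored ISO 8601 UTC timestamp such as "2012-12-31T23:30:00Z". */
    static Instant parseStored(String iso8601Utc) {
        return Instant.parse(iso8601Utc);
    }

    /** Formats for storage; Instant.toString() emits ISO 8601 in UTC with a trailing Z. */
    static String formatForStorage(Instant instant) {
        return instant.toString();
    }

    /** Converts to the user's zone at presentation time only. */
    static ZonedDateTime forDisplay(Instant instant, ZoneId userZone) {
        return instant.atZone(userZone);
    }

    public static void main(String[] args) {
        Instant stored = parseStored("2012-12-31T23:30:00Z");

        // What goes into the database.
        System.out.println(formatForStorage(stored));

        // What a user in Brussels (an example zone) gets to see.
        System.out.println(forDisplay(stored, ZoneId.of("Europe/Brussels")));
    }
}
```

For the user in Brussels that instant is already half past midnight on 1 January 2013, while the stored value stays put in UTC.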
Don't rely on platform defaults
You want your code to be cross-platform, right? So don't rely on platform defaults. Be explicit about which time zone/encoding/language/.. you're using or expecting.
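A classic example of a platform default biting back is case conversion, which quietly depends on the default locale. The sketch below (hypothetical class and method names) shows the difference between relying on the default and being explicit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Sketch: name the locale and the charset instead of inheriting them from the platform. */
final class ExplicitDefaults {

    /** Lower-cases a technical key with a fixed locale, not the platform default. */
    static String normalizeKey(String key) {
        return key.toLowerCase(Locale.ROOT);
    }

    /** Encodes text with a named charset, not the platform default. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Under Turkish casing rules "I" lower-cases to a dotless i (U+0131),
        // so this is what a no-argument toLowerCase() does on a Turkish machine.
        System.out.println("TITLE".toLowerCase(turkish).equals("title")); // false

        // With an explicit locale the result is the same everywhere.
        System.out.println(normalizeKey("TITLE").equals("title")); // true
    }
}
```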
Use bcrypt
Don't try to roll your own password hashing mechanism. It'll suck and it'll be broken beyond repair. Instead, use bcrypt or PBKDF2. They're designed to be slow, which will make brute-force attacks less likely to be successful. Implementations are available for most sensible programming environments.
If you have some kind of roll-your-own fetish, then at least use an HMAC.
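For the PBKDF2 route, the JDK ships an implementation, so there's nothing to roll yourself. What follows is a rough sketch, not a vetted security design. The class name, the iteration count, the salt size and the key length are all assumptions picked for illustration, so check current guidance before copying them.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Sketch: salted password hashing with the JDK's built-in PBKDF2. */
final class Pbkdf2Passwords {

    // Illustrative parameters only; these are assumptions, not recommendations.
    static final int ITERATIONS = 100_000;
    static final int SALT_BYTES = 16;
    static final int KEY_BITS = 256;

    /** Generates a fresh random salt; store it next to the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives a hash from the password and salt. Deliberately slow. */
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA256 is not available", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    /** Re-derives the hash for a login attempt and compares it with the stored one. */
    static boolean verify(char[] candidate, byte[] salt, byte[] expectedHash) {
        return MessageDigest.isEqual(hash(candidate, salt), expectedHash);
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash("correct horse".toCharArray(), salt);

        System.out.println(verify("correct horse".toCharArray(), salt, stored)); // true
        System.out.println(verify("wrong guess".toCharArray(), salt, stored));   // false
    }
}
```

Store the salt and the iteration count alongside the hash, so the parameters can be raised later without locking anyone out.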
Problem be gone
Keeping these simple guidelines in mind will prevent entire ranges of bugs from being introduced into your code base. Total cost of implementation: zilch. Benefit: fewer headdesk incidents.
On Bug Reports & the Urge To Decapitate
There's an old joke about a Manager, an Engineer and a Software Developer: they're in a car and the brakes fail as they go down a mountain road. They miraculously come to a standstill, but now they're stuck, and Somebody Should Do Something™. The Manager suggests they make a Plan, define Goals & Measurable Objectives in order to solve the Critical Problem. The Engineer has his tools with him, so he suggests he take a look at the problem and fix it. The Software Developer, of course, is not convinced and suggests they push the car uphill to see if the problem will manifest itself again ...
You can probably find a bunch of different versions of this on the web, but the punch line is always the same: the developer wants to be able to reproduce the bug. Not just to verify its existence (although it wouldn't be the first time a user reported a critical data loss bug after having pressed the delete button..), but also to have a place to start the bug hunt. Software systems are complex. We have enough layers of abstraction built on top of a bunch of transistors to make Shrek jealous. Something as seemingly simple as displaying a bit of text on a screen is so complex that it can no longer be fully understood by a single person.
Being able to reproduce a bug is the only way to resolve ambiguity. When it isn't possible to describe all the steps that led to the bug, then a thorough description of the problem is the next best thing. Think screenshots, explanations of the expected result, and answers to all of the usual "Which"-questions (Version, OS, Environment, Lunar Phase, ..).
Debugging is hard (and fun). Vague bug reports like "the application is broken" rather make me want to tie a team of horses to your limbs and decapitate your bloody remains ... erm, educate you on the virtues of well-written bug reports.
Now .. let me see if I can fix this "broken" application .. sigh.
What bugs me on the web
2013 is nearly upon us, and the web has come a very long way in the ~15 years I've been a netizen. And yet, even though we've made so many advances, it sometimes feels like we've been stagnant, or worse, regressed in some cases.
Each and every web developer out there should have a long, hard think about how the web has (d)evolved in their lifetime and which way we want to head next. There's an awful lot happening at the moment: web 2.0, HTML 5, Flash's death-throes, super-mega-ultra tracking cookies, EU cookie regulation nonsense, microdata, cloud fun, ... I could go on all day. Needless to say: it's a mixed bunch.
In any event, here's a brief list of 3 things that bug me on the web.
Links are broken
Usability has long been the web's sore thumb, and in spite of any number of government-sponsored usability certification programmes over the years, people still don't seem to give a rat's arse. Websites are still riddled with nasty drop down menus that only work with a mouse. Sometimes they're extra nasty by virtue of being ajaxified. At least Flash menus are finally going the way of the dinosaur.
Pro tip: every single bloody link on your web site should have a working HREF, so people can use it without relying on click handlers, mice or JavaScript, and can open the bloody thing in a new tab without going through hell and back.
Bonus points: make your links point to human-readable URLs.
Languages, you're doing it wrong
The web is no longer an English-only or US-only playing field, and companies all over are starting to cotton on to this fact. What they have yet to realise, however, is that people don't necessarily speak the language you think they do. If you rely on geolocation data to serve up translated content: stop. You're doing it wrong. The user determines the language. Believe it or not, people do know which language(s) they speak.
Geolocation, for starters, isn't an exact science. Depending on the kind of device this can indeed be very accurate. Or very much not. Proxies, VPNs, Onion Routers etc can obviously mislead your tracking. Geolocation tells you nothing. It doesn't tell you why that person is there (maybe they're on holiday?). It also doesn't tell you what language is spoken there. This might be a shock to some people, but some countries have more than one official language. Hell, some villages do. Maybe you can find this data somewhere, and correlate it with the location, but you'd be wrong to. Language is a very sensitive issue in some places. Get it right, or pick a sensible default and make clear that it was a guess. Don't be afraid to ask for user input.
Pro tip: My favourite HTTP header: Accept-Language. Every sensible browser sends this header with every request. In most cases, the default is the browser's or OS's language. Which is nearly always the user's first language, and when it's not, at least you know the user understands it well enough to be able to use a browser..
Bonus points: Seriously, use Accept-Language. If you don't, you're a dick.
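Honouring the header doesn't take much. Here's a sketch using the JDK's own language-range support. The list of supported translations, the fallback and the class name are hypothetical. An explicit choice made by the user should still win over whatever this returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Sketch: pick a site language from the Accept-Language header. */
final class LanguageNegotiation {

    // Hypothetical set of translations the site offers.
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("en"),
            Locale.forLanguageTag("fr"),
            Locale.forLanguageTag("nl"));

    // Sensible default when nothing matches; treat it as a guess and let the user change it.
    static final Locale FALLBACK = Locale.forLanguageTag("en");

    /** Returns the best supported match for the header, or the fallback. */
    static Locale choose(String acceptLanguageHeader) {
        if (acceptLanguageHeader == null || acceptLanguageHeader.trim().isEmpty()) {
            return FALLBACK;
        }
        try {
            // parse() orders the ranges by their q-weights, highest first.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguageHeader);
            // lookup() tries each range in turn, truncating "nl-BE" to "nl" if needed.
            Locale match = Locale.lookup(ranges, SUPPORTED);
            return match != null ? match : FALLBACK;
        } catch (IllegalArgumentException malformedHeader) {
            return FALLBACK;
        }
    }

    public static void main(String[] args) {
        System.out.println(choose("nl-BE,nl;q=0.8,fr;q=0.5,en;q=0.3").toLanguageTag()); // nl
        System.out.println(choose("de").toLanguageTag());                               // en
    }
}
```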
Clutter is back
Remember how, back in 1999, we all thought Google looked awesome because it was so clean & crisp and didn't get in your face and everyone copied the trend? Well, that seems to have come to an end.
Here's Yahoo in 1997. (I love how it has an ad for 256mb of memory.)
Here's Yahoo now.
The 1997 version was annoying to use (remember screen resolutions in the 90s? No? You're too young to read this, go away) because it was so cluttered.
The 2012 version is worse and makes me want to gouge my eyes out.
Even Google is getting all in your face these days, with search-as-you-type and whatnot. Bah. DuckDuckGo seems to be the exception (at least as far as search engines go). It offers power without wagging it in your face.
Pro tip: don't put a bazillion things on your pages. Duh.
2013 Wishlist
My web-wishlist for 2013 is really quite simple: I want a usable web. Not just for people with the latest and greatest javascript-enabled feast-your-eyes-on-this devices. For everyone. Including those who use text-to-speech, or the blind, or people on older devices. Graceful degradation is key to this. So please, when you come up with a grand feature, think about what we might be giving up on as well. Don't break links. Don't break the back button. Don't break the web.
Bad Press for Agile
So .. Agile's been getting some bad press of late. Now, these guys are just quacks, and I probably shouldn't feed the trolls here, but I never could resist.
Saying "agile doesn't work" or "agile is only out to sell services (training, certification, etc.)" is obviously a bogus claim. The same could be said of any software methodology. Many waterfall projects have failed, and many have had the help of process improvement engineers and whatnot. Some projects will always fail. A sound development methodology & culture can help you realise imminent failure earlier, or it can help reduce chances of failure. But no methodology is a guarantee for success. A team of idiots run by idiots will always produce crap. No matter how many buzzwords they fit in their job titles or marketing blurbs.
Agile is many things, but no one has ever claimed it to be a silver bullet. As for it being "for lazy devs": all developers are lazy, it's part of the job description. It's why we automate shit. It's why we focus on code and not on hot air.
My recommendation to you: use whatever works for you. And in doing so, you're already on your way to being Agile.
On name & address madness
A long time ago, in a university far far away, I had a crazy Relational Database professor. This man was under the impression that data should be normalized at any and all cost, to infinity and beyond. In some cases, he had a point. When he started talking about normalizing names and addresses, I had to suppress murderous tendencies. You see, he seemed to be under the impression that it's always a Good Idea to split an address into different chunks. You know the drill: street, post code, city, country. Or worse, add a house number. Or worse, states, counties & provinces. Maybe regions, too. And before you know it you end up with a monstrosity of a data model that's broken beyond repair. There are many different addressing systems out there. To the point that Wikipedia has a page devoted to them, some of which are too complex to explain on that single page. The Japanese one is particularly interesting. I'm sure someone out there is nuts enough to create a model that encompasses all of these many & varied possibilities, but not I.
Generally speaking, addresses are used to send snail mail. If that's the only reason you need them in your application, then a simple text string will do. I'm sure users from Country X are quite capable of entering their address in a sensible format. When people want stuff delivered, they'll make sure the address is properly formatted. If you need to do silly things like calculate shipping costs, then there's no harm in asking the user for their country separately, or even their locality within that country. Instead of having a dozen fields, you'll have 2 or 3, which is much simpler to work with for users, and it won't drive developers insane.
And then there's names. They're even worse. "First name", "last name". If I had a penny for every time I came across a form asking me for two parts of my name, I'd be bleedin' rich. Let's start with the obvious: not everyone has two names. In fact, until relatively recently in history, many western people only had one name. Only when Napoleon decreed in 1811 (!) that everyone under his rule ought to have a last name did people take on "actual" last names. Before then, John was simply John. If you didn't know which John exactly, then "John, son of the butcher's by the church". Not every culture uses the same order for first name & last name either. In Japan, the family name comes before the given name. If you want to use the first name to send informal communications along the lines of "Jack, we have a wonderful new offer", then you're out of luck in China. They use the full name for those types of things.
Just like with the addresses, the only sensible thing to do is stop trying to fit everyone in the same box, in a manner of speaking. Because what you actually need is *one* box for a name. Your users are perfectly well aware of their own name, so they can enter it in whatever order or format they're used to. If you really insist on being the cool kid by sending messages with personalized greetings, then I suggest you ask the user "how should we address you?". And for goodness' sake, DO NOT try and validate the length of a name. 2 characters is perfectly common in Asian languages, whereas the longest official name is -- according to Guinness World Records -- 799 characters long.
KISS is the golden rule, as always. If you don't need the "standard" granularity for names and addresses, then don't bother with it. It's a waste of everyone's time and it will only frustrate users who don't fit in any of your boxes.
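In code, the un-normalized model is pleasantly boring. Here's a sketch with a hypothetical Contact class. The field names and the English "Dear ..." greeting are assumptions for illustration only.

```java
/** Sketch: one box for the name, one for the address, plus the bits you really need. */
final class Contact {

    private final String fullName;      // exactly as the user typed it, in their own order
    private final String greetingName;  // answer to "how should we address you?", may be null
    private final String postalAddress; // free-form, multi-line, formatted by the user
    private final String countryCode;   // asked separately, e.g. to calculate shipping costs

    Contact(String fullName, String greetingName, String postalAddress, String countryCode) {
        // The only name validation: there has to be one. No length checks.
        if (fullName == null || fullName.trim().isEmpty()) {
            throw new IllegalArgumentException("a name is required");
        }
        this.fullName = fullName;
        this.greetingName = greetingName;
        this.postalAddress = postalAddress;
        this.countryCode = countryCode;
    }

    String fullName() { return fullName; }
    String postalAddress() { return postalAddress; }
    String countryCode() { return countryCode; }

    /** Uses the name the user asked for, falling back to the full name. */
    String greeting() {
        boolean noPreference = greetingName == null || greetingName.trim().isEmpty();
        return "Dear " + (noPreference ? fullName : greetingName) + ",";
    }

    public static void main(String[] args) {
        Contact jack = new Contact("Jack Example", "Jack", "1 Example Street\nExampleville", "GB");
        System.out.println(jack.greeting()); // Dear Jack,
    }
}
```

Falling back to the full name covers the case where a first-name greeting isn't appropriate, without the code having to know anything about naming conventions.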
On naming Interfaces
Just about every programming book or OOP example I look at has the annoying habit of prefixing interface names with an I. Yes, IInterface, ICar, IWallet, IMoney, IIAmStupid. It's one of my pet peeves, I admit, but every time I come across code like this I find myself suppressing murderous tendencies.
Why oh why would you do this? "To clearly indicate that it's an interface". Why the hell do I need to know that it's an interface in the first place? If I want to use your library, I really don't give a toss whether I'm talking to an interface or a concrete class. If I do want to know (because I'm extending or initialising it) then I'm quite capable of figuring out for myself whether I'm dealing with an interface, abstract class or concrete class. The only thing this random I does is make it harder to look for suitable classes. I want to search for Car, or Wallet, or Money. Not to mention that I don't want to type that extra letter every single time I'm assigning an instance to a variable.
Seriously, how ugly is this?
IFoo foo = IFooFactory.createIFoo();
The irony here is that I used "I" 32 times in this post. But for crying out loud, pick a meaningful name, not some silly prefixed monstrosity.
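For contrast, here's a sketch of the prefix-free alternative. All the names are made up. The interface gets the plain name of the concept, and the implementation gets the qualifier that says what's special about it.

```java
/** Sketch: name the interface after the concept, qualify the implementations instead. */
final class NamingSketch {

    /** The thing callers search for and type; nothing in the name says "interface". */
    interface Wallet {
        void deposit(long cents);
        long balanceInCents();
    }

    /** The implementation carries the qualifier that explains how it differs. */
    static final class InMemoryWallet implements Wallet {
        private long cents;

        @Override
        public void deposit(long amountInCents) {
            if (amountInCents < 0) {
                throw new IllegalArgumentException("cannot deposit a negative amount");
            }
            cents += amountInCents;
        }

        @Override
        public long balanceInCents() {
            return cents;
        }
    }

    /** Callers neither know nor care which kind of type they get back. */
    static Wallet newWallet() {
        return new InMemoryWallet();
    }

    public static void main(String[] args) {
        Wallet wallet = newWallet();
        wallet.deposit(250);
        System.out.println(wallet.balanceInCents()); // 250
    }
}
```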
OutOfMemoryError while running Maven Surefire tests
Imagine you have a project which works perfectly fine and well. All tests pass, each and every time. Then one day you commit a couple of new classes with related tests. Of course you ran all tests before committing, and everything worked just fine. Then, a minute or so later, you get a mail from Hudson (or whatever you're using for CI) saying that there are test failures. "Maybe I forgot a file", I thought. Checked the test results on Hudson. About a dozen tests were failing, unrelated to anything I touched. Odd. OutOfMemoryErrors all over the place. Most odd. Hudson's Tomcat has 1G, which should be plenty. Same with each build's MAVEN_OPTS.
Apparently, someone who wrote the Maven Surefire Plugin thought that it would be a GREAT idea to ignore things like MAVEN_OPTS and other memory settings. The plugin seems to start a new JVM instance to run the tests. Without any of the arguments you so carefully selected. No. Apparently you have to explicitly tell the Surefire plugin that maybe, just maybe, it would be a good idea to use the memory settings you already provided elsewhere.
Anyhoo, this fixed it:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.6</version>
  <configuration>
    <!-- Surefire runs the tests in a new JVM that ignores MAVEN_OPTS, so set the heap here -->
    <argLine>-Xmx512m</argLine>
  </configuration>
</plugin>
DRY, you say? Not so much, eh.
Maven 3 resource filtering weirdness
Maven 3 is all nice and fast(er) and shiny, so I decided to upgrade a Maven 2 project to Maven 3. It (cl)aims to be backwards-compatible, so my consternation was pretty great when my build failed straight away. That's to say, my tests failed. For some reason, my resources were no longer being filtered. Yup, ${property.keys} weren't being replaced by values.
This struck me as being somewhat odd, because it worked fine with 2.2.1. A bit of debugging led me to the cause of the problem:
<!-- @Transactional can now be used as well -->
... apparently, the @ symbol is a filtering delimiter of sorts (as in @property.key@), so a stray one trips up the filter.
Considering that blurb on their website doesn't even qualify as English, I'm not sure if this is a feature or a bug. But whatever. Removing that comment fixed the problem. Whoever came up with that bright idea (especially in an age where @annotations are as rampant as the black plague in the 14th century) probably deserves a spanking.
The Real World sucks
Being of a reasonably calm disposition, I rarely reach boiling point, but there are a couple of things that really do grind my gears. Inefficiency is one of them, but let's not go there right now. What's really grinding my gears right now, is how much the so-called Real World (as opposed to the academic world) sucks.
Every once in a while, I will stumble upon a piece of code that I really wish wasn't there. It's the kind of code that screams "please refactor me!". It's the exploding septic tank of code smells. When I look at it, the words "well that's going to come back to bite us in the arse" pop into my head. In an ideal world, I'd be able to just sit down for a day and refactor the damned thing. This Real World thing, unfortunately, has something that's called management. Or was it manglement? They're part of Layer 8 either way. That's right, the political layer.
These politics seem to revolve solely around short term benefits. And so refactoring a piece of code that could cause problems eventually in such-and-such a situation isn't deemed important. This is fair enough in some cases, but sometimes you really do know that this decision will come back to haunt you. Sometimes I wonder whether managers really don't care about long-term implications, or whether they like having a couple of big bugs lurking in the code in the hope that some of these will be discovered much later so that we can get paid to fix them. Must be my cynical mind talking.
Is there a good way of dealing with this sort of nonsense? It's always possible to pull overtime and refactor the code then, but what if you introduce a bug? Good luck explaining a bug in new-and-improved code that you weren't supposed to write in the first place.
Yup, the Real World sucks. But if anyone has a way of making it better, I'd be glad to hear it!
