Nim's Cynical Pleasantries Code, swords and a dirty mind!

17Aug/140

Dear PHP Developers, please stop using the closing tag ?>

Dear PHP Developers, please stop using ?> at the end of your scripts. There's no need for it, it's ugly, and it's a nightmare when you've got whitespace hiding behind the tag.

So stop, please. Just, stop.

20Jul/142

Laptops

My Thinkpad x230 gave me a bit of a scare recently. The display died out of the blue, and Lenovo's support was an absolute nightmare. In spite of next business day on site warranty, it took them nearly two weeks to actually fix the bloody thing. The first time the tech showed up, he replaced the broken panel with another broken panel. Not too useful, that.

While I was in limbo, unsure if it would ever get fixed under warranty, and wondering how much it would cost me to have it fixed, I spent some time researching alternatives.

Sadly, I found none. It seems like literally no one makes laptops for people who use them anymore. There's loads of consumer-grade crap out there, but nothing useful. Especially in the 12.5" range.

There are a couple of recurring failures.

Keyboard & Mouse

x240

The Thinkpad x240 -- "new and improved" -- comes with a trackpad that takes up more space than the fucking keyboard. The top buttons (used in conjunction with the nipple) are part of the trackpad. Apparently if you disable the trackpad, you're out of luck and out of buttons. Great. Turning off the trackpad, of course, is mandatory, given that the thing is so fucking huge that my thumbs constantly touch it inadvertently. Especially annoying when focus follows pointer.

The picture doesn't do it justice, but another problem with the x240 is that the function keys got smaller and are now closer to the number row than before. Makes for an overall less enjoyable experience.

dellXPS13-9333_3

These ASUS & Dell pieces of shite both seem to be missing a bunch of keys (page up, page down, home, insert, end, ...) and seem to have completely flat keys. Apparently there's a market for people who don't type.

Mechanical failures waiting to happen

xps

asus

Whoever designed these amazing ultra portable laptops seems to have forgotten that people will be moving around with them. Those displays look like they'll snap in half the first time you bump into anything. And seriously, who the fuck needs a detachable display? Who buys this? What do they do for a living? Not software engineers, that's for sure.

Touchscreen

Apparently everybody and their grandmother wants a touchscreen. I can't for the life of me imagine why. Having to use a mouse (due to terrible window management and non-keyboard-friendly websites) is already a waste of time. Having to touch my screen (and leaving greasy fingerprints everywhere) is even worse. Fuck that shit.

So ...

Laptops that don't suffer from any of the above flaws are few and far between, and usually have a CPU that's slower than my phone's, or less memory than my toaster. Anything that ships with an AMD CPU isn't even worthy of being a coaster, either. Makes you wonder how they're still relevant.

Apparently there are literally no laptops on the market for software engineers. There's all kinds of crap designed around this Windows 8 insanity with its useless tablet-like interface. I'm sure these machines are perfectly fine for consuming information like some kind of brain dead zombie, but there's no universe in which these machines are useful for creating or getting any work done.

Needless to say, I'm quite happy that Lenovo did repair my x230 in the end. Hopefully it'll outlive its warranty. With a bit of luck, the x250 or x260 won't suck quite as much...

2Jul/140

Fun with Gradle Plugins – Integration Tests

Currently in the process of migrating a 10000 line Ant build to Gradle. Not quite as fun as it sounds, but at least the Gradle build should be faster, more maintainable and hopefully free from cruft.

One of this build's peculiarities is that it executes Unit Tests and Integration Tests at different points in the build cycle. The Integration tests rely on nasty things like databases, IBM MQ and LDAP. Things that are difficult to mock out & slow your build down to a grind if you execute these tests too frequently. Some of them are really System tests, if you're being pedantic, but let's just ignore that. The point is, there's a category of tests that we don't want to see executed during the normal unit test execution.

Full code available on GitHub.

Selim Öber explains how to accomplish this on his blog. I wanted to see if I could turn it into a plugin instead, as it will have to be applied to many different projects. That's made more difficult by a heterogeneous coding environment: not everything is a Java project.

The Goal

The goal of this plugin is to be able to execute tests in src/integration/java by executing the integrationTest task. These tests will expressly not be executed during the normal build cycle. They are to be run manually (or by your CI engine or whatever) periodically, as opposed to continuously on each build.

buildSrc

The buildSrc folder is Gradle's magic box. It lives in your project root and contains -- as the name might imply -- some of your build's source code. Gradle is smart enough to pick up on the stuff there, and will automagically compile it when it changes.

I chose Groovy for this plugin, so I'll just let Gradle know about that, and I'll politely ask it to make the Gradle API available for me to use.

buildSrc/build.gradle

apply plugin: 'groovy'
 
repositories {
	mavenCentral()
}
 
dependencies {
	compile gradleApi()
}

Next is the actual plugin.

buildSrc/src/main/groovy/org/lick/me/gradle/IntegrationTestPlugin.groovy

package org.lick.me.gradle
 
import org.gradle.api.Plugin
 
import org.gradle.api.Project
import org.gradle.api.tasks.testing.Test
 
/**
 * Applying this plugin will let gradle know that the project contains integration tests.
 * These can be executed at a later point in time than unit tests -- when a database etc 
 * become available. 
 */
class IntegrationTestPlugin implements Plugin<Project> {
	@Override
	void apply(final Project project) {
 
		project.sourceSets {
			integTest {
				java.srcDir project.file('src/integration/java')
				resources.srcDir project.file('src/integration/resources')
			}
		}
 
		project.dependencies {
			integTestCompile project.sourceSets.main.output
			integTestCompile project.configurations.testCompile
			integTestCompile project.sourceSets.test.output
			integTestRuntime project.configurations.testRuntime
		}
 
		project.task('integrationTest', type: Test, description: 'Runs the integration tests.', group: 'Verification') {
			testClassesDir = project.sourceSets.integTest.output.classesDir
			classpath = project.sourceSets.integTest.runtimeClasspath
		}
 
		project.task('allTests', dependsOn: [project.test, project.integrationTest], description: 'Runs all tests.', group: 'Verification') {
 
		}
	}
}

Now all we need is a bit of magic to let Gradle know that we want to be able to apply this plugin in other projects.

buildSrc/src/main/resources/META-INF/gradle-plugins/integrationTests.properties

implementation-class=org.lick.me.gradle.IntegrationTestPlugin

Now we can add the plugin to your projects.

apply plugin: 'integrationTests'

Running gradle tasks will now result in two extra tasks in the Verification section.

tasks

28Aug/134

Yet Another Battery Widget (Awesome 3.5.1)

Yet another battery widget for Awesome. This one actually works (shock! horror!) on Awesome 3.5.1 on my Thinkpad x230. Your mileage may vary. Colours used are from the excellent Solarized colour scheme. Behold the mighty widget, in all its unobtrusive glory!

battery

The implementation is in two parts: a simple shell script to output the battery status, and a bit of rc.lua tweaks to display the widget. This is mostly the result of a bit of copy/pasting from different sources I forgot to bookmark. Oh well.

~/bin/battery.sh:

#!/bin/bash
 
healthy='#859900'
low='#b58900'
discharge='#dc322f'
 
capacity=`cat /sys/class/power_supply/BAT0/capacity`
if (($capacity <= 25));
then
        capacityColour=$low
else
        capacityColour=$healthy
fi
 
status=`cat /sys/class/power_supply/BAT0/status`
 
if [[ "$status" = "Discharging" ]]
then
        statusColour=$discharge
        status="▼"
else
        statusColour=$healthy
        status="▲"
fi
 
echo "<span color=\"$capacityColour\">$capacity%</span> <span color=\"$statusColour\">$status</span>"

Add the following snippets to /path/to/awesome/rc.lua. I'll attempt to indicate the approximate location at the top of each snippet.

Create the widget..and don't forget to adjust the path to the battery.sh script.

-- This goes below the line containing mytextclock = awful.widget.textclock()
 
-- Create a battery widget
battery = wibox.widget.textbox()
function getBatteryStatus()
   local fd= io.popen("/path/to/battery.sh")
   local status = fd:read()
   fd:close()
   return status
end

Add the widget..

-- This goes above the line containing right_layout:add(mytextclock)
    right_layout:add(battery)

Get the widget to refresh every 30 seconds. Put this somewhere near the end of the config file.

-- Battery status timer
batteryTimer = timer({timeout = 30})
batteryTimer:connect_signal("timeout", function()
  battery:set_markup(getBatteryStatus())
end)
batteryTimer:start()
battery:set_markup(getBatteryStatus())

That's all! Restart awesome and you'll see a relatively purdy yet unobtrusive battery status display.

3Aug/139

Java Date Performance Subtleties

A recent profiling session pointed out that some of our processing threads were blocking on java.util.Date construction. This is troubling, because it's something we do many thousands of times per second, and blocked threads are pretty bad!

A bit of digging led me to TimeZone.getDefault(). This, for some insanely fucked up reason, makes a synchronized call to TimeZone.getDefaultInAppContext(). The call is synchronized because it attempts to load the default time zone from the sun.awt.AppContext. What. The. Fuck. I don't know what people were smoking when they wrote this, but I hope they enjoyed it ...

Unfortunately, Date doesn't have a constructor which takes a TimeZone argument, so it always calls getDefault() instead.

I decided to run some microbenchmarks. I benchmarked four different ways of creating Dates:

// date-short:
    new Date();
// date-long:
    new Date(year, month, date, hrs, min, sec);
// calendar:
    Calendar cal = Calendar.getInstance(timeZone);
    cal.set(year, month, date, hourOfDay, minute, second);
    cal.getTime();
// cached-cleared-calendar:
//    Same as calendar, but with Calendar.getInstance() outside of the loop,
//    and a cal.clear() call in the loop.

I tested single-threaded performance, where 1M Dates were created using each method in a single thread. Then multi-threaded with 4 threads, each thread creating 250k Dates. In other words: both tests ended up creating the same number of Dates.
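For the curious, here's a rough sketch of what such a test loop could look like for the date-long method. It assumes Java 8 or newer, and the class and method names are made up for illustration; it is not the harness I actually used. It also skips warm-up entirely, so treat any numbers it prints with suspicion.

```java
import java.util.Date;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

final class DateBenchmarkSketch {
    // Collects a value from every Date so the JIT cannot simply discard the loop.
    private static final AtomicLong SINK = new AtomicLong();

    /** Creates threads * datesPerThread Dates with the date-long constructor; returns elapsed milliseconds. */
    @SuppressWarnings("deprecation")
    static long run(int threads, int datesPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                long local = 0;
                for (int i = 0; i < datesPerThread; i++) {
                    // date-long: the year is an offset from 1900, the month is zero-based
                    local += new Date(113, 7, 3, 12, 0, i % 60).getTime();
                }
                SINK.addAndGet(local);
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("1 thread : " + run(1, 1_000_000) + " ms");
        System.out.println("4 threads: " + run(4, 250_000) + " ms");
    }
}
```

For anything you intend to publish, a proper harness such as JMH is probably a better idea than a hand-rolled loop like this one.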

Lower is better.

With the exception of date-long, all methods speed up by a factor of 2 when multi-threaded. (The machine only has 2 physical cores.) The date-long method actually slows down when multi-threaded. This is because of lock contention in the synchronized TimeZone acquisition.

The JavaDoc for Date suggests replacing the date-long call by a calendar call. Performance-wise, this is not a very good suggestion: its single-threaded performance is twice as bad as that of Date unless you reuse the same Calendar instance. Even multi-threaded it's outperformed by date-long. This is simply not acceptable.

Fortunately, the cached-cleared-calendar option performs very well. You could easily store a ThreadLocal reference to an instance of a Calendar and clear it whenever you need to use it.
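A minimal sketch of that ThreadLocal idea, assuming Java 8 or newer. The class name, the method name and the choice of a fixed UTC zone are mine, purely for illustration; pick whatever zone your data actually lives in.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class CachedCalendar {
    // Assumption for this sketch: all dates are built in UTC.
    private static final TimeZone ZONE = TimeZone.getTimeZone("UTC");

    // One Calendar per thread, created lazily with an explicit zone and locale.
    private static final ThreadLocal<Calendar> CALENDAR =
            ThreadLocal.withInitial(() -> Calendar.getInstance(ZONE, Locale.ROOT));

    /** Builds a Date from its fields; month is zero-based, as with Calendar.set. */
    static Date newDate(int year, int month, int day, int hour, int minute, int second) {
        Calendar cal = CALENDAR.get();
        cal.clear(); // wipe every field (including milliseconds) left over from the previous use
        cal.set(year, month, day, hour, minute, second);
        return cal.getTime();
    }
}
```

Each thread only ever touches its own Calendar, so no locking is involved once the instance exists.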

More important than the raw duration of the Date creation, is the synchronization overhead. Every time a thread has to wait to enter a synchronized block, it could end up being rescheduled or swapped out. This reduces the predictability of performance. Keeping synchronization down to a minimum (or zero, in this case) increases predictability and liveness of the application in general.

Before anyone mentions it: yes, I'm aware that the long Date constructors are deprecated. Unfortunately, they are what Joda uses when converting to Java Dates. I've proposed a patch, but while doing a bit more research for this blog post, I've come to the conclusion that my patch needs a bit of refining as it is still too slow (though it no longer blocks). In the meantime, I hope that the -kind?- folks at Oracle will reconsider their shoddy implementation.

I've also heard rumours that Joda will somehow, magically, replace java.util.Date in JDK 8. Not sure how that's going to work with backwards compatibility. I'd be much happier if java.util.Date would stop sucking quite as much. And if SimpleDateFormat were made thread-safe. And ... the list goes on.

28Apr/130

Character sets, time zones and hashes

Character sets, time zones and password hashes are pretty much the bane of my life. Whenever something breaks in a particularly spectacular fashion, you can be sure that one of those three is, in some way, responsible. Apparently the average software developer Just Doesn't Get It™. Granted, they are pretty complex topics. I'm not expecting anyone to care about the difference between ISO-8859-15 and ISO-8859-1, know about UTC's subtleties or be able to implement SHA-1 using a ball of twine.

What I do expect, is for sensible folk to follow these very simple guidelines. They will make your (and everyone else's) life substantially easier.

Use UTF-8..

Always. No exceptions. Configure your text editors to default to UTF-8. Make sure everyone on your team does the same. And while you're at it, configure the editor to use UNIX-style line-endings (newline, without useless carriage returns).

..or not

Make sure you document the cases where you can't use UTF-8. Write down and remember which encoding you are using, and why. Remember that iconv is your friend.

Store dates with time zone information

Always. No exceptions. A date/time is entirely meaningless unless you know which time zone it's in. Store the time zone. If you're using some kind of retarded age-old RDBMS which doesn't support date/time fields with TZ data, then you can either store your dates as a string, or store the TZ in an extra column. I repeat: a date is meaningless without a time zone.

While I'm on the subject: store dates in a format described by ISO 8601, ending with a Z to designate UTC (Zulu). No fancy pansy nonsense with the first 3 letters of the English name of the month. All you need is ISO 8601.

Bonus tip: always store dates in UTC. Make the conversion to the user time zone only when presenting times to a user.
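Here's roughly what that looks like in Java, assuming Java 8's java.time package is available. The class and method names are hypothetical, and the display pattern is just an example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class UtcTimestamps {
    private static final DateTimeFormatter DISPLAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm", Locale.ROOT);

    /** ISO 8601 in UTC with a trailing Z: this is what gets stored. */
    static String toStorage(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    /** Reads a stored ISO 8601 UTC timestamp back in. */
    static Instant fromStorage(String stored) {
        return Instant.parse(stored);
    }

    /** Converts to the user's time zone only at the very last moment. */
    static String forDisplay(Instant instant, ZoneId userZone) {
        return DISPLAY.withZone(userZone).format(instant);
    }
}
```

Storage and presentation stay strictly separate: only forDisplay ever needs to know where the user is.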

Don't rely on platform defaults

You want your code to be cross-platform, right? So don't rely on platform defaults. Be explicit about which time zone/encoding/language/.. you're using or expecting.
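In Java terms, being explicit mostly means never calling the overloads that silently pick up the platform charset or locale. A small sketch, with made-up class and method names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ExplicitDefaults {
    /** Encodes with a named charset instead of whatever the platform happens to use. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes with the same named charset. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Case conversion for identifiers should not depend on the user's language. */
    static String normaliseKey(String key) {
        return key.toUpperCase(Locale.ROOT);
    }
}
```

The locale matters more than you'd think: upper-casing "title" under a Turkish locale yields a dotted capital İ, which will happily break any lookup keyed on "TITLE".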

Use bcrypt

Don't try to roll your own password hashing mechanism. It'll suck and it'll be broken beyond repair. Instead, use bcrypt or PBKDF2. They're designed to be slow, which will make brute-force attacks less likely to be successful. Implementations are available for most sensible programming environments.

If you have some kind of roll-your-own fetish, then at least use an HMAC.
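For the PBKDF2 route, the JDK ships what you need. The sketch below assumes a JDK that provides the PBKDF2WithHmacSHA256 algorithm (Java 8 and newer should). The class name, iteration count, key length and salt size are illustrative assumptions, not recommendations; check current guidance before settling on numbers.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class Pbkdf2Passwords {
    private static final String ALGORITHM = "PBKDF2WithHmacSHA256";
    private static final int ITERATIONS = 100_000; // illustrative value only
    private static final int KEY_BITS = 256;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** A fresh random salt; store it next to the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    /** Derives a deliberately slow hash from the password and salt. */
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance(ALGORITHM).generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(ALGORITHM + " is not available", e);
        } finally {
            spec.clearPassword();
        }
    }

    /** Compares with MessageDigest.isEqual rather than Arrays.equals to avoid leaking timing. */
    static boolean matches(char[] candidate, byte[] salt, byte[] expectedHash) {
        return MessageDigest.isEqual(hash(candidate, salt), expectedHash);
    }
}
```

Store the salt, the iteration count and the hash together, so you can raise the iteration count later without locking everyone out.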

Problem be gone

Keeping these simple guidelines in mind will prevent entire ranges of bugs from being introduced into your code base. Total cost of implementation: zilch. Benefit: fewer headdesk incidents.

13Jan/137

Repeat after me: MySQL is not a filesystem

I came across this gem on DZone this morning. It's a tutorial on storing images in a MySQL database (using PHP). There are several things in the tutorial that I don't agree with, but I'll let those slide. What really bugs me, is how it fails to mention that this is a very bad idea.

A relational database is not a filesystem. Files go on a filesystem. Relational data goes in an RDBMS. Repeat that a couple of times.

The most compelling argument for this is performance. I did a quick test. I did a Google image search on stupidity and downloaded the first 10 images. I then wrote PHP scripts to serve them up in two ways:

1. From a MySQL (MyISAM) table with 2 columns: ID (int, auto_increment) and DATA (mediumblob)
2. Using readfile.

The third test method, "FS", simply loads the image over HTTP directly, without any intermediary scripts.

The results are the average of running Apache Benchmark 10 times: 10 concurrent requests, 1000 requests per run.

images

As you can see, the MySQL approach is a hell of a lot slower than the more sensible FS approach.

The best way to store your images (or other binary files) is on the filesystem. Every modern web server does a good (or excellent) job of serving up static content. Storing them in a database is by far the worst possible solution. Not only because it's slow, but also because it complicates database backups: MySQL dumps with binary data don't compress very well, causing the whole database backup to be slower and larger than needs be.

So please, be sensible. Store your files on a filesystem.

9Jan/130

Java 7 Performance

I decided to compare Java 6 & 7 performance for $employer's $application. Java 7 performs better — as expected. What I did not expect, was that the difference would be so big. Around 10% on average. That's not bad for something as simple as a version bump.

Java 6 vs Java 7

Ideally I'd like to investigate where this difference comes from. I suspect improved ergonomics have a lot to do with it.

$application uses Apache Solr rather extensively. In fact, most of the querying time is spent in Solr: probably close to 90%. With indexing it's probably about 50% of the time. All tests are run in a controlled environment, so I have a fair amount of confidence in these results.

The indexing test inserts 3 million documents in Solr. Creating these documents takes up the bulk of the time. It involves a lot of filesystem access -- something which Java versions have very little influence over -- and heavily multi-threaded, CPU-intensive processing.

If you're not using Java 7, you really should consider upgrading. If you're stuck with people who live in the past, maybe you can convince them with a bunch of pretty performance graphs of your own.

31Dec/122

Gnuplot data analysis, real world example

Creating graphs in LibreOffice is a nightmare. They're ugly, nearly impossible to customize and creating pivot tables with data is bloody tedious work. In this post, I'll show you how I took the output of a couple of performance test scripts and turned it into reasonably pretty graphs with a few standard command line tools (gnuplot, awk, a bit of (ba)sh and a Makefile).

The Data

I ran a series of query performance tests against data sets of different sizes. The sets contain 10k, 100k, 1M, 10M, 100M and 500M documents. One of the basic constraints is that it has to be easy to add/remove sets. I don't want to faff about with deleting columns or updating pivot tables. If I add a set to my test data, I want it to automagically show up in my graphs.

The output of the test script is a simple tab separated file, and looks like this:

#Set	Iteration	QueryID	Duration
500M	1	101	10.497499465942383
500M	1	102	3.9973576068878174
500M	1	103	9.4201889038085938
500M	1	104	2.8091645240783691
500M	1	105	2.944718599319458
500M	1	106	5.1576917171478271
500M	1	107	5.7224125862121582
500M	1	108	5.7259769439697266
500M	1	109	4.7974696159362793

Each row contains the query duration (in seconds) for a single execution of a single query.

Processing the data

I don't just want to graph random numbers. Instead, for each query in each set, I want the shortest execution time (MIN), the longest (MAX) and the average across iterations (AVG). So we'll create a little awk script to output data in this format. In order to make life easier for gnuplot later on, we'll create a file per dataset.

% head -n 3 output/500M.dat

#SET	QUERY	MIN	MAX	AVG	ITERATIONS
500M	200	0.071	2.699	0.952	3
500M	110	0.082	5.279	1.819	3

Here's the source of the awk script, transform.awk. The code is quite verbose, to make it a bit easier to understand.

BEGIN {
}
 
{
        if($0 ~ /^[^#]/) {
                key = $1"_"$3
                first = iterations[key] > 0 ? 0 : 1
                sets[$1] = 1
                queries[$3] = 1
                totals[key] += $4
                iterations[key] += 1
 
                if(1 == iterations[key]) {
                        minima[key] = $4
                        maxima[key] = $4
                } else {
                        minima[key] = $4 < minima[key] ? $4 : minima[key]
                        maxima[key] = $4 > maxima[key] ? $4 : maxima[key]
                }
        }
}
 
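END {
 
        for(set in sets) {
                outfile = "output/"set".dat"
                print "#SET\tQUERY\tMIN\tMAX\tAVG\tITERATIONS" > outfile
                for(query in queries) {
                        key = set"_"query
                        # Skip queries that were never run against this set (avoids a division by zero)
                        if(!(key in iterations)) {
                                continue
                        }
                        iterationCount = iterations[key]
                        average = totals[key] / iterationCount
                        # outfile is already open, so ">" keeps appending to it
                        printf("%s\t%d\t%.3f\t%.3f\t%.3f\t%d\n", set, query, minima[key], maxima[key], average, iterationCount) > outfile
 
                }
                close(outfile)
        }
}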

This code will read our input data, calculate MIN, MAX, AVG, number of iterations for each query and dump the contents in a tab-separated dat file with the same name as the set. Again, this is done to make life easier for gnuplot later on.

I want to see the effect of dataset size on query performance, so I want to plot averages for each set. Gnuplot makes this nice and easy, all I have to do is name my sets and tell it where to find the data. But ah ... I don't want to tell gnuplot what my sets are, because they should be determined dynamically from the available data. Enter, a wee shellscript that outputs gnuplot commands.

#!/bin/bash
 
# Output plot commands for all data sets in the output dir
# Usage: ./plotgenerator.sh column-number
# Example for the AVG column: ./plotgenerator.sh 5
 
prefix=""
 
echo -n "plot "
for s in `ls output | sed 's/\.dat//'` ;
do
        echo -n "$prefix \"output/$s.dat\" using 2:$1 title \"$s\""
 
        if [[ "$prefix" == "" ]] ; then
                prefix=", "
        fi
done

This script will generate a gnuplot "plot" command. Each datafile gets its own title (this is why we named our data files after their dataset name) and its own colour in the graph. We want to plot two columns: the QueryID, and the AVG duration. In order to make it easier to plot the MIN or MAX columns, I'm parameterizing the second column: the $1 value is the number of the AVG, MIN or MAX column.

Plotting

Gnuplot will call the plotgenerator.sh script at runtime. All that's left to do is write a few lines of gnuplot!

Here's the source of average.gnp:

#!/usr/bin/gnuplot
reset
set terminal png enhanced size 1280,768
 
set xlabel "Query"
set ylabel "Duration (seconds)"
set xrange [100:]
 
set title "Average query duration"
set key outside
set grid
 
set style data points
 
eval(system("./plotgenerator.sh 5"))

The result

% ./average.gnp > average.png

average

Wrapping it up with a Makefile

I don't like having to remember which steps to execute in which order, and instead of faffing about with yet another shell script, I'll throw in another *nix favourite: a Makefile.

It looks like this:

average:
	rm -rf output
	mkdir output
	awk -f transform.awk queries.dat
	./average.gnp > average.png

Now all you have to do, is run

make

whenever you've updated your data file, and you'll end up with a nice'n purdy new graph. Yay!

Having a bit of command line proficiency goes a long way. It's so much easier and faster to analyse, transform and plot data this way than it is using graphical "tools". Not to mention that you can easily integrate this with your build system...that way, each new build can ship with up-to-date performance graphs. Just sayin'!

Note: I'm aware that a lot of this scripting could be eliminated in gnuplot 4.6, but it doesn't ship with Fedora yet, and I couldn't be arsed building it.

31Dec/120

What bugs me on the web

2013 is nearly upon us, and the web has come a very long way in the ~15 years I've been a netizen. And yet, even though we've made so many advances, it sometimes feels like we've been stagnant, or worse, regressed in some cases.

Each and every web developer out there should have a long, hard think about how the web has (d)evolved in their lifetime and which way we want to head next. There's an awful lot happening at the moment: web 2.0, HTML 5, Flash's death-throes, super-mega-ultra tracking cookies, EU cookie regulation nonsense, microdata, cloud fun, ... I could go on all day. Needless to say: it's a mixed bunch.

In any event, here's a brief list of 3 things that bug me on the web.

Links are broken

Usability has long been the web's sore thumb, and in spite of any number of government-sponsored usability certification programmes over the years, people still don't seem to give a rat's arse. Websites are still riddled with nasty drop down menus that only work with a mouse. Sometimes they're extra nasty by virtue of being ajaxified. At least Flash menus are finally going the way of the dinosaur.

Pro tip: every single bloody link on your web site should have a working HREF, so people can use it without relying on click handlers, mice or javascript, and can open the bloody thing in a new tab without going through hell and back.

Bonus points: make your links point to human-readable URLs.

Languages, you're doing it wrong

The web is no longer an English-only or US-only playing field, and companies all over are starting to cotton on to this fact. What they have yet to realise, however, is that people don't necessarily speak the language you think they do. If you rely on geolocation data to serve up translated content: stop. You're doing it wrong. The user determines the language. Believe it or not, people do know which language(s) they speak.

Geolocation, for starters, isn't an exact science. Depending on the kind of device this can indeed be very accurate. Or very much not. Proxies, VPNs, Onion Routers etc can obviously mislead your tracking. Geolocation tells you nothing. It doesn't tell you why that person is there (maybe they're on holiday?). It also doesn't tell you what language is spoken there. This might be a shock to some people, but some countries have more than one official language. Hell, some villages do. Maybe you can find this data somewhere, and correlate it with the location, but you'd be wrong to. Language is a very sensitive issue in some places. Get it right, or pick a sensible default and make clear that it was a guess. Don't be afraid to ask for user input.

Pro tip: My favourite HTTP header: Accept-Language. Every sensible browser sends this header with every request. In most cases, the default is the browser's or OS's language. Which is nearly always the user's first language, and when it's not, at least you know the user understands it well enough to be able to use a browser..

Bonus points: Seriously, use Accept-Language. If you don't, you're a dick.
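Honouring the header isn't hard, either. Here's a sketch in Java, assuming Java 8 or newer; the class name, method name and fallback behaviour are my own choices for the example. It picks the best supported language and falls back to a default when nothing matches or the header is missing.

```java
import java.util.List;
import java.util.Locale;

final class LanguageNegotiation {
    /**
     * Picks the best match between the Accept-Language header and the
     * languages the site actually offers, or the fallback if there is none.
     */
    static Locale choose(String acceptLanguage, List<Locale> supported, Locale fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        try {
            // Parses "nl-BE,nl;q=0.8,en;q=0.5" into ranges ordered by weight.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            Locale match = Locale.lookup(ranges, supported);
            return match != null ? match : fallback;
        } catch (IllegalArgumentException malformedHeader) {
            return fallback;
        }
    }
}
```

Whatever the outcome, it's still a guess, so keep a visible way for the user to switch language.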

Clutter is back

Remember how, back in 1999, we all thought Google looked awesome because it was so clean & crisp and didn't get in your face and everyone copied the trend? Well, that seems to have come to an end.
Here's Yahoo in 1997. (I love how it has an ad for 256mb of memory.)
Here's Yahoo now.

The 1997 version was annoying to use (remember screen resolutions in the 90s? No? You're too young to read this, go away) because it was so cluttered.
The 2012 version is worse and makes me want to gouge my eyes out.

Even Google is getting all in your face these days, with search-as-you-type and whatnot. Bah. DuckDuckGo seems to be the exception (at least as far as search engines go). It offers power without wagging it in your face.

Pro tip: don't put a bazillion things on your pages. Duh.

2013 Wishlist

My web-wishlist for 2013 is really quite simple: I want a usable web. Not just for people with the latest and greatest javascript-enabled feast-your-eyes-on-this devices. For everyone. Including those who use text-to-speech, or the blind, or people on older devices. Graceful degradation is key to this. So please, when you come up with a grand feature, think about what we might be giving up on as well. Don't break links. Don't break the back button. Don't break the web.
