Archives for Java

Implementing HAT-Tries in Java

As I said in an earlier post detailing the CCHashTable, it formed a part of a larger data structure, the HAT-Trie. The HAT-Trie is a recent variant on the trie data structure, published by Askitis and Sinha in last year’s Australasian conference on Computer science (the original paper is here). The problem they were addressing was that while the trie is very fast, it’s also very expansive in memory; and while some variants of it like the radix trie (or Patricia trie) and the related burst-trie, are less expensive in terms of memory space, they aren’t as fast as they could be because they don’t take into account caches in their design.

Consider for a moment, the standard trie node, with its array of pointers to all possible following characters. Think of a string of these nodes, representing a character string of some type, and how the terminal node then stores a link to a piece of information which the overall character string is indexing to (there are obviously a lot of applications for such a data structure). The obvious problem is that the longer the string, the more nodes, and the nodes are large – even for standard ascii, that’s 128 pointers per node, which means that to store ten characters, you’re looking at 512 bytes. And few strings used in such applications are that short – certainly in the application I was looking at originally, we were looking at strings ten times that size at least.

The burst-trie, the radix trie (and it’s predecessor the compact trie) all attempt to solve this problem through similar methods involving compressing parts of the trie. The compact trie looks at all the branches of the trie that lead straight to a terminal node and represents it with a single … Read the rest

Cache conscious hash tables

So one of the things I was working on as part of DeviceAtlas (but which ultimately didn’t get used) was a cache-conscious hash table in Java. It’s not unique in design – in fact it comes right out of a research paper written for the SPIRE 2005 conference by Nikolas Askitis and Justin Zobel – but the implementation was interesting to me as I’d not done optimisation work in Java in a while, and some things had changed quite a bit since I last wrote Java code. And it was a bit of an ego boost that I got it to outperform java.util.HashMap:

SUN HashMap fill: 57797 us
SUN HashMap query: 165701 us, 0 errors
CCHashTable fill (fair): 23205 us
CCHashTable query (fair): 35513 us, 0 errors
CCHashTable fill: 41723 us
CCHashTable query: 43055 us, 0 errors

Of course, there are the minor criticisms that it’s nowhere near as general-purpose as the HashMap class and that HashMap is arguably exhibiting an intelligent design choice rather than cheating per se, but I like my ego so I’m going to ignore those arguments!

 … Read the rest

Sun buys MySQL

So Sun has splurged some $800 million in cash and taken up $200 million in options to purchase MySQL AB. It’s somewhat of an odd move really. I mean, Sun’s got a decent reputation for open source stuff (not always linux-friendly, but “open source” does not mean “linux” after all). Java is now GPL’d, openSolaris as well, there’s code finding its way from Sun to Linux at a kernel level, and there are other examples. But MySQL?

About the only thing that comes to mind right now is that we might see a push from Sun in the coming years away from the standard LAMP stack and towards OTMS/STMJ stacks (Opensolaris/Tomcat/MySQL/Servlet or Solaris/Tomcat/MySQL/JSP or some suitable combination) so we’d have a Sun-friendly stack looking for a piece of LAMP’s market share.

It hardly seems logical. PHP versus Servlet/JSP isn’t a sensible comparison. If you look at development time, PHP is so far ahead of JSP/Servlets that it’s not even a competition any more; and likewise if you look at runtimes, PHP is so far behind that it’s just not fair. There’s just no room for a particular stack to get itself chosen over the other if it’s not suitable. But then, that’s what a huge marketing department is for, right?

 … Read the rest