Archive for the ‘Optimization’ Category

SQL: Passing Array

Friday, January 4th, 2008

Many times you find yourself with this damned task, to pass an array to your SQL code (whether you supply the code, or whether it’s a stored procedure doesn’t really matter). You need to pass an array of elements, usually Id’s, but it’s absolutely up to you. In my app I had to pass to my stored procedure an array of Id’s. Each ID was a binary(20). Since I use MSSQL 2005, I could use XML to accomplish this task. So my stored procedure receieves a parameter of Xml:

   @ids XML
    SELECT f1, f2 FROM @ids.nodes(‘/id’) T(c) JOIN MyTable ON ID = T.c.value(‘.’, ‘MyIdType’);

Invoking it will be:

SET @x = “<id>aa</id><id>bb</id>”;
EXEC usp_MyProc @x

Now the Ids in @x must be encoded in Base64, because MyIdType is Binary(20). So everything is cool now, except one thing, performance sucks big time. Now I was wondering what is the difference between attributed values and the node itself:
<id value=”id goes here”/>
( SELECT f1, f2 FROM @ids.nodes(‘/id’) T(c) JOIN MyTable ON ID = T.c.value(‘value’, ‘MyIdType’); )
<id>id goes here</id>

Apparently when only doing a test of parsing the Xmls input, they are almost running at the same speed, although the attributed way is a bit faster. But when you take the result (that’s the converted id as binary) and insert it into a temp table, such as: DECLARE @tmp AS TABLE (ID MyIdType), the attributed Xml seems to perform 2X faster than the nodes format. Interesting ah?

Even so, the performance was still bad, and I had to do something better, I decided to get rid off the Xml and Base64 stuff, they take too much time for SQL to evaluate. My first attempt was to convert from an input of VARCHAR directly to Binary, but without success, the conversion won’t be done well, I got my binary stream as 0x4142 if I were to supply an input of “ab” instead of a right value of 0xab…Then I looked into this substring builtin function and found out (thanks to a friend) that it can operate on varbinary types as well! Imagine that. This meant that from now on I can supply the ids array as one long binary input as varbinary…

The code is now:

SET @P = 1;
SET @L = LEN(@ids)+1;
  SELECT f1, f2 FROM MyTable WHERE ID = SUBSTRING(@ids, @P, 20);
  SET @P = @P + 20;

Voila. This seemed to be twice faster than attributed Xml, thus we saved string parsing and Base64 decoding and we even eliminated the use of JOIN. Yet, we do the parsing of the binary stream ourself, and it a bit blows the code with more lines, but it’s still simple code and the performance is much better.

In the beginning I didn’t realize why the above code was faster than a single select query with Xml, but I then understood that SQL scans the tables using indexing, of course. So even if it parses the Xml each time for the next node, the select query happens in reality more than a single time (as much as there is Id nodes to read in), so breaking the single select and putting it in the while statement didn’t degrade performance, but only boosted it with binary parsing. However, binary parsing is not the right phrase to use, because we only “slice” (as in Python :) ) the binary input stream to smaller chunks which doesn’t do any parsing anyways. The only disadvantage I found to this way of passing arrays, was that I cannot pass records/structures as in Xml… It might require more work then and ugly parsing for integers etc…

Pop Count

Tuesday, June 5th, 2007

Population Count is counting the number of bits set to one in a word (or any sized integer). I found pop counting as one of the most fascinating small algorithms in the bit twiddling category. There are so many uses for pop counting. Though, rarely, I find myself use them, to make stuff compact, like bit-arrays, special bit-flags which represent collections, etc… There are also popular algorithms which use pop counting in error-correcting codes and sound/image processing.

Here’s my favorite implementation:

int popcount(unsigned int x)
int c = 0;
for (; x > 0; x &= x -1) c++;
return c;

The whole concept is behind the mere formula: x &= x -1. But I will leave it for you to figure it out.

Speaking of pop count, Imri and I even used this same implementation above in the NULL Terminated Strip. I even hinted for this algorithm. And here’s a real 32 bits bugfree implementation (that strip has the bugs on purpose):


Of course, most implementation are branch-free code, for the sake of speed. However – Lately, I added the SSE4 support to diStorm, and among the new instructions I found out there is a single instruction for it all: popcnt. :)