Bit-testing.

Bit-testing.

Or squeezing every bit of performance.

There are many reasons leverage integers instead of more expensive data structures, like strings. On top of being much more memory efficient without readability sacrifice, compilers can apply numerous optimizations to improve use-cases in which they are used. Bit-testing is one of such optimizations, that uses masks to perform multiple checks with a single comparison, somewhat similar to SIMD instructions. For example, a series of conditional expressions that compare the same variable

int isHtmlWhitespace(int ch) {
    return ch == 0x0009 || ch == 0x000A ||
            ch == 0x000C || ch == 0x000D ||
            ch == 0x0020;
}

can be transformed into a switch statement if each of them contains a comparison expression. This statement can be transformed into a switch statement and then expanded into a bit-test. Below is the assembly that GCC with enabled optimizations is able to produce

isHtmlWhitespace(int):
        cmp     edi, 32 ; compare ch with 0x0020
        ja      .L42        ; and return false if ch is larger, since 32 is the largest possible match
        movabs  rax, 4294981120
        mov     ecx, edi
        shr     rax, cl
        and     eax, 1
        ret
.L42:
        xor     eax, eax
        ret

Here is what happens:

  • cmp edi, 32 and ja .L42 handle the case when ch is outside of the range of the values we are interested in. 32 is the largest value that can be a match, so if ch is greater, we can safely return false;
  • then we use a mask 0b100000000000000000011011000000000 where 1 is set at positions that correspond to the values we are looking for shifted by one to the left. So 0x0009 is responsible for 10th set bit, 0x000A for the 11th bit and so on;
  • shr rax, cl shifts the mask value to the right by ch bits, so for 0x0009 0b100000000000000000011011000000000 becomes 0b100000000000000000011011 and for 0x000A - 0b10000000000000000001101;
  • and finally and eax, 1 checks if the rightmost bit after above shift is set, in which case ch is in the bitset and we can return true, otherwise, say, for ch == 0x000B, the shifted mask becomes 0b1000000000000000000110 with the rightmost bit set to 0 and function returns false.

Clang generates different but still very compact and interesting assembly

isHtmlWhitespace(int):                  # @isHtmlWhitespace(int)
        lea     eax, [rdi - 9]
        cmp     eax, 5
        jae     .LBB2_3
        mov     ecx, 27
        bt      ecx, eax
        jb      .LBB2_2
.LBB2_3:
        xor     eax, eax
        cmp     edi, 32
        sete    al
        ret
.LBB2_2:
        mov     eax, 1
        ret
  • lea eax, [rdi - 9] is doing something like new_ch = ch - 9 to shift the range of possible ch values from [9, 32] to [0, 23];
  • cmp eax, 5 and jae .LBB2_3 checks if new_ch can include 0x0009, 0x000A, 0x000C, 0x000D in the new [0,23] interval and if that's the case, we have to check the last potential match, 32 using cmp edi, 32;
  • mov ecx, 27 creates a bitset mask 0b11011 that has 1 at each position corresponding to new_ch values we are interested in - 0, 1, 3 and 4. bt ecx, eax uses a dedicated bit test instruction bt that performs the actual check and returns true in such case.

Unfortunately not all compilers are using this clever optimization. Unfortunately Go's compiler is one of such compilers and

func is_html_whitespace(ch int) bool {
    return ch == 0x0009 || ch == 0x000A ||
        ch == 0x000C || ch == 0x000D ||
        ch == 0x0020
}

is translated into a set of ifs

v14
    00003 (+4) MOVQ "".ch(SP), AX
v30
    00004 (4) CMPQ AX, $9
b1
    00005 (4) JNE 9
v29
    00006 (4) MOVL $1, AX
v28
    00007 (+4) MOVB AX, "".~r1+8(SP)
b8
    00008 (4) RET
v10
    00009 (4) CMPQ AX, $10
b2
    00010 (4) JEQ 6
v34
    00011 (+5) CMPQ AX, $12
b4
    00012 (5) JEQ 6
v31
    00013 (5) CMPQ AX, $13
b6
    00014 (5) JEQ 6
v7
    00015 (+6) CMPQ AX, $32
v24
    00016 (6) SETEQ AX
b9
    00017 (6) JMP 7

which is very unfortunate and serves as a reminder that HPC is not Go's area of focus.

And what about Rust? Interestingly its assembly is different from the one generated by Clang compiler for C++ code and impressively contains no branches

is_html_whitespace:                     # @is_html_whitespace
# %bb.0:
                                        # kill: def $edi killed $edi def $rdi
    leal    -9(%rdi), %eax
    cmpl    $2, %eax
    setb    %al
    movl    %edi, %ecx
    andl    $-2, %ecx
    cmpl    $12, %ecx
    sete    %cl
    orb    %al, %cl
    cmpl    $32, %edi
    sete    %al
    orb    %cl, %al
    retq

where

  • leal -9(%rdi), %eax is similar to Clang's new_ch = ch - 9;
  • cmpl $2, %eax and setb %al sets al in case ch matches 0x0009 or 0x000A;
  • andl $-2, %ecx and cmpl $12, %ecx handle 0x000C and 0x000D cases. andl $-2, %ecx unsets the rightmost bit so that 12 and 13 could be checked with a single comparison cmpl $12, %ecx;
  • cmpl $32, %edi handles ch == 0x0020;
  • and finally orb %cl, %al sets al to true in case we had at least one match.