Non-zero cost abstractions.

Or a story of a bad surprise.

Being abstract is something profoundly different from being vague … The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

  • Edsger Dijkstra

Let's take a look at a snippet of code that performs exponentiation:

#include <cmath>
int my_pow(int x) {
    return std::pow(x, 6);
}

It would be reasonable to expect pow(x, 6) to be expanded into something like x * x * x * x * x * x or better yet:

int my_pow(int x) {
    int x2 = x * x;
    int x4 = x2 * x2;
    return x2 * x4;
}

which reduces the number of multiplications from 5 to 3.
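The same square-and-multiply idea generalizes to any non-negative exponent. The following is a minimal sketch rather than part of the original example: `ipow` is a hypothetical helper name, the loop inside a `constexpr` function assumes C++14 or later, and signed overflow is not handled, so large inputs remain undefined behaviour.

```cpp
// Hypothetical helper: integer exponentiation by squaring.
// Assumes the mathematical result fits in an int.
constexpr int ipow(int base, unsigned exp) {
    int result = 1;
    while (exp != 0) {
        if (exp & 1u) {
            result *= base;   // fold in the current power of base
        }
        exp >>= 1;
        if (exp != 0) {
            base *= base;     // square only if another bit remains
        }
    }
    return result;
}

// Evaluated at compile time, entirely in integer arithmetic.
static_assert(ipow(2, 6) == 64, "2^6 == 64");
static_assert(ipow(3, 0) == 1, "x^0 == 1");
```

Because the function never leaves the integer domain, there is no int-to-double round trip for the compiler to eliminate.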

Unfortunately, the generated assembly comes with a couple of bad surprises:

my_pow(int):
        scvtf   d0, w0
        stp     x29, x30, [sp, -16]!
        fmov    d1, 6.0e+0
        mov     x29, sp
        bl      pow
        fcvtzs  w0, d0
        ldp     x29, x30, [sp], 16
        ret
  • x has to be converted from an integer to a floating-point number (scvtf)
  • the literal 6 is also materialized as a floating-point number in d1
  • the floating-point version of pow is called, which is less efficient and adds function call overhead
  • the floating-point result of pow, stored in d0, has to be converted back into an integer for the return value (fcvtzs).
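The conversions follow from the types involved: when `std::pow` is called with two integer arguments, they are promoted to `double` and the result is a `double`. A small compile-time check along these lines, assuming a C++11 or later compiler, makes that visible:

```cpp
#include <cmath>
#include <type_traits>

// With two int arguments, std::pow yields a double, so returning it
// from a function declared to return int forces a conversion back.
static_assert(std::is_same<decltype(std::pow(2, 6)), double>::value,
              "std::pow(int, int) returns double");
```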

Let's compare this assembly with the one generated for the naive hand-written version:

int my_pow(int x) {
    return x * x * x * x * x * x;
}

Lo and behold, we get the optimal assembly:

my_pow(int):
        mul     w1, w0, w0 ; x2 = x * x
        mul     w0, w1, w0 ; x3 = x2 * x
        mul     w0, w0, w0 ; x6 = x3 * x3
        ret

It is a little different from the version we speculated about above, but it still uses the optimal number of multiplications: 3. The std::pow version, by contrast, violates the principle of least surprise and one aspect of the zero-cost abstractions promised by C++:

Optimal performance: A zero cost abstraction ought to compile to the best implementation of the solution that someone would have written with the lower level primitives. It can’t introduce additional costs that could be avoided without the abstraction.

Rust is another popular programming language promising zero-cost abstractions, so let's see how it handles the same exponentiation:

pub fn my_pow(x: i32) -> i32 {
    x.pow(6)
}

And we've got a winner:

my_pow:                                 # @my_pow
# %bb.0:
    imull    %edi, %edi ; x2 = x * x
    movl    %edi, %eax
    imull    %edi, %eax ; x4 = x2 * x2
    imull    %edi, %eax ; x6 = x2 * x4
    retq

Rust was able to inline the pow invocation and generate almost optimal code that, interestingly, matches the approach we discussed at the beginning of the article. It comes with an extra movl instruction that could be eliminated, but it's still a lot more efficient than the C++ version.
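For comparison, a similar effect can be approximated in C++ by keeping the computation in integers and making the exponent a compile-time constant. This is only a sketch under stated assumptions: `pow_n` is a hypothetical name, the code needs C++17 for `if constexpr`, and the assembly a particular compiler emits for it was not measured here.

```cpp
// Hypothetical helper: x raised to a compile-time exponent N,
// expanded recursively by squaring so everything stays in integers.
template <unsigned N>
constexpr int pow_n(int x) {
    if constexpr (N == 0) {
        return 1;
    } else if constexpr (N == 1) {
        return x;
    } else if constexpr (N % 2 == 0) {
        const int half = pow_n<N / 2>(x);   // x^(N/2)
        return half * half;
    } else {
        return x * pow_n<N - 1>(x);         // peel off one factor
    }
}

int my_pow(int x) {
    // Expands to (x * (x * x)) squared: three multiplications in the source.
    return pow_n<6>(x);
}

static_assert(pow_n<6>(2) == 64, "2^6 == 64");
```

The recursion is resolved during template instantiation, so what remains for the optimizer is a short chain of integer multiplications with no library call.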

Moral of the story? Don't violate the principle of least surprise: either fulfill the promise of zero-cost abstractions or be explicit about the shenanigans that happen under the hood.