Being abstract is something profoundly different from being vague … The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.
- Edsger Dijkstra
Let's take a look at a snippet of code that performs exponentiation:
#include <cmath>
int my_pow(int x) {
return std::pow(x, 6);
}
It would be reasonable to expect pow(x, 6) to be expanded into something like x * x * x * x * x * x or, better yet:
int my_pow(int x) {
int x2 = x * x;
int x4 = x2 * x2;
return x2 * x4;
}
which reduces the number of multiplications from 5 to 3.
Unfortunately, the generated assembly comes with a couple of bad surprises:
my_pow(int):
scvtf d0, w0
stp x29, x30, [sp, -16]!
fmov d1, 6.0e+0
mov x29, sp
bl pow
fcvtzs w0, d0
ldp x29, x30, [sp], 16
ret
- x has to be converted from integer to floating-point (scvtf)
- the literal 6 is also stored as a floating-point number in d1
- the less efficient floating-point version of pow is invoked, which adds the overhead of a function call
- the floating-point result of the pow computation, stored in d0, has to be converted back into the integer return value (fcvtzs)
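The conversions listed above follow from how std::pow treats integer arguments: since C++11, integral arguments are handled as double, so the call yields a double that must be converted back to int. The sketch below, assuming a C++11 or later compiler, checks that result type at compile time and spells out the implicit conversions; my_pow_explicit is a hypothetical name used only for illustration.

```cpp
#include <cmath>
#include <type_traits>

// With two int arguments, std::pow resolves to a double computation,
// so the result type is double, not int.
using pow_int_result = decltype(std::pow(2, 6));
static_assert(std::is_same<pow_int_result, double>::value,
              "std::pow(int, int) yields a double");

// Hypothetical helper: the same computation as my_pow, with the implicit
// conversions written out.
int my_pow_explicit(int x) {
    // int -> double on the way in (scvtf in the listing above)
    double result = std::pow(static_cast<double>(x), 6.0);
    // double -> int on the way out truncates toward zero (fcvtzs);
    // a result outside the range of int would be undefined behavior.
    return static_cast<int>(result);
}
```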
Let's compare this assembly with the one generated for the naive hand-written version:
int my_pow(int x) {
return x * x * x * x * x * x;
}
Lo and behold, we get optimal assembly:
my_pow(int):
mul w1, w0, w0 ; x2 = x * x
mul w0, w1, w0 ; x3 = x2 * x
mul w0, w0, w0 ; x6 = x3 * x3
ret
This is a little different from the version we speculated about above, but it still uses the optimal number of multiplications: 3.
The std::pow version violates the principle of least surprise, as well as one of the aspects of zero-cost abstractions promised by C++:
Optimal performance: A zero cost abstraction ought to compile to the best implementation of the solution that someone would have written with the lower level primitives. It can’t introduce additional costs that could be avoided without the abstraction.
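Within C++ itself, one way to get that kind of output for a constant exponent is to bypass std::pow and make the exponent a template parameter, so the recursion unfolds at compile time. A minimal sketch, assuming C++17 (for if constexpr); pow_n is a hypothetical helper, not a standard library function. For N = 6 it builds x^3 = (x^1)^2 * x and then x^6 = (x^3)^2, the same x2, x3, x6 chain as the naive listing once the trivial multiplications by 1 in the N = 1 step are folded away.

```cpp
// Hypothetical helper: x raised to a compile-time exponent N using top-down
// exponentiation by squaring: x^N = (x^(N/2))^2, times x when N is odd.
// As with the hand-written my_pow, signed overflow is undefined behavior.
template <unsigned N>
constexpr int pow_n(int x) {
    if constexpr (N == 0) {
        return 1;
    } else {
        const int half = pow_n<N / 2>(x);  // x^(N/2)
        const int squared = half * half;   // x^(2 * (N/2))
        return (N % 2 != 0) ? squared * x : squared;
    }
}

constexpr int my_pow(int x) {
    return pow_n<6>(x);
}
```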
Rust is another popular programming language promising zero-cost abstractions, so let's see how it handles the same computation:
pub fn my_pow(x: i32) -> i32 {
x.pow(6)
}
And we've got a winner (note that this listing is x86-64 rather than AArch64, so the instruction names differ):
my_pow: # @my_pow
# %bb.0:
imull %edi, %edi ; x2 = x * x
movl %edi, %eax
imull %edi, %eax ; x4 = x2 * x2
imull %edi, %eax ; x6 = x2 * x4
retq
Rust was able to inline the pow invocation and generate nearly optimal code that, interestingly, matches the approach we discussed at the beginning of the article. It comes with an extra movl instruction that copies x2 into the return register eax, but it's still a lot more efficient than the C++ version.
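Rust documents its integer pow as using exponentiation by squaring, and because it is an ordinary integer function, a constant exponent lets the optimizer unroll it. A rough C++ analog of such a loop might look like the sketch below; ipow is a hypothetical name, and this is not a port of Rust's standard library source. For exp = 6 it computes x2, x4 and then x2 * x4, the same sequence the Rust listing shows.

```cpp
// Hypothetical integer pow built on exponentiation by squaring: walk the bits
// of exp from least to most significant, squaring base at each step and
// multiplying it into the accumulator when the current bit is set.
// Signed overflow is undefined behavior, as in the hand-written my_pow.
constexpr int ipow(int base, unsigned exp) {
    int acc = 1;
    while (exp > 0) {
        if (exp & 1u) {
            acc *= base;   // base currently holds x^(2^k) for bit k
        }
        exp >>= 1;
        if (exp > 0) {
            base *= base;  // skip the final squaring, whose result is unused
        }
    }
    return acc;
}

// With a constant exponent, the loop can be fully unrolled by the optimizer.
int my_pow(int x) {
    return ipow(x, 6);
}
```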
Moral of the story? Don't violate the principle of least surprise: either fulfill the promise of zero-cost abstractions or be explicit about the shenanigans that happen under the hood.