We still know our code better than compilers.
Or a case of an unnecessary CPU lock instruction
Compilers have brought a huge productivity and performance boost thanks to their ability to translate high-level abstractions into highly optimized low-level instructions. In fact, they are so good at optimizing our code that we simply expect them to understand it even better than we do, and for the most part that's a reasonable expectation. At the same time, it's important to remember that we can still reason about our code better than they can, and when we expect a particular optimization, we should verify it by looking at the generated assembly. Take the following code snippet as an example:
#include <atomic>
using namespace std;
int trivial_inc() {
    atomic<int> num;
    return num.fetch_add(1);
}
Even though num is an atomic int, we can easily convince ourselves that the atomicity buys us nothing here: num is a local variable that does not escape the trivial_inc function and is initialized to 0, and fetch_add returns the value it held before the increment. So we'd expect the compiler to turn this code into something like
int trivial_inc() {
    int num = 0;
    int previous = num;
    num += 1;
    return previous;
}
which can be further simplified to
int trivial_inc() {
    return 0;
}
But here is what clang produces with -O3:
trivial_inc(): # @trivial_inc()
mov eax, 1
lock xadd dword ptr [rsp - 8], eax
ret
While clang is able to remove most traces of the atomic<int>, notice that it still updates the value using an unnecessary lock xadd instruction.
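To be clear, the lock prefix is not useless in general; it is only unnecessary here because num can never be observed by another thread. For contrast, here is a minimal sketch (using a hypothetical shared_count global, not part of the snippet above) of the case where the lock-prefixed read-modify-write is exactly what we want:
#include <atomic>

// A counter visible to other threads: the compiler must keep
// the increment atomic, so a lock xadd is the right codegen here.
std::atomic<int> shared_count{0};

int shared_inc() {
    return shared_count.fetch_add(1);
}
The difference is escape: once the compiler cannot prove the atomic is private to a single thread, it has to keep the atomic instruction.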
Surely Rust, with all its superior zero-cost abstractions, will not repeat the same mistake. Or will it? Let's check:
use std::sync::atomic::{AtomicUsize, Ordering};
fn trivial_atomic() -> usize {
    let count: AtomicUsize = AtomicUsize::new(0);
    count.fetch_add(1, Ordering::SeqCst)
}
Oh no, we can see the same thing:
movl $1, %eax
lock xaddq %rax, 24(%rsp)
Oh well, at least it's nice to know that we can still expect more wins from our compilers in the future. To wrap up, I couldn't resist checking what Go would do in this case:
package main
import "sync/atomic"
func trivial_inc() uint64 {
	var num uint64 = 0
	atomic.AddUint64(&num, 1)
	return num
}
And oh no:
v12 00003 (+6) LEAQ type.uint64(SB), AX
v14 00004 (6) MOVQ AX, (SP)
v7 00005 (6) PCDATA $1, $0
v7 00006 (6) CALL runtime.newobject(SB)
v9 00007 (6) MOVQ 8(SP), AX
v18 00008 (+7) MOVL $1, CX
v10 00009 (7) LOCK
v10 00010 (7) XADDQ CX, (AX)
v15 00011 (+8) MOVQ (AX), AX
v17 00012 (8) MOVQ AX, "".~r0(SP)
b1 00013 (8) RET
00014 (?) END
So in addition to using an unnecessary LOCK, Go's compiler also allocates num on the heap (CALL runtime.newobject(SB)), presumably because passing &num to atomic.AddUint64 makes escape analysis treat it as escaping :(
Well, nothing is perfect, and I still admire compilers, but for things that matter, we should always look under the hood to make sure there are no surprises.