Trading memory for fewer allocations.

Memory allocations are not always easy to spot, yet they are fairly expensive: each one carries bookkeeping overhead and adds load on the garbage collector. Consider a seemingly innocent function, print_int:

package main

import "fmt"

func print_int(x int) {
    fmt.Println(x)
}

the compiler's escape analysis (go build -gcflags="-m") reports

x escapes to heap

so, since fmt.Println takes its arguments as interface values, the runtime.convT64 function is called to box x into a pointer (the call shows up in the assembly listing from go build -gcflags="-S")

00007 (6) CALL runtime.convT64(SB)

which is implemented in runtime/iface.go:

func convT64(val uint64) (x unsafe.Pointer) {
    if val < uint64(len(staticuint64s)) {
        x = unsafe.Pointer(&staticuint64s[val])
    } else {
        x = mallocgc(8, uint64Type, false)
        *(*uint64)(x) = val
    }
    return
}

The most interesting bit is the x = unsafe.Pointer(&staticuint64s[val]) branch, which returns a pointer into staticuint64s, a preallocated pool of the integers 0 through 255:

// staticuint64s is used to avoid allocating in convTx for small integer values.
var staticuint64s = [...]uint64{
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
    ...
    0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
}

It's a cheap way to trade a small, fixed amount of memory for fewer allocations, and the same trick appears in other managed languages; Java, for example, keeps a similar cache of boxed small integers. To make it even more useful, the Go runtime reuses the same table for the convT16 and convT32 functions as well.
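The effect is easy to observe with testing.AllocsPerRun. The sketch below is illustrative only: box and sink are made-up names, and the 0-and-1 result assumes a 64-bit platform with a reasonably recent Go toolchain:

package main

import (
    "fmt"
    "testing"
)

// sink is package-level so the stores below cannot be optimized away.
var sink interface{}

// box performs a dynamic int-to-interface conversion; //go:noinline keeps
// the compiler from inlining it and folding the conversion at compile time.
//
//go:noinline
func box(x int) interface{} {
    return x
}

func main() {
    // 42 falls within staticuint64s, so boxing it hits the cache.
    small := testing.AllocsPerRun(1000, func() { sink = box(42) })

    // 1000 is outside the cache, so boxing falls through to mallocgc.
    large := testing.AllocsPerRun(1000, func() { sink = box(1000) })

    fmt.Println(small, large) // expected: 0 1
}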

The same idea extends to dynamic caches, better known as object pools. They are not limited to a small, fixed range of values, but that flexibility comes at the cost of synchronization, so it's worth measuring that overhead before adopting an object pool.
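For illustration, here is a minimal object pool built on the standard library's sync.Pool; bufPool and render are hypothetical names, not anything from the runtime:

package main

import (
    "bytes"
    "fmt"
    "sync"
)

// bufPool hands out reusable buffers; unlike staticuint64s it can cache
// arbitrary values, but Get and Put pay for synchronization internally.
var bufPool = sync.Pool{
    New: func() interface{} { return new(bytes.Buffer) },
}

func render(name string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    defer func() {
        buf.Reset() // hand a clean buffer back to the pool
        bufPool.Put(buf)
    }()
    buf.WriteString("hello, ")
    buf.WriteString(name)
    return buf.String()
}

func main() {
    fmt.Println(render("gopher"))
}

Note that sync.Pool amortizes its locking and may drop cached objects during garbage collection, which is exactly the kind of overhead worth checking with go test -benchmem.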

In summary, consider using static preallocated caches to reduce allocation count and improve performance.
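The fast-path/slow-path shape of convT64 carries over directly to application code. A minimal sketch, with statusName and its table invented for the example:

package main

import "fmt"

// statusNames is a static preallocated cache: common codes get a shared
// string, so the hot path never reaches the allocating fmt.Sprintf below.
var statusNames = [...]string{
    0: "ok",
    1: "not found",
    2: "timeout",
}

func statusName(code int) string {
    if code >= 0 && code < len(statusNames) {
        return statusNames[code] // fast path: preallocated value
    }
    return fmt.Sprintf("status(%d)", code) // slow path: allocates
}

func main() {
    fmt.Println(statusName(1), statusName(7))
}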