Saving space with dictionary arrays.

Saving space with dictionary arrays.

Or replacing expensive types with cheap ones.

Even though the first resource that comes to mind when dealing with performance is CPU, memory is a common reason for CPU not being able to reach its full potential. Since many workloads involve processing and/or storing strings, many interesting data structures and algorithms have been developed to deal with different use-cases. One of such data structures is a DictionaryArray. There are many ways to implement it, but one of the simplest ones and the one that supports online processing in Go can look like:

type ValueT = int8

type DictionaryArray struct {
    index         map[string]ValueT
    reverse_index map[ValueT]string
    values        []ValueT
}

func New() DictionaryArray {
    return DictionaryArray{index: map[string]ValueT{}, reverse_index: map[ValueT]string{}}
}

func (da *DictionaryArray) Append(value string) {
    idx, ok := da.index[value]
    if !ok {
        idx = ValueT(len(da.index))
        da.index[value] = idx
        da.reverse_index[idx] = value
    }
    da.values = append(da.values, idx)
}

func (da *DictionaryArray) Get(idx int) string {
    // TODO: handle lookup error
    return da.reverse_index[da.values[idx]]
}

ValueT depends on the number of unique string values we expect - low-cardinality value sets are compressed best, e.g. state or country names, etc. There are also additional ways to reduce memory usage by getting rid of maps and replacing them with a slice of keys that can additionally be sorted to enable binary search-based lookups. But even current implementation delivers significant memory reduction. For benchmarking we'll use the names of states conveniently provided by gofakeit package:

import (
    "fmt"
    "runtime"
    "testing"

    gofakeit "github.com/brianvoe/gofakeit/v6"
)
...
func BenchmarkAppend(b *testing.B) {
    gofakeit.Seed(0)
    const COUNT = 10000
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        da := New()
        for i := 0; i < COUNT; i++ {
            da.Append(gofakeit.State())
        }
    }
    b.StopTimer()
    var stats runtime.MemStats
    runtime.ReadMemStats(&stats)
    fmt.Printf("TotalAlloc: %d\n", stats.TotalAlloc)
}

func BenchmarkSliceAppend(b *testing.B) {
    gofakeit.Seed(0)
    const COUNT = 10000
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        var states []string
        for i := 0; i < COUNT; i++ {
            states = append(states, gofakeit.State())
        }
    }
    b.StopTimer()
    var stats runtime.MemStats
    runtime.ReadMemStats(&stats)
    fmt.Printf("TotalAlloc: %d\n", stats.TotalAlloc)
}

The numbers for DictionaryArray look as follows:

goos: darwin
goarch: amd64
pkg: example.com/testing
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkAppend-16        TotalAlloc: 6229312
TotalAlloc: 49890072
     806       1517089 ns/op       54166 B/op          31 allocs/op

and for a slice:

goos: darwin
goarch: amd64
pkg: example.com/testing
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkSliceAppend-16        TotalAlloc: 84189880
TotalAlloc: 849041688
     926       1211290 ns/op      825971 B/op          20 allocs/op

so DictionaryArray uses only ~5.9% of memory used by slice. Not bad for a naive implementation, which is why more efficient implementations are used in such performance-sensitive projects like Apache Arrow, which we might explore further in the following posts.