Saving space with dictionary arrays.

Even though the first resource that comes to mind when dealing with performance is CPU, memory is a common reason for CPU not being able to reach its full potential. Since many workloads involve processing and/or storing strings, many interesting data structures and algorithms have been developed to deal with different use-cases. One of such data structures is a DictionaryArray. There are many ways to implement it, but one of the simplest ones and the one that supports online processing in Go can look like:

type ValueT = int8

type DictionaryArray struct {
    index         map[string]ValueT
    reverse_index map[ValueT]string
    values        []ValueT
}

func New() DictionaryArray {
    return DictionaryArray{index: map[string]ValueT{}, reverse_index: map[ValueT]string{}}
}

func (da *DictionaryArray) Append(value string) {
    idx, ok := da.index[value]
    if !ok {
        idx = ValueT(len(da.index))
        da.index[value] = idx
        da.reverse_index[idx] = value
    }
    da.values = append(da.values, idx)
}

func (da *DictionaryArray) Get(idx int) string {
    // TODO: handle lookup error
    return da.reverse_index[da.values[idx]]
}

ValueT depends on the number of unique string values we expect - low-cardinality value sets are compressed best, e.g. state or country names, etc. There are also additional ways to reduce memory usage by getting rid of maps and replacing them with a slice of keys that can additionally be sorted to enable binary search-based lookups. But even current implementation delivers significant memory reduction. For benchmarking we'll use the names of states conveniently provided by gofakeit package:

import (
    "fmt"
    "runtime"
    "testing"

    gofakeit "github.com/brianvoe/gofakeit/v6"
)
...
func BenchmarkAppend(b *testing.B) {
    gofakeit.Seed(0)
    const COUNT = 10000
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        da := New()
        for i := 0; i < COUNT; i++ {
            da.Append(gofakeit.State())
        }
    }
    b.StopTimer()
    var stats runtime.MemStats
    runtime.ReadMemStats(&stats)
    fmt.Printf("TotalAlloc: %d\n", stats.TotalAlloc)
}

func BenchmarkSliceAppend(b *testing.B) {
    gofakeit.Seed(0)
    const COUNT = 10000
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        var states []string
        for i := 0; i < COUNT; i++ {
            states = append(states, gofakeit.State())
        }
    }
    b.StopTimer()
    var stats runtime.MemStats
    runtime.ReadMemStats(&stats)
    fmt.Printf("TotalAlloc: %d\n", stats.TotalAlloc)
}

The numbers for DictionaryArray look as follows:

goos: darwin
goarch: amd64
pkg: example.com/testing
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkAppend-16        TotalAlloc: 6229312
TotalAlloc: 49890072
     806       1517089 ns/op       54166 B/op          31 allocs/op

and for a slice:

goos: darwin
goarch: amd64
pkg: example.com/testing
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkSliceAppend-16        TotalAlloc: 84189880
TotalAlloc: 849041688
     926       1211290 ns/op      825971 B/op          20 allocs/op

so DictionaryArray uses only ~5.9% of memory used by slice. Not bad for a naive implementation, which is why more efficient implementations are used in such performance-sensitive projects like Apache Arrow, which we might explore further in the following posts.

Saving space with dictionary arrays.

Or replacing expensive types with cheap ones.