Saving space with dictionary arrays.
Or replacing expensive types with cheap ones.
Even though the first resource that comes to mind when dealing with performance is CPU, memory is often the reason the CPU cannot reach its full potential. Since many workloads involve processing and/or storing strings, many interesting data structures and algorithms have been developed for different use cases.
One such data structure is a DictionaryArray. There are many ways to implement it, but one of the simplest, and one that supports online processing, can look like this in Go:
type ValueT = int8

type DictionaryArray struct {
	index         map[string]ValueT // value -> code
	reverse_index map[ValueT]string // code -> value
	values        []ValueT          // one code per appended value, in order
}

func New() DictionaryArray {
	return DictionaryArray{index: map[string]ValueT{}, reverse_index: map[ValueT]string{}}
}

// Append records the code for value, assigning a new code if the value
// has not been seen before.
func (da *DictionaryArray) Append(value string) {
	idx, ok := da.index[value]
	if !ok {
		idx = ValueT(len(da.index))
		da.index[value] = idx
		da.reverse_index[idx] = value
	}
	da.values = append(da.values, idx)
}

// Get returns the original string stored at position idx.
func (da *DictionaryArray) Get(idx int) string {
	// TODO: handle lookup error
	return da.reverse_index[da.values[idx]]
}
The choice of ValueT depends on the number of unique string values we expect; low-cardinality value sets, e.g. state or country names, compress best.
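As a quick illustration of the API above (a throwaway snippet, not part of the benchmarks below; it assumes "fmt" is imported):

da := New()
for _, s := range []string{"Ohio", "Texas", "Ohio"} {
	da.Append(s)
}
// Only two distinct strings end up in the maps; da.values holds the codes 0, 1, 0.
fmt.Println(da.Get(0), da.Get(2)) // prints: Ohio Ohio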
Memory usage can be reduced further by getting rid of the maps altogether and replacing them with a slice of keys, which can also be sorted to enable binary-search lookups, as sketched below.
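As a rough sketch of that direction, assuming the set of possible values is known in advance so the keys can be sorted once up front (the sortedDict type and its API are made up for illustration; it reuses the ValueT alias from above):

import (
	"fmt"
	"sort"
)

// sortedDict is a hypothetical map-free variant: the position of a key in
// the sorted keys slice doubles as its code.
type sortedDict struct {
	keys   []string // sorted unique values
	values []ValueT // codes in insertion order
}

func newSortedDict(unique []string) sortedDict {
	keys := append([]string(nil), unique...)
	sort.Strings(keys)
	return sortedDict{keys: keys}
}

func (d *sortedDict) Append(value string) error {
	// Binary search instead of a map lookup.
	i := sort.SearchStrings(d.keys, value)
	if i == len(d.keys) || d.keys[i] != value {
		return fmt.Errorf("unknown value: %q", value)
	}
	d.values = append(d.values, ValueT(i))
	return nil
}

func (d *sortedDict) Get(idx int) string {
	return d.keys[d.values[idx]]
}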
But even the current implementation delivers a significant memory reduction. For benchmarking we'll use US state names conveniently provided by the gofakeit package:
import (
	"fmt"
	"runtime"
	"testing"

	gofakeit "github.com/brianvoe/gofakeit/v6"
)
...
func BenchmarkAppend(b *testing.B) {
	gofakeit.Seed(0)
	const COUNT = 10000
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		da := New()
		for i := 0; i < COUNT; i++ {
			da.Append(gofakeit.State())
		}
	}
	b.StopTimer()

	var stats runtime.MemStats
	runtime.ReadMemStats(&stats)
	fmt.Printf("TotalAlloc: %d\n", stats.TotalAlloc)
}
func BenchmarkSliceAppend(b *testing.B) {
	gofakeit.Seed(0)
	const COUNT = 10000
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		var states []string
		for i := 0; i < COUNT; i++ {
			states = append(states, gofakeit.State())
		}
	}
	b.StopTimer()

	var stats runtime.MemStats
	runtime.ReadMemStats(&stats)
	fmt.Printf("TotalAlloc: %d\n", stats.TotalAlloc)
}
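Assuming both benchmarks sit in a regular _test.go file, they can be run with the standard tooling; the -benchmem flag is what produces the B/op and allocs/op columns shown below:

go test -bench=. -benchmem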
The numbers for DictionaryArray look as follows:
goos: darwin
goarch: amd64
pkg: example.com/testing
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkAppend-16 TotalAlloc: 6229312
TotalAlloc: 49890072
806 1517089 ns/op 54166 B/op 31 allocs/op
and for a slice:
goos: darwin
goarch: amd64
pkg: example.com/testing
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkSliceAppend-16 TotalAlloc: 84189880
TotalAlloc: 849041688
926 1211290 ns/op 825971 B/op 20 allocs/op
so DictionaryArray uses only ~5.9% of the memory allocated by the plain slice version. Not bad for a naive implementation, which is why more efficient variants of this idea are used in performance-sensitive projects such as Apache Arrow, which we might explore in future posts.