Performance advice nugget 01: use flat representations

This is my last week on my current job, and I was reflecting about some of the things that I've learned in the past 7 years there. There's a lot of course! As a staff data engineer, I worked with many teams and codebases and learned a couple of tricks at scale. So I'm starting a new blog series "performance advice nugget" (or PAN for short), in which I'll share some insights of what worked quite well in practice.

So welcome to PAN 01: use flat representations.

I'll try to make these posts quite short. As usual with everything related to "performance", you should always measure and benchmark with your real workload, and always balance whether additional complexity is worth the performance gains.

Nice, forewords are said and out of the way. Let's focus on the matter: imagine that you are handling a data that has multiple levels, for example, a paragraph, that is made of sentences, each made of words, each made of characters:

a paragraph, made of setences, words and characters

One straight way to model this is with lists of lists:

class Paragraph:
    sentences: list[Sentence]


class Sentence:
    words: list[Word]


class Word:
    chars: list[str]


data = Paragraph([
    Sentence([
        Word(["H", "e", "l", "l", "o"]),
        Word(["W", "o", "r", "l", "d"]),
    ]),
    Sentence([
        Word(["B", "y", "e"]),
    ]),
])

Note: don't overthink Word in this synthetic example! In a more realistic setup, Word could probably just be a str. But here I'm using it to illustrate the "layered" modeling.

Today's PAN is: don't nest the lists! Instead, keep the leaf information (here the chars) flat in a single list. Then use other lists to index into the flat representation:

class FlatParagrah:
    characters: list[str]
    words: list[WordInfo]
    sentences: list[SentenceInfo]


class WordInfo:
    num_chars: int


class SentenceInfo:
    num_chars: int
    num_words: int


data = FlatParagrah(
    characters=["H", "e", "l", "l", "o", "W", "o", "r", "l", "d", "B", "y", "e"],
    words=[
        WordInfo(5),
        WordInfo(5),
        WordInfo(3),
    ],
    sentences=[
        SentenceInfo(num_chars=10, num_words=2),
        SentenceInfo(num_chars=3, num_words=1),
    ],
)

Why? Two main reasons:

CPU caching: CPUs are more efficient when accessing data that is packed closer together
less allocations: each list has to be allocated and deallocated individually. The flat design has a fixed and small number of lists

Is it true?

Well... as it's usual with data structures, it will highly depend on what you actually do with the data! In our case, it worked wonders.

I've prepared a micro benchmark in Rust to illustrate, you can check the source code for Paragraph and FlatParagraph.

Operation	Paragraph	FlatParagraph	Gain
Create	2100 ns	370 ns	5.7 x
Deallocate	460 ns	39 ns	12 x
Iterate over words	91 ns	100 ns	0.91 x
Iterate over chars	110 ns	32 ns	3.4 x
Create with inserted word	2400 ns	350 ns	6.9 x

The last operation is the one that our codebase did the most: create a new Paragraph from an existing one by inserting a new word in the middle of it. Something like:

class Paragraph:
    def with_inserted_word(
            self,
            paragraph_i: int,
            word_i: int,
            word: list[int],
    ) -> 'Paragraph':
        pass

And we've used a lot Rust's type system to provide an ergonomic and safe API for the rest of the codebase to handle the data, allowing it to cut and slice the "paragraph" however necessary.

Performance advice nugget 01: use flat representations

Is it true?

Comments

Add comment

Performance advice nugget 01: use flat representations

Is it true?

Be notified about new posts

Comments

Add comment