the packrat promise
so packrat parsing has this beautiful theoretical guarantee: linear time parsing for any PEG grammar. the idea is simple: memoize the result of every rule at every input position. if you've already tried to parse the expression rule starting at position 47, and it succeeded after consuming 12 characters, just return that cached result instead of re-parsing. no redundant work, ever.
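in sketch form, full packrat looks something like this (illustrative names and memo layout, not the real Pegasus internals):

// sketch of full packrat memoization for one rule; every (rule, position)
// pair gets a slot in a table allocated upfront.
Parse_Result :: struct {
    ok:      bool, // did the rule match?
    end_pos: int,  // input position after the match
}

Memo_Slot :: struct {
    used:   bool,
    result: Parse_Result,
}

Parser :: struct {
    input: string,
    memo:  [][]Memo_Slot, // memo[rule_id][pos], one slot per rule per position
}

EXPRESSION_RULE :: 0 // hypothetical rule id

parse_expression :: proc(p: ^Parser, pos: int) -> Parse_Result {
    slot := &p.memo[EXPRESSION_RULE][pos]
    if slot.used {
        return slot.result // already parsed here once: reuse, never re-parse
    }
    result := parse_expression_body(p, pos) // the actual PEG alternatives
    slot^ = Memo_Slot{used = true, result = result}
    return result
}

// placeholder standing in for the generated rule body
parse_expression_body :: proc(p: ^Parser, pos: int) -> Parse_Result {
    return Parse_Result{ok = false, end_pos = pos}
}

every rule gets a wrapper like that, and every (rule, position) pair gets a slot.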
the problem is the memory cost. if your grammar has r rules and your input has n positions, you need an O(n×r) memo table. for Pegasus parsing a real JavaScript file, say 190 rules against a 50KB source file, that's roughly 190 × 50,000 ≈ 9.5 million entries. most of those entries will never be looked up twice. you're paying for a massive table where 95% of the slots are filled once and never consulted again.
classic packrat parsers just eat that cost. they allocate the full table upfront and move on. but when i was profiling Pegasus on real-world JS bundles, the memo table was dominating memory usage while providing almost no benefit for most rules. the parser was spending more time on cache management than it was saving from cache hits.
selective memoization
the fix is to not memoize everything. instead, only memoize rules that are actually called multiple times at the same position. this sounds obvious in retrospect but figuring out which rules to memoize is the tricky part.
in Pegasus i take a two-pass approach. the first pass is a static analysis of the grammar: walk the rule graph and identify rules that appear in multiple alternatives or are referenced from rules that themselves get backtracked into. these are the rules where memoization actually pays off — they're the ones that will be re-invoked at the same position after a failed alternative.
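roughly, the selection pass looks like this. it's a sketch over a simplified grammar representation (Rule and the alternative layout are stand-ins, and the second condition about rules referenced from backtracked-into rules is left out to keep it short):

// sketch of the "which rules deserve memoization" pass.
Rule :: struct {
    name:         string,
    alternatives: [][]string, // each alternative is a sequence of rule references
}

select_memoized :: proc(grammar: []Rule) -> map[string]bool {
    memoize := make(map[string]bool)

    // count, per referenced rule, how many alternatives of ordered choices
    // mention it. a rule reachable from several alternatives is exactly the
    // kind that gets re-invoked at the same position after an earlier
    // alternative fails.
    ref_count := make(map[string]int)
    defer delete(ref_count)

    for rule in grammar {
        if len(rule.alternatives) < 2 {
            continue // no ordered choice inside this rule, so skip it in this simplified pass
        }
        for alt in rule.alternatives {
            for ref in alt {
                ref_count[ref] += 1
            }
        }
    }
    for name, count in ref_count {
        if count >= 2 {
            memoize[name] = true
        }
    }
    return memoize
}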
for the JavaScript grammar, this cuts the memoized rule set from 190 down to about 30. that's an 84% reduction in memo table entries right there.
but even with fewer rules, you still need an efficient way to store and look up cached results. a flat array indexed by position works for full packrat, but with selective memoization the entries are sparse — you might have cached results at positions 47, 203, and 8891 for a given rule, with nothing in between. a flat array wastes space on all those empty slots.
avl trees for interval lookup
this is where the AVL tree comes in. each memoized rule effectively gets its own AVL-balanced binary search tree keyed by input position; in the code below it's a single tree ordered by (rule_id, pos), which plays the same role. looking up whether we have a cached result for rule X at position Y is O(log n), where n is the number of cached entries, not the input size.
Memo_Entry :: struct {
    rule_id: int,          // which rule produced this entry
    pos:     int,          // input position where the rule was applied
    end_pos: int,          // position after the match (how far the rule consumed)
    result:  Parse_Result, // the cached outcome
}

Memo_Cache :: struct {
    tree: AVL_Tree(Memo_Entry), // ordered by (rule_id, pos)
}

memo_lookup :: proc(cache: ^Memo_Cache, rule_id: int, pos: int) -> (Parse_Result, bool) {
    // only the key fields matter for the search; end_pos and result are ignored
    key := Memo_Entry{rule_id = rule_id, pos = pos}
    if entry, ok := avl_find(&cache.tree, key); ok {
        return entry.result, true
    }
    return {}, false
}

the Memo_Entry stores the rule id, the start position, the end position (how far the rule consumed), and the parse result. the AVL tree is ordered by (rule_id, pos) so lookups are a straightforward tree search. insertions keep the tree balanced with rotations — standard AVL stuff.
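storing a result is the mirror image. a sketch, assuming an avl_insert helper that uses the same (rule_id, pos) ordering as avl_find:

// sketch: record the outcome after a memoized rule has run.
memo_store :: proc(cache: ^Memo_Cache, rule_id: int, pos: int, end_pos: int, result: Parse_Result) {
    entry := Memo_Entry{
        rule_id = rule_id,
        pos     = pos,
        end_pos = end_pos,
        result  = result,
    }
    avl_insert(&cache.tree, entry) // rebalances with rotations on the way back up
}

only rules marked by the selection pass ever call memo_lookup and memo_store; everything else just parses directly.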
the interval aspect matters for a subtle reason. when the parser backtracks, it doesn't just retry at the exact same position — sometimes it retries at a position within a previously matched range. the end_pos field lets us answer "did any cached rule span across this position?" which is useful for invalidation when i eventually add incremental parsing.
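the eventual query would look something like this. a naive sketch just to show the semantics (avl_in_order is a hypothetical helper returning entries sorted by (rule_id, pos); a real version would prune subtrees by key instead of scanning everything):

// sketch of the future invalidation check: does any cached entry span pos?
spans_position :: proc(cache: ^Memo_Cache, pos: int) -> bool {
    for entry in avl_in_order(&cache.tree) {
        if entry.pos <= pos && pos < entry.end_pos {
            return true // this cached result covers the position and would need invalidating
        }
    }
    return false
}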
the tradeoff in practice
i benchmarked selective memoization against full packrat and no memoization on a 15KB JavaScript file:
- no memoization: 4.8ms, 0.2MB overhead
- full packrat: 2.1ms, 8.4MB overhead
- selective (AVL): 2.4ms, 0.9MB overhead
selective memoization is only ~14% slower than full packrat but uses 89% less memory. for a parser that might be running in a language server processing files on every keystroke, that memory difference matters a lot more than 300 microseconds.
the AVL tree makes each lookup logarithmic in the number of cached entries, versus the flat array's O(1) indexing. but the tree only contains entries that are actually useful, so the total work across a full parse is lower. you're doing more work per lookup but far fewer lookups overall.
i think the general principle here is worth remembering: the theoretically optimal approach (memoize everything, O(1) lookup) isn't always practically optimal. when your access pattern is sparse, a data structure that's slightly slower per-operation but dramatically smaller can win on total wall-clock time just by being nicer to the cache hierarchy. the CPU doesn't care about your big-O if your table doesn't fit in L2.