the infinite loop nobody warns you about

so here's a thing that tripped me up early when writing the ES2025 grammar for Pegasus. i had an expression rule that looked perfectly reasonable:

# left-recursive — looks fine, is not fine
Expr <- Expr '+' Term / Term

this is how you'd write it in a CFG (context-free grammar). it's how the ECMAScript spec writes it. it's how every textbook writes it. and in a PEG parser it's an infinite loop.

the reason is mechanical. PEG tries alternatives in order. so to match Expr, it first tries Expr '+' Term. to match that, it needs to match Expr. to match Expr, it first tries Expr '+' Term. to match that, it needs to match Expr. forever. the parser never advances a single character — it just recurses into itself until the stack overflows.

this isn't a bug. it's a fundamental property of how PEG ordered choice works. a CFG parser like an LR parser handles left recursion fine because it builds a parse table that knows when to reduce. but PEG is top-down, recursive descent at its core. left recursion means infinite descent.

the fix: make it iterative

the rewrite is straightforward once you see the pattern. instead of saying "an expression is an expression plus a term, or just a term," you say "an expression is a term followed by zero or more plus-term pairs":

# iterative — works in PEG
Expr <- Term ('+' Term)*

that * is the key. it means "repeat zero or more times" and it's implemented as a loop, not recursion. the parser matches one Term, then keeps trying to match '+' Term until it fails, then returns everything it collected. no self-reference, no infinite loop.

scaling this to a real language

for the ES2025 grammar this pattern had to be applied at every level of the expression precedence hierarchy. javascript has something like 15 precedence levels for expressions, and in the spec they're all left-recursive. assignment contains conditional contains logical OR contains logical AND, all the way down to primary expressions.

the iterative rewrite turns each level into the same shape:

# JS expression hierarchy (simplified)
Assignment     <- Conditional ('=' Assignment)?
Conditional    <- LogicalOr ('?' Assignment ':' Assignment)?
LogicalOr      <- LogicalAnd ('||' LogicalAnd)*
LogicalAnd     <- BitwiseOr ('&&' BitwiseOr)*
BitwiseOr      <- BitwiseXor ('|' BitwiseXor)*
BitwiseXor     <- BitwiseAnd ('^' BitwiseAnd)*
BitwiseAnd     <- Equality ('&' Equality)*
Equality       <- Relational (('==' / '!=') Relational)*
Relational     <- Shift (('<' / '>' / '<=' / '>=') Shift)*
Shift          <- Additive (('<<' / '>>' / '>>>') Additive)*
Additive       <- Multiplicative (('+' / '-') Multiplicative)*
Multiplicative <- Unary (('*' / '/' / '%') Unary)*

notice the two shapes. the binary operators like ||, &&, +, * use (op operand)* — they're left-associative, so iteration naturally gives you the right grouping. but assignment and conditional use ? (optional) instead of * because they're right-associative. a = b = c means a = (b = c), so the rule calls back up to itself on the right side rather than looping.

the tricky part: building the AST

here's what nobody tells you about the iterative rewrite. the grammar is flat now — it produces a sequence of terms and operators. but the AST needs to be a tree. a + b + c should produce a tree like BinaryExpr(BinaryExpr(a, +, b), +, c), left-associated.

in Odin the parsing procedure collects the flat sequence and folds it left:

parse_additive :: proc(p: ^Parser) -> ^Node {
    left := parse_multiplicative(p)

    for is_additive_op(p.current_token) {
        op := p.current_token
        advance(p)
        right := parse_multiplicative(p)
        left = make_binary_node(op, left, right)
    }

    return left
}

the for loop is doing the work that left recursion would have done. each iteration wraps the accumulated left with a new binary node. after parsing a + b + c, the first iteration builds BinaryExpr(a, +, b) and assigns it to left. the second iteration builds BinaryExpr(BinaryExpr(a, +, b), +, c). correct left-association falls out of the loop structure naturally.

right-associative operators like assignment need the opposite — you recurse on the right side instead of looping:

parse_assignment :: proc(p: ^Parser) -> ^Node {
    left := parse_conditional(p)

    if is_assignment_op(p.current_token) {
        op := p.current_token
        advance(p)
        right := parse_assignment(p)  // recurse, don't loop
        return make_assignment_node(op, left, right)
    }

    return left
}

this is right recursion, which PEG handles fine. the parser only recurses when it's already consumed some input (the left-hand side and the operator), so it always makes progress.

the general lesson

left recursion in PEG isn't a limitation you work around once and forget. it shapes how you think about the entire grammar. every rule that the spec writes as left-recursive needs to become iterative, and every iterative rule needs a corresponding AST-building strategy that recovers the tree structure the language actually implies.

some PEG libraries try to handle left recursion automatically with memoization tricks (packrat parsing with left recursion support). i chose not to go that route because the iterative rewrite is explicit — i can look at the grammar and the parsing code and see exactly what's happening. no magic, just loops.