How Reflex Achieved a 220x Speedup on Python's AST Traversal

Martin Holloway | Published 15h ago | 4 min read | Based on 1 source

Reflex published a technical post on 16 June 2026 detailing how the team achieved a 220x speedup over Python's standard ast.walk implementation. The journey to that result is worth understanding, because it shows how real performance work unfolds.

The company's starting point was straightforward: volume. Reflex compiles Python into full-stack web applications, which means it generates substantial quantities of Python Abstract Syntax Trees (ASTs, the parsed structure of code) at build time. Once that generated code grew large enough, ast.walk, the standard library's general-purpose tool for traversing these trees, became a measurable bottleneck.

A first optimization attempt yielded only a 5% gain. That figure is instructive. An improvement that small typically signals that the easy wins (tighter loops, fewer lookups, small algorithmic tweaks) have been exhausted. Getting from a 5% gain to a 220x speedup requires rethinking the problem more fundamentally, usually by changing how the data is structured or how the tree is traversed. Based on available information, Reflex's blog post does not elaborate on the specific technique, but a gain of that magnitude points to an algorithmic or structural change, not micro-optimization.
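To make the distinction concrete, here is a minimal sketch of one generic kind of structural change: collapsing several full traversals into a single pass. It is not Reflex's technique, which is not described in the available information, and it will not produce anything like a 220x gain on its own. The function names and the sample source string are hypothetical and exist only for illustration.

```python
import ast

# Hypothetical sample input, used only for illustration.
SOURCE = "import os\n\ndef f(x):\n    return os.path.join(x, 'a')\n"
tree = ast.parse(SOURCE)


def collect_multi_pass(tree):
    """Ask three questions of the tree with three separate full traversals."""
    names = [n.id for n in ast.walk(tree) if isinstance(n, ast.Name)]
    funcs = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    imports = [
        alias.name
        for n in ast.walk(tree)
        if isinstance(n, ast.Import)
        for alias in n.names
    ]
    return names, funcs, imports


def collect_single_pass(tree):
    """Answer the same three questions while visiting each node once."""
    names, funcs, imports = [], [], []
    for n in ast.walk(tree):
        if isinstance(n, ast.Name):
            names.append(n.id)
        elif isinstance(n, ast.FunctionDef):
            funcs.append(n.name)
        elif isinstance(n, ast.Import):
            imports.extend(alias.name for alias in n.names)
    return names, funcs, imports
```

Tightening the body of either function would be a micro-optimization. Moving from the first function to the second changes how much work is done in the first place, which is the category of change the paragraph above has in mind.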

Understanding why this matters beyond Reflex requires some context. ast.walk is used across Python tooling: linters, formatters, transpilers, static analyzers, and code-generation frameworks. The standard implementation performs a breadth-first traversal using a deque (a data structure that lets you add or remove items from either end), yielding each node as it goes. It is clean, general, and correct. It is also allocation-heavy, because it maintains an external queue and discovers each node's children generically, field by field, rather than exploiting the specific shape of the AST being walked. Projects that process ASTs once per file, say when a developer runs a linter, rarely feel this cost. Projects that generate and traverse ASTs programmatically at high frequency, like Reflex, hit the ceiling much sooner.
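The breadth-first pattern described above can be sketched in a few lines. This is an illustrative reimplementation written for this article, not a copy of the standard library source; the name walk_sketch is hypothetical, while ast.iter_child_nodes and collections.deque are real standard-library APIs.

```python
import ast
from collections import deque


def walk_sketch(root):
    """Breadth-first traversal in the style of ast.walk (illustrative only)."""
    todo = deque([root])  # external queue holding nodes still to visit
    while todo:
        node = todo.popleft()
        # iter_child_nodes looks at every field of the node, including
        # list-valued fields, and yields only the values that are AST nodes.
        todo.extend(ast.iter_child_nodes(node))
        yield node


tree = ast.parse("x = [i * 2 for i in range(10)]")
visited = [type(n).__name__ for n in walk_sketch(tree)]
```

Every node passes through the queue once, and every node's fields are inspected generically to find its children. That generality is what makes the function work on any tree, and it is also where the per-node overhead comes from.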

The pattern Reflex followed — identify bottleneck, try the obvious fix, watch it fail, then pursue a deeper structural solution — is how most serious performance work actually goes. Early wins are quick. Large multiples require understanding why the naive approach is slow at a level below surface behavior.

Reflex has not yet said whether its approach will become a standalone library or a contribution to CPython itself. Both paths have precedent. The orjson project followed a similar trajectory from internal optimization to widely adopted replacement, and libraries like libcst and astroid emerged from teams who needed more than the standard library offered. A 220x speedup sustained under real production conditions would likely attract attention from the Python tooling community, particularly teams running large-scale static analysis where traversal cost compounds quickly.

The verified facts are limited in scope. Reflex has published its findings, the motivation was generated-code volume, and the eventual gain substantially exceeds the initial attempt. What remains unpublished, based on available information, is methodological detail: which traversal strategy was used, how the benchmark was designed, and whether the speedup holds across different tree shapes or only for the ASTs Reflex specifically generates. Those details determine whether the technique would work in other contexts. A 220x gain against Reflex's own generated ASTs may or may not translate to the general case.
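A benchmark that addresses the tree-shape question could look roughly like the sketch below. Everything in it is an assumption made for illustration: stack_walk is a hypothetical stand-in candidate (a simple depth-first walker, not Reflex's method), and the "wide" and "deep" inputs are arbitrary examples of two different tree shapes. Only ast and timeit are real standard-library modules.

```python
import ast
import timeit


def stack_walk(root):
    """Hypothetical candidate walker: depth-first, using a list as a stack."""
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(ast.iter_child_nodes(node))
        yield node


def count_nodes(walker, tree):
    """Consume a walker fully so the timing covers the whole traversal."""
    return sum(1 for _ in walker(tree))


# Two arbitrary tree shapes: many shallow statements vs. one nested expression.
SHAPES = {
    "wide": "\n".join(f"v{i} = {i}" for i in range(200)),
    "deep": "x" + " + x" * 100,
}


def compare(repeats=20):
    """Return baseline_time / candidate_time for each tree shape."""
    ratios = {}
    for name, source in SHAPES.items():
        tree = ast.parse(source)
        # Sanity check: both walkers must visit the same number of nodes.
        assert count_nodes(ast.walk, tree) == count_nodes(stack_walk, tree)
        base = timeit.timeit(lambda: count_nodes(ast.walk, tree), number=repeats)
        cand = timeit.timeit(lambda: count_nodes(stack_walk, tree), number=repeats)
        ratios[name] = base / cand
    return ratios
```

Calling compare() returns one ratio per shape. A technique that only helps on one shape would show up as ratios that diverge sharply between "wide" and "deep", which is the kind of evidence needed before generalizing a headline number.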

That caveat aside, the result is notable. For Python developers working on code-generation pipelines or large-scale static analysis, it is a signal worth tracking — both for the technique itself, once documented in full, and for the broader point that standard library traversal utilities do not represent the performance ceiling for AST work.