Phase 2: AST Mapper
The AST Mapper is the orchestrator of the parsing layer. It receives raw source strings from Phase 1, loads WASM-compiled Tree-Sitter grammars, parses each file into a concrete syntax tree, and dispatches to language-specific translators that extract structured signatures.
Responsibilities
- WASM grammar lifecycle: lazy load, deduplicate, cache
- Sequential parsing: keeps WASM heap usage flat
- Memory safety: tree.delete() in finally blocks
- Error isolation: one bad file never aborts the entire run
- filePath injection: the one place that knows both filename and signature
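The filePath injection responsibility can be pictured with a small sketch. It assumes translators return signatures without a path (they only see the syntax tree) and that the mapper stamps the path on afterwards; `RawSignature` and `injectFilePath` are hypothetical names used for illustration, not taken from the codebase.

```typescript
// Hypothetical, trimmed-down signature shape: translators are assumed to
// return signatures without a filePath, since they only see the syntax tree.
interface RawSignature {
  name: string;
  line: number;
}

// Returns a new map whose entries also carry the path of their source file.
// The input map and its entries are left untouched.
function injectFilePath<T extends object>(
  sigs: Map<string, T>,
  filePath: string,
): Map<string, T & { filePath: string }> {
  const withPath = new Map<string, T & { filePath: string }>();
  sigs.forEach((sig, key) => {
    withPath.set(key, { ...sig, filePath });
  });
  return withPath;
}

const raw = new Map<string, RawSignature>([
  ["processPayment", { name: "processPayment", line: 12 }],
]);
const injected = injectFilePath(raw, "src/api/payments.ts");
```

Copying each entry rather than mutating it keeps translator output reusable, at the cost of one extra allocation per signature.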
WASM grammar loading
Tree-Sitter grammars are compiled to WebAssembly and loaded at runtime. The AST Mapper manages this with three guarantees:
Lazy loading
Grammars are loaded only when the first file of that language appears in the diff. If a PR only changes TypeScript files, the Python, Go, Java, and Rust grammars are never loaded.
Deduplication (thundering herd prevention)
If 10 .ts files appear in a diff, the TypeScript grammar is loaded exactly once. An in-flight promise map prevents concurrent load attempts:
private async getLanguage(code: string): Promise<WasmLanguage> {
  // Already loaded: return instantly
  if (this.languages.has(code)) {
    return this.languages.get(code)!;
  }
  // Currently loading: wait for the in-flight promise
  // Prevents thundering herd: 10 .ts files = 1 WASM load
  if (this.loadingLanguages.has(code)) {
    return this.loadingLanguages.get(code)!;
  }
  // First request: start the load
  const loadPromise = this.loadGrammar(code).finally(() => {
    this.loadingLanguages.delete(code);
  });
  this.loadingLanguages.set(code, loadPromise);
  return loadPromise;
}
Grammar caching
Once loaded, grammars are cached in memory for the lifetime of the process. In CI, this means one load per run. Locally with --watch mode, grammars persist across re-runs.
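The three guarantees (lazy loading, deduplication, caching) can be seen together in a self-contained sketch. `GrammarCache` and its injected `loader` are hypothetical stand-ins for the mapper's internals; the real loader would fetch and instantiate a WASM grammar.

```typescript
// Hypothetical loader signature: resolves a language code to a loaded grammar.
type Loader<T> = (code: string) => Promise<T>;

class GrammarCache<T> {
  private readonly loaded = new Map<string, T>();
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly loader: Loader<T>;

  constructor(loader: Loader<T>) {
    this.loader = loader;
  }

  get(code: string): Promise<T> {
    // Cached: a grammar that finished loading stays for the process lifetime.
    const cached = this.loaded.get(code);
    if (cached !== undefined) return Promise.resolve(cached);

    // Deduplicated: concurrent callers share the one in-flight promise.
    const pending = this.inFlight.get(code);
    if (pending !== undefined) return pending;

    // Lazy: the loader runs only when a language is first requested.
    const promise = this.loader(code)
      .then((grammar) => {
        this.loaded.set(code, grammar);
        return grammar;
      })
      .finally(() => {
        this.inFlight.delete(code);
      });
    this.inFlight.set(code, promise);
    return promise;
  }
}
```

Because the in-flight entry is removed in `finally`, a failed load is not cached and a later request can retry it.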
Parsing process
For each FileDiff, the mapper:
- Resolves the file extension to a language code
- Loads (or retrieves cached) the WASM grammar
- Swaps the grammar on the shared parser instance
- Parses old source → tree → extract signatures
- Parses new source → tree → extract signatures
- Returns a ParseResult with both signature maps
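The last four steps above can be sketched as follows, assuming the grammar has already been resolved and loaded. All interfaces here are hypothetical, minimal shapes that only mirror the calls the steps describe; `oldSource`, `newSource`, and the `extract` callback are illustrative names, not the project's actual fields.

```typescript
// Hypothetical minimal shapes standing in for the Tree-Sitter bindings.
interface Grammar { name: string }
interface SyntaxTree { text: string; delete(): void }
interface Parser {
  setLanguage(grammar: Grammar): void;
  parse(source: string): SyntaxTree | null;
}
interface FileDiff { path: string; oldSource: string; newSource: string }
type SigMap = Map<string, { name: string; line: number }>;
interface ParseResult {
  file: string;
  language: string;
  oldSigs: SigMap;
  newSigs: SigMap;
  skipped: boolean;
}

// Parses one side of the diff and always frees the tree afterwards.
function parseSide(
  parser: Parser,
  source: string,
  extract: (tree: SyntaxTree) => SigMap,
): SigMap {
  let tree: SyntaxTree | null = null;
  try {
    tree = parser.parse(source);
    return tree ? extract(tree) : new Map();
  } finally {
    tree?.delete();
  }
}

function parseFileDiff(
  parser: Parser,
  grammar: Grammar,
  diff: FileDiff,
  extract: (tree: SyntaxTree) => SigMap,
): ParseResult {
  parser.setLanguage(grammar); // swap the grammar on the shared parser
  const oldSigs = parseSide(parser, diff.oldSource, extract); // base ref
  const newSigs = parseSide(parser, diff.newSource, extract); // head ref
  return { file: diff.path, language: grammar.name, oldSigs, newSigs, skipped: false };
}
```

Each side is parsed and freed before the next one starts, so at most one tree is alive at a time.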
Sequential processing
Files are parsed sequentially, not concurrently. This is a deliberate design choice:
- Tree-sitter parses at ~100,000 lines/sec — concurrency adds negligible throughput
- Sequential processing keeps WASM heap allocation flat and deterministic
- Error traces are cleaner when failures occur in a known order
- One shared parser instance avoids multiple WASM runtime costs
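A minimal sketch of the sequential loop with per-file error isolation is shown below. It is a synchronous stand-in: the real per-file step also loads grammars asynchronously and would be awaited inside the same loop. `FileOutcome`, `parseAllSequentially`, and the `parseOne` callback are hypothetical names.

```typescript
// Hypothetical shapes; parseOne stands in for the real per-file parse step
// and returns the number of signatures it extracted.
interface FileDiff { path: string; oldSource: string; newSource: string }
interface FileOutcome {
  file: string;
  skipped: boolean;
  skipReason?: string;
  signatureCount: number;
}

function parseAllSequentially(
  diffs: FileDiff[],
  parseOne: (diff: FileDiff) => number,
): FileOutcome[] {
  const outcomes: FileOutcome[] = [];
  // One file at a time, in input order: failures surface in a known order.
  for (let i = 0; i < diffs.length; i++) {
    const diff = diffs[i];
    try {
      outcomes.push({ file: diff.path, skipped: false, signatureCount: parseOne(diff) });
    } catch (err) {
      // Error isolation: record the failure and move on to the next file.
      const reason = err instanceof Error ? err.message : String(err);
      outcomes.push({ file: diff.path, skipped: true, skipReason: reason, signatureCount: 0 });
    }
  }
  return outcomes;
}
```

A failing file becomes a skipped entry with a reason instead of aborting the run, which matches the `skipped` and `skipReason` fields of ParseResult.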
Memory safety
Every parsed tree is freed in a finally block via tree.delete(). This is critical — Tree-Sitter trees are allocated on the WASM heap, not the JavaScript heap. The garbage collector cannot reclaim them. Without explicit deletion, the WASM heap grows without bound.
let tree: Tree | null = null;
try {
  tree = this.parser!.parse(source);
  if (!tree) return new Map();
  // If the file has syntax errors, warn but continue
  if (tree.rootNode.hasError) {
    console.warn(`Parse errors in "${filePath}": signatures may be incomplete`);
  }
  return this.dispatch(tree, ext, lang);
} finally {
  // CRITICAL: always free WASM-allocated tree memory.
  // Runs even if dispatch() throws.
  tree?.delete();
}
Language translators
Each language has a dedicated translator module responsible for walking the concrete syntax tree and extracting signatures:
| Language | Extensions | Grammar | Translator |
|---|---|---|---|
| TypeScript / JavaScript | .ts, .tsx, .js, .jsx | tree-sitter-typescript | translators/typescript.ts |
| Python | .py | tree-sitter-python | translators/python.ts |
| Go | .go | tree-sitter-go | translators/go.ts |
| Java | .java | tree-sitter-java | translators/java.ts |
| Rust | .rs | tree-sitter-rust | translators/rust.ts |
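JavaScript files use the TypeScript grammar because TypeScript is designed as a syntactic superset of JavaScript, so valid JS source generally parses as TS. Note that the tree-sitter-typescript package ships two grammars, typescript and tsx; JSX syntax in .tsx and .jsx files is handled by the tsx grammar.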
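The extension column of the table can be expressed as a lookup. This is a sketch only: the `Language` values and the `languageForFile` helper are assumptions for illustration, and the mapper's real resolution logic may differ.

```typescript
// Hypothetical language codes, one per row of the table above.
type Language = "typescript" | "python" | "go" | "java" | "rust";

const EXTENSION_TO_LANGUAGE: Record<string, Language> = {
  ".ts": "typescript",
  ".tsx": "typescript",
  ".js": "typescript", // JavaScript is routed to the TypeScript translator
  ".jsx": "typescript",
  ".py": "python",
  ".go": "go",
  ".java": "java",
  ".rs": "rust",
};

// Returns the language for a path, or undefined for unsupported files.
function languageForFile(filePath: string): Language | undefined {
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
  return EXTENSION_TO_LANGUAGE[fileName.slice(dot).toLowerCase()];
}
```

Returning `undefined` for unknown extensions lets the caller mark the file as skipped rather than guess a grammar.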
Extracted signatures
Each translator extracts four types of API signatures:
interface FunctionSignature {
  name: string;
  params: ParamSignature[]; // name, type, optional, default, rest
  returnType: string;
  async: boolean;
  exported: boolean;
  isStatic: boolean;
  isConstructor: boolean;
  className?: string;
  visibility?: 'public' | 'protected' | 'private';
  generics: TypeParameter[];
  overloadIndex?: number;
  filePath: string; // Injected by ASTMapper
  line: number;
}
interface InterfaceSignature {
  name: string;
  properties: PropertySignature[]; // name, type, optional
  generics: TypeParameter[];
  line: number;
}
interface EnumSignature {
  name: string;
  members: EnumMember[]; // name, value
  line: number;
}
interface TypeAliasSignature {
  name: string;
  typeExpression: string;
  generics: TypeParameter[];
  line: number;
}
Output: ParseResult[]
Each file produces a ParseResult containing signature maps for both the old and the new source. The classifier (Phase 3) uses the map keys to match old signatures against new ones and determine what changed:
interface ParseResult {
  file: string; // "src/api/payments.ts"
  language: Language; // "typescript"
  oldSigs: Map<string, AnySignature>; // signatures from base ref
  newSigs: Map<string, AnySignature>; // signatures from head ref
  skipped: boolean; // true if parsing failed
  skipReason?: string; // reason for skipping
}
Map keys use a prefix convention for type routing:
- processPayment: function (no prefix)
- interface:UserConfig: interface
- enum:PaymentStatus: enum
- type:ConfigOptions: type alias
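The prefix convention can be captured in a pair of helpers. This is a sketch under the assumption that only the three prefixes listed above exist; `signatureKey` and `parseSignatureKey` are hypothetical names, not part of the documented API.

```typescript
type SignatureKind = "function" | "interface" | "enum" | "type";

// Builds a map key: functions are unprefixed, everything else is "kind:name".
function signatureKey(kind: SignatureKind, name: string): string {
  return kind === "function" ? name : `${kind}:${name}`;
}

// Recovers the kind and name from a key. Keys without a known prefix are
// treated as functions.
function parseSignatureKey(key: string): { kind: SignatureKind; name: string } {
  const separator = key.indexOf(":");
  if (separator > 0) {
    const prefix = key.slice(0, separator);
    if (prefix === "interface" || prefix === "enum" || prefix === "type") {
      return { kind: prefix, name: key.slice(separator + 1) };
    }
  }
  return { kind: "function", name: key };
}
```

Keeping functions unprefixed means a function and an interface with the same name still get distinct keys.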
Next phase
The ParseResult[] array is passed to Phase 3: Classifier Engine, which compares old vs new signatures and assigns severity.
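As a rough picture of the input the classifier starts from, the two signature maps can be split by key into added, removed, and common entries. This is only a sketch of a key-level diff, not the classifier's actual algorithm; `KeyDiff` and `diffSignatureKeys` are hypothetical names.

```typescript
interface KeyDiff {
  added: string[];   // keys present only in the new map
  removed: string[]; // keys present only in the old map
  common: string[];  // keys in both maps; their signatures still need comparing
}

function diffSignatureKeys<T>(
  oldSigs: Map<string, T>,
  newSigs: Map<string, T>,
): KeyDiff {
  const added: string[] = [];
  const removed: string[] = [];
  const common: string[] = [];
  oldSigs.forEach((_sig, key) => {
    if (newSigs.has(key)) {
      common.push(key);
    } else {
      removed.push(key);
    }
  });
  newSigs.forEach((_sig, key) => {
    if (!oldSigs.has(key)) added.push(key);
  });
  return { added, removed, common };
}
```

Keys in `common` are the ones where the old and new signature objects would be compared field by field.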