Phase 2: AST Mapper

The AST Mapper is the orchestrator of the parsing layer. It receives raw source strings from Phase 1, loads WASM-compiled Tree-Sitter grammars, parses each file into a concrete syntax tree, and dispatches to language-specific translators that extract structured signatures.


Responsibilities

  1. WASM grammar lifecycle — lazy load, deduplicate, cache
  2. Sequential parsing — keeps WASM heap usage flat
  3. Memory safety — tree.delete() in finally blocks
  4. Error isolation — one bad file never aborts the entire run
  5. filePath injection — the one place that knows both filename and signature

WASM grammar loading

Tree-Sitter grammars are compiled to WebAssembly and loaded at runtime. The AST Mapper manages this with three guarantees:

Lazy loading

Grammars are loaded only when the first file of that language appears in the diff. If a PR only changes TypeScript files, the Python, Go, Java, and Rust grammars are never loaded.

Deduplication (thundering herd prevention)

If 10 .ts files appear in a diff, the TypeScript grammar is loaded exactly once. An in-flight promise map prevents concurrent load attempts:

parsers/ast-mapper.ts
private async getLanguage(code: string): Promise<WasmLanguage> {
  // Already loaded — return instantly
  if (this.languages.has(code)) {
    return this.languages.get(code)!;
  }

  // Currently loading — wait for the in-flight promise
  // Prevents thundering herd: 10 .ts files = 1 WASM load
  if (this.loadingLanguages.has(code)) {
    return this.loadingLanguages.get(code)!;
  }

  // First request — start the load
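  const loadPromise = this.loadGrammar(code)
    // Cache on success; a rejected load is not cached, so a later call can retry
    .then((language) => {
      this.languages.set(code, language);
      return language;
    })
    // Clear the in-flight entry whether the load succeeded or failed
    .finally(() => {
      this.loadingLanguages.delete(code);
    });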

  this.loadingLanguages.set(code, loadPromise);
  return loadPromise;
}
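The deduplication pattern can be exercised on its own. The sketch below is a hypothetical, generic version of the method above: `GrammarCache` and the string-returning loader are stand-ins for the real WASM grammar loading. It counts how many times the loader runs when ten requests for the same language arrive concurrently.

```typescript
// Hypothetical generic version of the in-flight promise pattern.
// The loader returns strings here; the real mapper loads WASM grammars.
type Loader<T> = (code: string) => Promise<T>;

class GrammarCache<T> {
  private readonly loaded = new Map<string, T>();
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly loader: Loader<T>;

  constructor(loader: Loader<T>) {
    this.loader = loader;
  }

  get(code: string): Promise<T> {
    // Already loaded: resolve immediately from the cache
    const cached = this.loaded.get(code);
    if (cached !== undefined) return Promise.resolve(cached);

    // Load in progress: share the existing promise
    const pending = this.inFlight.get(code);
    if (pending) return pending;

    // First request: start exactly one load
    const promise = this.loader(code)
      .then((value) => {
        this.loaded.set(code, value); // kept for the lifetime of the process
        return value;
      })
      .finally(() => this.inFlight.delete(code)); // a failed load can be retried
    this.inFlight.set(code, promise);
    return promise;
  }
}

// Usage: ten concurrent requests for "typescript" trigger a single load.
let loadCount = 0;
const cache = new GrammarCache<string>(async (code) => {
  loadCount++;
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `grammar:${code}`;
});

async function main(): Promise<void> {
  const results = await Promise.all(
    Array.from({ length: 10 }, () => cache.get("typescript")),
  );
  console.log(results.length, loadCount); // 10 1
}

void main();
```

Storing the promise itself, rather than a "loading" flag, is what lets late callers wait on the same result instead of polling or starting a second load.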

Grammar caching

Once loaded, grammars are cached in memory for the lifetime of the process. In CI, this means one load per run. Locally, in --watch mode, the process stays alive, so grammars persist across re-runs.

Parsing process

For each FileDiff, the mapper:

  1. Resolves the file extension to a language code
  2. Loads (or retrieves cached) the WASM grammar
  3. Swaps the grammar on the shared parser instance
  4. Parses old source → tree → extract signatures
  5. Parses new source → tree → extract signatures
  6. Returns a ParseResult with both signature maps
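The six steps can be sketched as a single async function. This is a simplified, hypothetical model: signatures are plain strings, `ParserLike` stands in for the shared Tree-Sitter parser plus translator dispatch, and returning unsupported files as skipped results is an assumption rather than documented behavior. None of these names come from the real codebase.

```typescript
// Hypothetical, simplified model of the per-file flow.
type SigMap = Map<string, string>;

interface ParserLike {
  setLanguage(grammar: string): void; // swap the grammar on the shared parser
  extract(source: string): SigMap;    // parse, translate, and free the tree
}

interface FileDiffLike {
  path: string;
  oldSource: string; // contents at the base ref
  newSource: string; // contents at the head ref
}

interface ParseResultLike {
  file: string;
  language: string;
  oldSigs: SigMap;
  newSigs: SigMap;
  skipped: boolean;
  skipReason?: string;
}

async function mapFile(
  diff: FileDiffLike,
  resolveLanguage: (path: string) => string | undefined,
  loadGrammar: (language: string) => Promise<string>,
  parser: ParserLike,
): Promise<ParseResultLike> {
  // Step 1: resolve the file extension to a language code
  const language = resolveLanguage(diff.path);
  if (language === undefined) {
    // Assumption: unsupported files come back as skipped results
    return {
      file: diff.path,
      language: "unknown",
      oldSigs: new Map(),
      newSigs: new Map(),
      skipped: true,
      skipReason: "unsupported file extension",
    };
  }

  // Step 2: load the grammar (or get the cached one)
  const grammar = await loadGrammar(language);
  // Step 3: swap it onto the shared parser instance
  parser.setLanguage(grammar);

  // Steps 4 and 5: old source first, then new source
  const oldSigs = parser.extract(diff.oldSource);
  const newSigs = parser.extract(diff.newSource);

  // Step 6: return both signature maps together
  return { file: diff.path, language, oldSigs, newSigs, skipped: false };
}
```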

Sequential processing

Files are parsed sequentially, not concurrently. This is a deliberate design choice:

  • Tree-sitter parses at ~100,000 lines/sec, and each parse is synchronous work on the single JavaScript thread, so concurrency adds negligible throughput
  • Sequential processing keeps WASM heap allocation flat and deterministic
  • Error traces are cleaner when failures occur in a known order
  • One shared parser instance avoids allocating and managing several parser objects on the WASM heap
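A minimal sketch of the sequential loop, which also shows the error isolation listed under Responsibilities. `parseAllSequentially`, `parseOne`, and `Outcome` are hypothetical names; the real mapper returns full ParseResult objects rather than this reduced shape.

```typescript
// Hypothetical reduced result shape for illustration.
interface Outcome {
  file: string;
  skipped: boolean;
  skipReason?: string;
}

async function parseAllSequentially(
  files: string[],
  parseOne: (file: string) => Promise<void>,
): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  // for...of with await: file N+1 starts only after file N finishes, so at
  // most one tree is alive at a time (provided parseOne frees its tree).
  for (const file of files) {
    try {
      await parseOne(file);
      outcomes.push({ file, skipped: false });
    } catch (err) {
      // Error isolation: record the failure and continue with the next file
      const reason = err instanceof Error ? err.message : String(err);
      outcomes.push({ file, skipped: true, skipReason: reason });
    }
  }
  return outcomes;
}
```

The design trades a little wall-clock time for a predictable order of work: a failure always appears at a known position in the output, and the try/catch scope is a single file.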

Memory safety

Every parsed tree is freed in a finally block via tree.delete(). This is critical — Tree-Sitter trees are allocated on the WASM heap, not the JavaScript heap. The garbage collector cannot reclaim them. Without explicit deletion, the WASM heap grows without bound.

let tree: Tree | null = null;

try {
  tree = this.parser!.parse(source);
  if (!tree) return new Map();

  // If the file has syntax errors, warn but continue
  if (tree.rootNode.hasError) {
    console.warn(`Parse errors in "${filePath}" — signatures may be incomplete`);
  }

  return this.dispatch(tree, ext, lang);

} finally {
  // CRITICAL: always free WASM-allocated tree memory.
  // Runs even if dispatch() throws.
  tree?.delete();
}
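The same try/finally discipline can be factored into a small helper and checked with a fake tree. This is a sketch: `withTree`, `Deletable`, and the fallback argument are hypothetical; only the `delete()` method mirrors the Tree object that web-tree-sitter's `parse()` returns.

```typescript
// Anything that owns memory the JS garbage collector cannot reclaim.
interface Deletable {
  delete(): void;
}

// Hypothetical helper: create a tree, use it, and always free it.
function withTree<T extends Deletable, R>(
  create: () => T | null,
  use: (tree: T) => R,
  fallback: R,
): R {
  let tree: T | null = null;
  try {
    tree = create();
    if (!tree) return fallback; // nothing was allocated
    return use(tree);
  } finally {
    // Runs on normal return, early return, and when use() throws
    tree?.delete();
  }
}
```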

Language translators

Each language has a dedicated translator module responsible for walking the concrete syntax tree and extracting signatures:

| Language | Extensions | Grammar | Translator |
| --- | --- | --- | --- |
| TypeScript / JavaScript | .ts, .tsx, .js, .jsx | tree-sitter-typescript | translators/typescript.ts |
| Python | .py | tree-sitter-python | translators/python.ts |
| Go | .go | tree-sitter-go | translators/go.ts |
| Java | .java | tree-sitter-java | translators/java.ts |
| Rust | .rs | tree-sitter-rust | translators/rust.ts |

JavaScript files use the TypeScript grammar because TypeScript is a syntactic superset — all valid JS is valid TS.
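The extension column amounts to a lookup table. A minimal sketch, assuming lowercase language codes like the "typescript" value shown in ParseResult and a hypothetical `languageForPath` helper. Mapping the JS extensions straight to the typescript code is also an assumption; the real mapper may record the source language separately from the grammar it uses.

```typescript
// Assumed language codes; only "typescript" appears in the original docs.
type LanguageCode = "typescript" | "python" | "go" | "java" | "rust";

const EXTENSION_TO_LANGUAGE: Readonly<Record<string, LanguageCode>> = {
  ".ts": "typescript",
  ".tsx": "typescript",
  ".js": "typescript", // JS is parsed with the TypeScript grammar
  ".jsx": "typescript",
  ".py": "python",
  ".go": "go",
  ".java": "java",
  ".rs": "rust",
};

// Hypothetical helper: returns undefined for unsupported or missing extensions.
function languageForPath(path: string): LanguageCode | undefined {
  const base = path.slice(path.lastIndexOf("/") + 1); // ignore dots in directories
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
  return EXTENSION_TO_LANGUAGE[base.slice(dot).toLowerCase()];
}
```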

Extracted signatures

Each translator extracts four types of API signatures:

core/types.ts
interface FunctionSignature {
  name: string;
  params: ParamSignature[];    // name, type, optional, default, rest
  returnType: string;
  async: boolean;
  exported: boolean;
  isStatic: boolean;
  isConstructor: boolean;
  className?: string;
  visibility?: 'public' | 'protected' | 'private';
  generics: TypeParameter[];
  overloadIndex?: number;
  filePath: string;            // Injected by ASTMapper
  line: number;
}

interface InterfaceSignature {
  name: string;
  properties: PropertySignature[];  // name, type, optional
  generics: TypeParameter[];
  line: number;
}

interface EnumSignature {
  name: string;
  members: EnumMember[];  // name, value
  line: number;
}

interface TypeAliasSignature {
  name: string;
  typeExpression: string;
  generics: TypeParameter[];
  line: number;
}

Output: ParseResult[]

Each file produces a ParseResult containing signature maps for both old and new source. The classifier (Phase 3) matches old and new signatures by map key to determine what changed:

interface ParseResult {
  file: string;                              // "src/api/payments.ts"
  language: Language;                        // "typescript"
  oldSigs: Map<string, AnySignature>;        // signatures from base ref
  newSigs: Map<string, AnySignature>;        // signatures from head ref
  skipped: boolean;                          // true if parsing failed
  skipReason?: string;                       // reason for skipping
}

Map keys use a prefix convention for type routing:

  • processPayment — function (no prefix)
  • interface:UserConfig — interface
  • enum:PaymentStatus — enum
  • type:ConfigOptions — type alias
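None of the supported languages allow a colon in an identifier, so a key without a prefix can safely be treated as a function. The sketch below routes keys by prefix and splits old and new keys into added, removed, and common sets, roughly the starting point a comparison needs. `kindOfKey`, `diffKeys`, and the kind names are illustrative assumptions, not the Phase 3 API.

```typescript
type SignatureKind = "function" | "interface" | "enum" | "type";

// Prefixes from the key convention above; functions have none.
const PREFIXES: ReadonlyArray<[string, SignatureKind]> = [
  ["interface:", "interface"],
  ["enum:", "enum"],
  ["type:", "type"],
];

function kindOfKey(key: string): SignatureKind {
  for (const [prefix, kind] of PREFIXES) {
    if (key.startsWith(prefix)) return kind;
  }
  return "function"; // unprefixed keys are functions
}

interface KeyDiff {
  added: string[];   // only in newSigs
  removed: string[]; // only in oldSigs
  common: string[];  // in both; candidates for a signature-level comparison
}

function diffKeys(oldSigs: Map<string, unknown>, newSigs: Map<string, unknown>): KeyDiff {
  const oldKeys = Array.from(oldSigs.keys());
  const newKeys = Array.from(newSigs.keys());
  return {
    added: newKeys.filter((k) => !oldSigs.has(k)),
    removed: oldKeys.filter((k) => !newSigs.has(k)),
    common: oldKeys.filter((k) => newSigs.has(k)),
  };
}
```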

Next phase

The ParseResult[] array is passed to Phase 3: Classifier Engine, which compares old vs new signatures and assigns severity.