Architecture

A deep dive into the source tree, data contracts, and design decisions behind Diff Guardian.


Source tree

src/
  cli.ts                      # Entry point — command router
  pipeline.ts                 # Orchestrates the 4-phase pipeline
  config.ts                   # dg.config.json loader

  core/
    types.ts                  # All TypeScript interfaces and type aliases
    constants.ts              # Extension maps, supported languages

  parsers/
    git-diff.ts               # Phase 1: git diff → FileDiff[]
    ast-mapper.ts             # Phase 2: FileDiff[] → ParseResult[]
    translators/
      typescript.ts           # TS/JS/TSX/JSX translator
      python.ts               # Python translator
      go.ts                   # Go translator
      java.ts                 # Java translator
      rust.ts                 # Rust translator

  classifier/
    engine.ts                 # Phase 3: ParseResult[] → FunctionChange[]
    types.ts                  # Rule contract interface
    rules/
      R01_param_removed.ts    # ...through R28_exported.ts
      index.ts                # Barrel export of all rules

  tracer/
    index.ts                  # Phase 4: JIT Scanner + Call-Site Tracer

  reporter/
    types.ts                  # ReporterConfig interface
    terminal.ts               # Terminal reporter (colorized output)
    github.ts                 # GitHub PR comment reporter
    index.ts                  # Reporter factory

grammars/                     # WASM files (tree-sitter-*.wasm)
.husky/                       # Git hook scripts

Data contracts

Data moves through the pipeline as objects that conform to strongly typed TypeScript interfaces. Each phase consumes the output of the previous phase:

FileDiff (Phase 1 output)

interface FileDiff {
  path: string;        // Relative file path
  language: string;    // File extension (e.g., "ts", "py")
  oldSource: string;   // Full source from base ref
  newSource: string;   // Full source from head ref
}

ParseResult (Phase 2 output)

interface ParseResult {
  file: string;
  language: Language;
  oldSigs: Map<string, AnySignature>;  // Base signatures
  newSigs: Map<string, AnySignature>;  // Head signatures
  skipped: boolean;
  skipReason?: string;
}

FunctionChange (Phase 3 output)

interface FunctionChange {
  id: string;
  name: string;
  file: string;
  lineStart: number;
  lineEnd: number;
  language: Language;
  symbolType: 'function' | 'interface' | 'enum' | 'type_alias';
  severity: 'breaking' | 'warning' | 'safe';
  changeType: ChangeType;
  breaking: boolean;
  message: string;
  before: AnySignature | null;
  after: AnySignature | null;
  callers: CallerInfo[];
}
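
To make the hand-off between phases concrete, here is a minimal sketch of how the four phases could be chained. The Phases interface, the runPipeline function, and the trimmed-down contract types are hypothetical stand-ins for illustration, not the project's actual exports. Dropping skipped files before classification, and having Phase 4 return the same changes with callers filled in, are assumptions.

```typescript
// Trimmed-down stand-ins for the contracts above; only the fields used here are kept.
interface FileDiff { path: string; language: string; oldSource: string; newSource: string }
interface ParseResult { file: string; skipped: boolean; skipReason?: string }
interface FunctionChange {
  name: string;
  file: string;
  severity: 'breaking' | 'warning' | 'safe';
  callers: string[]; // simplified: the real contract uses CallerInfo[]
}

// Hypothetical phase signatures (not the project's real function names).
interface Phases {
  diff: (base: string, head: string) => Promise<FileDiff[]>;       // Phase 1
  parse: (diffs: FileDiff[]) => Promise<ParseResult[]>;            // Phase 2
  classify: (parsed: ParseResult[]) => FunctionChange[];           // Phase 3
  trace: (changes: FunctionChange[]) => Promise<FunctionChange[]>; // Phase 4
}

async function runPipeline(phases: Phases, base: string, head: string): Promise<FunctionChange[]> {
  const diffs = await phases.diff(base, head);
  const parsed = await phases.parse(diffs);
  // Assumption: skipped files carry no usable signatures, so they are dropped here.
  const changes = phases.classify(parsed.filter(p => !p.skipped));
  return phases.trace(changes);
}
```

Because each phase is typed against the previous phase's output, a shape mismatch between phases shows up at compile time rather than at run time.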

Classifier design

The classifier engine uses a technique called bucketed rule routing. Instead of iterating all 28 rules for every changed symbol, it pre-computes four rule buckets at startup — one per symbol type (function, interface, enum, type_alias). When a symbol change is detected, only the rules in the matching bucket are executed:

// Pre-computed once at startup
const buckets = {
  function:   rules.filter(r => r.target === 'function'),
  interface:  rules.filter(r => r.target === 'interface'),
  enum:       rules.filter(r => r.target === 'enum'),
  type_alias: rules.filter(r => r.target === 'type_alias'),
};

// Per-symbol: O(1) lookup + iterate only relevant rules
if (key.startsWith('interface:')) {
  rulesToRun = buckets.interface;
}
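
The routing idea above can be sketched end to end as follows. The Rule shape is a hypothetical simplification of the contract in classifier/types.ts. The "<symbolType>:<name>" key format is an assumption extrapolated from the "interface:" prefix shown above, and falling back to the function bucket for unprefixed keys is likewise a guess.

```typescript
type SymbolType = 'function' | 'interface' | 'enum' | 'type_alias';

// Hypothetical rule shape; the real contract lives in classifier/types.ts.
interface Rule {
  id: string;
  target: SymbolType;
}

type Buckets = Record<SymbolType, Rule[]>;

const SYMBOL_TYPES: SymbolType[] = ['function', 'interface', 'enum', 'type_alias'];

// One pass over the rule list, done once, instead of filtering per symbol.
function buildBuckets(rules: Rule[]): Buckets {
  const buckets: Buckets = { function: [], interface: [], enum: [], type_alias: [] };
  for (const rule of rules) {
    buckets[rule.target].push(rule);
  }
  return buckets;
}

// Assumption: keys look like "<symbolType>:<name>". Keys without a known
// prefix fall back to the function bucket in this sketch.
function rulesFor(key: string, buckets: Buckets): Rule[] {
  const prefix = key.slice(0, Math.max(0, key.indexOf(':')));
  const type = SYMBOL_TYPES.find(t => t === prefix) ?? 'function';
  return buckets[type];
}
```

With this layout, a changed interface never pays for function-only rules, and adding a rule only requires setting its target.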

WASM grammar lifecycle

The AST Mapper manages WASM grammar loading with three guarantees:

  1. Lazy loading: grammars are loaded only when needed. If a diff contains only TypeScript files, the Go, Python, Java, and Rust grammars are never loaded.
  2. Deduplication: if 10 .ts files appear in a diff, the TypeScript grammar is loaded exactly once. An in-flight promise map prevents a thundering herd of duplicate loads.
  3. Memory safety: every parsed tree is freed in a finally block via tree.delete(), even if the translator throws. This prevents WASM heap leaks.
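
The three guarantees above can be illustrated with a small sketch. Tree, Grammar, GrammarLoader, GrammarCache, and withTree are hypothetical names that stand in for the tree-sitter objects; only the tree.delete() call mirrors the text. The loader is injected so the sketch does not assume any particular WASM loading API.

```typescript
// Hypothetical stand-ins for the tree-sitter types used by the AST Mapper.
interface Tree { delete(): void }
interface Grammar { parse(source: string): Tree }
type GrammarLoader = (language: string) => Promise<Grammar>;

class GrammarCache {
  private readonly inFlight = new Map<string, Promise<Grammar>>();
  private readonly load: GrammarLoader;

  constructor(load: GrammarLoader) {
    this.load = load;
  }

  // Lazy: nothing loads until a language is first requested.
  // Deduplicated: concurrent callers share the same in-flight promise.
  get(language: string): Promise<Grammar> {
    let pending = this.inFlight.get(language);
    if (!pending) {
      pending = this.load(language);
      this.inFlight.set(language, pending);
    }
    return pending;
  }
}

// Memory safety: the tree is released even when the translator throws.
function withTree<T>(grammar: Grammar, source: string, translate: (tree: Tree) => T): T {
  const tree = grammar.parse(source);
  try {
    return translate(tree);
  } finally {
    tree.delete();
  }
}
```

Note that this sketch keeps a rejected promise in the map, so a failed load would not be retried; whether the real mapper evicts failed entries is not stated in the source.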

Sequential parsing

Files are parsed sequentially, not concurrently. This is intentional:

  • Tree-sitter parses at approximately 100,000 lines per second, so parsing is rarely the bottleneck and concurrency would add negligible throughput
  • Sequential processing keeps WASM heap usage flat and deterministic
  • Error traces are cleaner when failures happen in a known order
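
As a rough illustration of the trade-off above, sequential parsing amounts to awaiting inside a plain loop rather than fanning out with Promise.all. The parseSequentially helper and its parseOne callback are hypothetical names for this sketch.

```typescript
// Hypothetical helper: parseOne stands in for the per-file AST mapping step.
async function parseSequentially<In, Out>(
  files: In[],
  parseOne: (file: In) => Promise<Out>,
): Promise<Out[]> {
  const results: Out[] = [];
  for (const file of files) {
    // Awaiting inside the loop means the next file starts only after the
    // previous one has finished, so at most one parse is active at a time.
    results.push(await parseOne(file));
  }
  return results;
}
```

A Promise.all over files.map(parseOne) would start every parse at once, which is what this design avoids.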

Tracer architecture

The call-site tracer is split into two components:

JIT Scanner

Uses git grep for O(repo) candidate discovery, then AST-parses import statements to confirm actual usage. Handles barrel re-exports by following index.ts chains recursively.
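
A minimal sketch of the barrel-following step, under simplifying assumptions: the re-export relationships and each candidate's resolved imports are given as in-memory maps (in the real scanner they would come from git grep hits and AST-parsed import statements). The names modulesExposing and confirmImporters are hypothetical. The chain is followed with a fixed-point loop rather than literal recursion so that cyclic barrels still terminate.

```typescript
// Hypothetical in-memory view: barrel module -> modules it re-exports from.
type ReExportGraph = Map<string, string[]>;

// Collects every module through which a symbol defined in `origin` is reachable.
function modulesExposing(origin: string, graph: ReExportGraph): Set<string> {
  const exposed = new Set<string>([origin]);
  let grew = true;
  while (grew) {
    grew = false;
    graph.forEach((sources, barrel) => {
      if (!exposed.has(barrel) && sources.some(s => exposed.has(s))) {
        exposed.add(barrel);
        grew = true;
      }
    });
  }
  return exposed;
}

// candidates: file -> resolved import sources for that file.
// A candidate is confirmed only if it imports from a module that exposes the symbol.
function confirmImporters(candidates: Map<string, string[]>, exposed: Set<string>): string[] {
  const confirmed: string[] = [];
  candidates.forEach((imports, file) => {
    if (imports.some(i => exposed.has(i))) confirmed.push(file);
  });
  return confirmed;
}
```

The text-level grep keeps the candidate set small, and this confirmation step filters out files that merely mention the name without importing it.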

Call-Site Counter

For each confirmed importer, parses the file and locates call expressions of the broken symbol. Counts the arguments at each call and compares the count against the new signature to assign a status of "broken", "fixed", or "indeterminate".
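
The arity comparison can be sketched as below. The CallSite and ArityRange shapes and the classifyCall function are hypothetical, and the reading of the three statuses is an assumption: "fixed" is taken to mean the call already satisfies the new signature, "broken" that it does not, and "indeterminate" that the argument count cannot be known statically (for example, a spread argument).

```typescript
type CallStatus = 'broken' | 'fixed' | 'indeterminate';

// Hypothetical summaries; the real tracer would derive these from AST nodes.
interface ArityRange {
  required: number; // parameters without defaults
  max: number;      // total parameters; Infinity when a rest parameter exists
}
interface CallSite {
  argCount: number;
  hasSpread: boolean; // e.g. f(...args)
}

function classifyCall(call: CallSite, next: ArityRange): CallStatus {
  // A spread argument hides the real argument count from static analysis.
  if (call.hasSpread) return 'indeterminate';
  const fits = call.argCount >= next.required && call.argCount <= next.max;
  return fits ? 'fixed' : 'broken';
}
```

For instance, if the new signature requires two arguments and accepts at most three, a call passing one argument is reported as broken while a call passing two is treated as already compatible.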