How It Works

Diff Guardian runs a 4-phase pipeline every time you invoke it. Each phase feeds its output to the next, forming a lazy evaluation graph that only does expensive work when breaking changes exist.

Pipeline Overview

Git Diff Parser

Extracts old and new source code for every changed file between two git refs.

AST Mapper

Parses both old and new source into concrete syntax trees using WASM-compiled Tree-Sitter grammars. Extracts structured signatures.

Classifier Engine

Compares old vs new signatures using bucketed classification rules. Assigns severity: breaking, warning, or safe.

Call-Site Tracer

For breaking changes only: scans the repo for importers, then counts arguments at each call site. Shows exactly which callers are broken.

Phase 1: Git Diff Parser

The pipeline begins by running git diff between two refs. For each changed file, it extracts the full source code from both the base and head commits — not just the diff hunks.

Why full source? Because AST parsing requires complete, parseable files. A diff hunk in isolation is not valid syntax. By extracting full source from both sides, the AST mapper can build complete syntax trees.

# Internally, the parser runs the equivalent of:
git show <base>:<filepath>   # → old source
git show <head>:<filepath>   # → new source

The parser also handles special cases:

New files — old source is empty, new source is the complete file
Deleted files — old source is the complete file, new source is empty
Working tree mode — reads the file from disk instead of git
Staged mode — reads from the git index via git show :0:filepath
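
The extraction rules above can be sketched in TypeScript as follows. This is a minimal illustration, not Diff Guardian's actual code: `extractSources`, `FileChange`, `DiffMode`, and the injected `SourceReaders` are hypothetical names, and only the `git show` specs come from the text. The sketch also assumes that the old side always comes from the base ref, which the text does not state for staged or working tree mode.

```typescript
// Hypothetical types; only the `git show` specs are taken from the documentation.
type DiffMode = "refs" | "staged" | "working-tree";

interface FileChange {
  filepath: string;
  status: "added" | "deleted" | "modified";
}

interface SourceReaders {
  // Runs `git show <spec>` and returns its stdout (for example via child_process.execFileSync).
  gitShow: (spec: string) => string;
  // Reads a file from disk (for example via fs.readFileSync(path, "utf8")).
  readDisk: (filepath: string) => string;
}

interface SourcePair {
  oldSource: string;
  newSource: string;
}

function extractSources(
  change: FileChange,
  base: string,
  head: string,
  mode: DiffMode,
  readers: SourceReaders
): SourcePair {
  // New files have no old side.
  const oldSource =
    change.status === "added" ? "" : readers.gitShow(`${base}:${change.filepath}`);

  // Deleted files have no new side.
  let newSource = "";
  if (change.status !== "deleted") {
    if (mode === "working-tree") {
      newSource = readers.readDisk(change.filepath);
    } else if (mode === "staged") {
      newSource = readers.gitShow(`:0:${change.filepath}`);
    } else {
      newSource = readers.gitShow(`${head}:${change.filepath}`);
    }
  }
  return { oldSource, newSource };
}
```

Injecting the readers keeps the branching logic testable without a real repository; a production version would wire them to git and the filesystem.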

Phase 2: AST Mapper

The AST Mapper is the orchestrator. It receives the file diffs, determines the language from the file extension, loads the correct WASM grammar, and dispatches to the appropriate language translator.

Each language has a dedicated translator module:

Language	Translator	Grammar
TypeScript / JavaScript	`translators/typescript.ts`	`tree-sitter-typescript.wasm`
Python	`translators/python.ts`	`tree-sitter-python.wasm`
Go	`translators/go.ts`	`tree-sitter-go.wasm`
Java	`translators/java.ts`	`tree-sitter-java.wasm`
Rust	`translators/rust.ts`	`tree-sitter-rust.wasm`

Each translator extracts four types of signatures from the syntax tree:

FunctionSignature — name, parameters (name, type, optional, default, rest), return type, async, visibility, generics
InterfaceSignature — name, properties (name, type, optional), generics
EnumSignature — name, members (name, value)
TypeAliasSignature — name, type expression, generics
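
As an illustration of the shapes listed above, the four signature types might look roughly like this in TypeScript. The field names, the `kind` discriminant, the use of `null` for missing types, and the `requiredParameterCount` helper are assumptions made for this sketch; only the listed attributes come from the text.

```typescript
// Hypothetical field names; the attribute list is taken from the documentation.
interface ParameterSignature {
  name: string;
  type: string | null;          // null when the source has no annotation
  optional: boolean;
  defaultValue: string | null;  // source text of the default, if any
  rest: boolean;
}

interface FunctionSignature {
  kind: "function";
  name: string;
  parameters: ParameterSignature[];
  returnType: string | null;
  async: boolean;
  visibility: string;
  generics: string[];
}

interface InterfaceSignature {
  kind: "interface";
  name: string;
  properties: { name: string; type: string; optional: boolean }[];
  generics: string[];
}

interface EnumSignature {
  kind: "enum";
  name: string;
  members: { name: string; value: string | number | null }[];
}

interface TypeAliasSignature {
  kind: "typeAlias";
  name: string;
  typeExpression: string;
  generics: string[];
}

type SymbolSignature =
  | FunctionSignature
  | InterfaceSignature
  | EnumSignature
  | TypeAliasSignature;

// Counts parameters a caller must always supply.
function requiredParameterCount(sig: FunctionSignature): number {
  return sig.parameters.filter(
    (p) => !p.optional && p.defaultValue === null && !p.rest
  ).length;
}

// Example: async function processPayment(amount: number, currency?: string)
const exampleSignature: FunctionSignature = {
  kind: "function",
  name: "processPayment",
  parameters: [
    { name: "amount", type: "number", optional: false, defaultValue: null, rest: false },
    { name: "currency", type: "string", optional: true, defaultValue: null, rest: false },
  ],
  returnType: "Promise<void>",
  async: true,
  visibility: "public",
  generics: [],
};
```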

Grammars are lazy-loaded and cached. If 10 TypeScript files appear in a diff, the WASM grammar is loaded exactly once. An in-flight deduplication map prevents thundering herd problems.
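
One common way to get "load once, even under concurrent requests" is to cache the pending promise alongside the loaded value. The sketch below shows that pattern under stated assumptions: `createGrammarCache`, `Grammar`, and `GrammarLoader` are hypothetical names, and the real loader (which would initialize a Tree-Sitter WASM grammar) is replaced by an injected function.

```typescript
// Hypothetical stand-ins for a loaded grammar and the function that loads it.
interface Grammar {
  language: string;
}
type GrammarLoader = (language: string) => Promise<Grammar>;

function createGrammarCache(loader: GrammarLoader): {
  load(language: string): Promise<Grammar>;
} {
  const loaded = new Map<string, Grammar>();
  const inFlight = new Map<string, Promise<Grammar>>();

  return {
    load(language: string): Promise<Grammar> {
      // 1. Already loaded: answer from the cache.
      const ready = loaded.get(language);
      if (ready !== undefined) return Promise.resolve(ready);

      // 2. A load is already running: share its promise instead of starting another.
      const pending = inFlight.get(language);
      if (pending !== undefined) return pending;

      // 3. First request: start the load and record it as in flight.
      const promise = loader(language).then(
        (grammar) => {
          loaded.set(language, grammar);
          inFlight.delete(language);
          return grammar;
        },
        (error) => {
          // Drop the failed attempt so a later call can retry.
          inFlight.delete(language);
          throw error;
        }
      );
      inFlight.set(language, promise);
      return promise;
    },
  };
}
```

With this shape, ten concurrent `load("typescript")` calls all receive the same promise, and the loader runs once.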

Phase 3: Classifier Engine

The classifier receives one ParseResult per file, containing the old and new signature maps. It iterates over every key in both maps and applies the following classification logic:

Symbol deleted — key exists in old but not new. Always breaking.
Symbol added — key exists in new but not old. Always safe.
Symbol changed — key exists in both. Run through the rule engine.

For changed symbols, the engine performs a deepStrictEqual check first. If the signatures are identical, no rules run. This skips rule evaluation entirely for files where only the implementation (not the API surface) changed.
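
The key comparison and the equality shortcut can be sketched as below. All names (`diffSymbols`, `SymbolStatus`, `deepEqual`) are hypothetical. To keep the example free of runtime dependencies, `deepEqual` is a small stand-in that handles plain JSON-like data only; it is not the deepStrictEqual check the engine itself uses.

```typescript
type SymbolStatus = "deleted" | "added" | "unchanged" | "changed";

// Simplified structural comparison for plain objects, arrays, and primitives.
function deepEqual(a: unknown, b: unknown): boolean {
  if (Object.is(a, b)) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(b, key) &&
      deepEqual(
        (a as Record<string, unknown>)[key],
        (b as Record<string, unknown>)[key]
      )
  );
}

function diffSymbols(
  oldSigs: Map<string, unknown>,
  newSigs: Map<string, unknown>
): Map<string, SymbolStatus> {
  const statuses = new Map<string, SymbolStatus>();

  oldSigs.forEach((oldSig, key) => {
    if (!newSigs.has(key)) {
      statuses.set(key, "deleted"); // always breaking
    } else if (deepEqual(oldSig, newSigs.get(key))) {
      statuses.set(key, "unchanged"); // shortcut: no rules need to run
    } else {
      statuses.set(key, "changed"); // hand off to the rule engine
    }
  });

  newSigs.forEach((_newSig, key) => {
    if (!oldSigs.has(key)) statuses.set(key, "added"); // always safe
  });

  return statuses;
}
```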

If signatures differ, the engine routes to pre-computed rule buckets based on symbol type. Function signatures run through function rules, interface signatures through interface rules, and so on. This is O(1) routing, not O(n) filtering.
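
A sketch of bucketed routing, under several assumptions: the two example rules, the simplified signature shapes, and the names `ruleBuckets` and `classifyChanged` are all hypothetical, and the real rule set is documented under dg rules. The sketch returns the most severe verdict and, purely as a conservative assumption, reports a warning when no rule matches a changed signature.

```typescript
type Severity = "breaking" | "warning" | "safe";

// Deliberately simplified signature shapes for the example.
interface FunctionSig { kind: "function"; name: string; requiredParams: number; }
interface EnumSig { kind: "enum"; name: string; members: string[]; }
type Sig = FunctionSig | EnumSig;

// A rule returns a verdict, or null when it does not apply.
type Rule = (oldSig: Sig, newSig: Sig) => Severity | null;

// Hypothetical rule: callers must now pass more arguments.
const requiredParamAdded: Rule = (oldSig, newSig) => {
  if (oldSig.kind !== "function" || newSig.kind !== "function") return null;
  return newSig.requiredParams > oldSig.requiredParams ? "breaking" : null;
};

// Hypothetical rule: a member that callers may reference no longer exists.
const enumMemberRemoved: Rule = (oldSig, newSig) => {
  if (oldSig.kind !== "enum" || newSig.kind !== "enum") return null;
  const remaining = newSig.members;
  return oldSig.members.some((m) => remaining.indexOf(m) === -1) ? "breaking" : null;
};

// Buckets are built once; routing is a single property lookup by symbol type.
const ruleBuckets: Record<Sig["kind"], Rule[]> = {
  function: [requiredParamAdded],
  enum: [enumMemberRemoved],
};

const severityRank: Record<Severity, number> = { safe: 0, warning: 1, breaking: 2 };

function classifyChanged(oldSig: Sig, newSig: Sig): Severity {
  const verdicts: Severity[] = [];
  for (const rule of ruleBuckets[oldSig.kind]) {
    const verdict = rule(oldSig, newSig);
    if (verdict !== null) verdicts.push(verdict);
  }
  // Assumption: an unrecognized change is surfaced as a warning.
  if (verdicts.length === 0) return "warning";
  return verdicts.reduce((worst, v) => (severityRank[v] > severityRank[worst] ? v : worst));
}
```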

See dg rules for the full list of classification rules.

Phase 4: Call-Site Tracer

The tracer is the most expensive phase — and it only runs when breaking changes exist. This is the "lazy" part of the Lazy Graph Engine.
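
The gating described here amounts to filtering for breaking changes before any tracing work starts. A minimal sketch, with hypothetical names (`traceBreakingChanges`, `ClassifiedChange`, `TraceReport`) and the tracer passed in as a function:

```typescript
type Severity = "breaking" | "warning" | "safe";

interface ClassifiedChange {
  symbol: string;
  severity: Severity;
}

interface TraceReport {
  symbol: string;
  brokenCallSites: number;
}

function traceBreakingChanges(
  changes: ClassifiedChange[],
  trace: (change: ClassifiedChange) => TraceReport
): TraceReport[] {
  const breaking = changes.filter((change) => change.severity === "breaking");
  // No breaking changes: the expensive tracer is never invoked.
  if (breaking.length === 0) return [];
  return breaking.map((change) => trace(change));
}
```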

For each breaking change, the tracer performs two sub-phases:

Scanner (Phase 4a)

The JIT Scanner finds every file that imports the broken symbol. It uses git grep for initial candidate discovery, then AST-parses import statements to confirm actual usage. It handles:

Named imports: import { processPayment } from './api'
Default imports: import processPayment from './api'
Aliased imports: import { processPayment as pay } from './api'
Barrel re-exports: follows index.ts chains up to 10 levels deep
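
Following a barrel chain with a depth cap can be sketched as a bounded walk over a re-export map. The function name `resolveBarrel`, the map shape (barrel file to the module it re-exports the symbol from), and the cycle check are assumptions for illustration; only the limit of 10 levels comes from the text.

```typescript
const MAX_BARREL_DEPTH = 10;

// Returns the module that actually defines the symbol, or null when the chain
// is deeper than maxDepth or loops back on itself.
function resolveBarrel(
  start: string,
  reExports: Map<string, string>,
  maxDepth: number = MAX_BARREL_DEPTH
): string | null {
  const visited = new Set<string>();
  visited.add(start);
  let current = start;
  let hops = 0;

  while (true) {
    const next = reExports.get(current);
    if (next === undefined) return current; // no further re-export: defining module
    if (hops === maxDepth || visited.has(next)) return null;
    visited.add(next);
    current = next;
    hops++;
  }
}
```

For example, if `src/index.ts` re-exports from `src/payments/index.ts`, which re-exports from `src/payments/api.ts`, the walk ends at `src/payments/api.ts` after two hops.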

Tracer (Phase 4b)

For each importer file, the tracer AST-parses the file and locates every call expression of the broken symbol. Depending on the symbol type:

Functions — counts the arguments at each call site and compares the count against the new signature's required and total parameter counts. Reports each call site as broken, fixed, or indeterminate (spread arguments hide the real count).
Enums — finds EnumName.MemberName access patterns and checks if the accessed member was removed or had its value changed.
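
For functions, the argument-count check reduces to a range test. The sketch below uses hypothetical names (`checkCallSite`, `CallSite`, `ParamCounts`) and assumes that "fixed" means the call already satisfies the new signature; the text does not define the term precisely. Rest parameters are not modeled.

```typescript
type CallSiteStatus = "broken" | "fixed" | "indeterminate";

interface ParamCounts {
  required: number; // parameters a caller must supply
  total: number;    // required plus optional parameters
}

interface CallSite {
  argCount: number;
  hasSpread: boolean; // true for calls like fn(...args)
}

function checkCallSite(call: CallSite, params: ParamCounts): CallSiteStatus {
  // A spread argument can expand to any number of values, so counting is unreliable.
  if (call.hasSpread) return "indeterminate";
  if (call.argCount < params.required || call.argCount > params.total) {
    return "broken";
  }
  // Assumption: a call whose count fits the new signature is reported as "fixed".
  return "fixed";
}
```

With a new signature that has two required parameters out of three, a call passing one argument is broken, a call passing two or three is fixed, and a call using `...args` is indeterminate.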

Reporters

After the pipeline completes, one of three reporters renders the output:

Terminal Reporter — colorized output for local CLI usage and git hooks
GitHub Reporter — posts a structured comment on the PR via the GitHub API
JSON Reporter — writes structured JSON to stdout or a file for programmatic consumption

Next steps

Architecture Deep Dive — source tree map and data contracts
Classification Rules — all rules explained with examples
dg trace — standalone symbol tracing