How It Works
Diff Guardian runs a 4-phase pipeline every time you invoke it. Each phase feeds its output to the next, forming a lazy evaluation graph that only does expensive work when breaking changes exist.
Pipeline Overview
- Phase 1 (Git Diff Parser): Extracts old and new source code for every changed file between two git refs.
- Phase 2 (AST Mapper): Parses both old and new source into concrete syntax trees using WASM-compiled Tree-Sitter grammars, then extracts structured signatures.
- Phase 3 (Classifier Engine): Compares old and new signatures using bucketed classification rules and assigns a severity: breaking, warning, or safe.
- Phase 4 (Call-Site Tracer): For breaking changes only, scans the repo for importers, then counts the arguments at each call site to show exactly which callers are broken.
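The lazy evaluation described above can be sketched as an orchestrator that always runs the first three phases and calls the tracer only when the classifier reports at least one breaking change. All names here (`PipelinePhases`, `runPipeline`, the phase callbacks, the `Change` shape) are hypothetical and not Diff Guardian's actual API; the phases are injected so that only the control flow is shown.

```typescript
// Hypothetical types and names; a sketch of the control flow only.
type Severity = "breaking" | "warning" | "safe";

interface Change {
  symbol: string;
  file: string;
  severity: Severity;
}

interface CallSiteReport {
  symbol: string;
  brokenCallers: string[];
}

// Each callback consumes the previous phase's output.
interface PipelinePhases<Diff, Parsed> {
  diff: (base: string, head: string) => Diff[]; // Phase 1
  parse: (diffs: Diff[]) => Parsed[]; // Phase 2
  classify: (parsed: Parsed[]) => Change[]; // Phase 3
  trace: (breaking: Change[]) => CallSiteReport[]; // Phase 4 (expensive)
}

interface PipelineResult {
  changes: Change[];
  callSites: CallSiteReport[];
}

function runPipeline<Diff, Parsed>(
  phases: PipelinePhases<Diff, Parsed>,
  base: string,
  head: string,
): PipelineResult {
  const changes = phases.classify(phases.parse(phases.diff(base, head)));
  const breaking = changes.filter((c) => c.severity === "breaking");
  // Lazy step: the tracer is never invoked unless something is breaking.
  const callSites = breaking.length > 0 ? phases.trace(breaking) : [];
  return { changes, callSites };
}
```

Injecting the phases keeps the "only pay for tracing when needed" decision in one place, independent of how each phase is implemented.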
Phase 1: Git Diff Parser
The pipeline begins by running git diff between two refs. For each changed file, it extracts the full source code from both the base and head commits — not just the diff hunks.
Why full source? Because AST parsing requires complete, parseable files. A diff hunk in isolation is not valid syntax. By extracting full source from both sides, the AST mapper can build complete syntax trees.
```bash
# Internally, the parser runs the equivalent of:
git show <base>:<filepath>   # → old source
git show <head>:<filepath>   # → new source
```

The parser also handles special cases:
- New files: old source is empty, new source is the complete file
- Deleted files: old source is the complete file, new source is empty
- Working tree mode: reads the file from disk instead of git
- Staged mode: reads from the git index via `git show :0:<filepath>`
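A minimal sketch of the two-ref mode for Node.js, assuming changed files are listed with `git diff --name-status --no-renames` (rename detection is disabled so every entry is a plain add, delete, or modify). The names `ChangedFile`, `parseNameStatus`, `listChangedFiles`, and `extractFileDiff` are hypothetical; the `show` reader is injectable so the logic can be exercised without a repository. Working tree and staged modes are left out.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical names throughout; not Diff Guardian's actual API.
type ChangeStatus = "added" | "deleted" | "modified";

interface ChangedFile {
  path: string;
  status: ChangeStatus;
}

interface FileDiff {
  path: string;
  oldSource: string; // "" for newly added files
  newSource: string; // "" for deleted files
}

// Returns the output of `git show <spec>`, e.g. spec = "main:src/api.ts".
type ShowFn = (spec: string) => string;

const gitShow: ShowFn = (spec) =>
  execFileSync("git", ["show", spec], { encoding: "utf8" });

// Parses lines such as "M\tsrc/api.ts"; codes other than A and D count as modified.
function parseNameStatus(output: string): ChangedFile[] {
  return output
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [code, path] = line.split("\t");
      const status: ChangeStatus =
        code === "A" ? "added" : code === "D" ? "deleted" : "modified";
      return { path, status };
    });
}

function listChangedFiles(base: string, head: string): ChangedFile[] {
  const output = execFileSync(
    "git",
    ["diff", "--name-status", "--no-renames", base, head],
    { encoding: "utf8" },
  );
  return parseNameStatus(output);
}

// Fetches the full file on each side (not hunks) so both parse as complete sources.
function extractFileDiff(
  file: ChangedFile,
  base: string,
  head: string,
  show: ShowFn = gitShow,
): FileDiff {
  const oldSource = file.status === "added" ? "" : show(`${base}:${file.path}`);
  const newSource = file.status === "deleted" ? "" : show(`${head}:${file.path}`);
  return { path: file.path, oldSource, newSource };
}
```

Inside a repository, `listChangedFiles("main", "HEAD").map((f) => extractFileDiff(f, "main", "HEAD"))` would produce one old/new pair per changed file.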
Phase 2: AST Mapper
The AST Mapper is the orchestrator. It receives the file diffs, determines the language from the file extension, loads the correct WASM grammar, and dispatches to the appropriate language translator.
Each language has a dedicated translator module:
| Language | Translator | Grammar |
|---|---|---|
| TypeScript / JavaScript | translators/typescript.ts | tree-sitter-typescript.wasm |
| Python | translators/python.ts | tree-sitter-python.wasm |
| Go | translators/go.ts | tree-sitter-go.wasm |
| Java | translators/java.ts | tree-sitter-java.wasm |
| Rust | translators/rust.ts | tree-sitter-rust.wasm |
Each translator extracts four types of signatures from the syntax tree:
- FunctionSignature — name, parameters (name, type, optional, default, rest), return type, async, visibility, generics
- InterfaceSignature — name, properties (name, type, optional), generics
- EnumSignature — name, members (name, value)
- TypeAliasSignature — name, type expression, generics
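The four signature kinds could be modeled as a discriminated union. The field names below are assumptions derived from the descriptions above, not the project's actual type definitions; the `kind` tag is also an assumption, added so later phases can switch on the signature type.

```typescript
// Hypothetical shapes inferred from the field lists above.
interface Parameter {
  name: string;
  type?: string; // absent when the source has no annotation
  optional: boolean;
  defaultValue?: string; // source text of the default, if any
  rest: boolean;
}

interface FunctionSignature {
  kind: "function";
  name: string;
  parameters: Parameter[];
  returnType?: string;
  async: boolean;
  visibility?: string; // language-specific, e.g. "public" or "pub"
  generics: string[];
}

interface InterfaceSignature {
  kind: "interface";
  name: string;
  properties: { name: string; type?: string; optional: boolean }[];
  generics: string[];
}

interface EnumSignature {
  kind: "enum";
  name: string;
  members: { name: string; value?: string | number }[];
}

interface TypeAliasSignature {
  kind: "typeAlias";
  name: string;
  typeExpression: string;
  generics: string[];
}

type Signature =
  | FunctionSignature
  | InterfaceSignature
  | EnumSignature
  | TypeAliasSignature;

// What a translator might emit for:
// export async function processPayment(amount: number, currency = "USD"): Promise<Receipt>
const processPayment: FunctionSignature = {
  kind: "function",
  name: "processPayment",
  parameters: [
    { name: "amount", type: "number", optional: false, rest: false },
    { name: "currency", optional: false, defaultValue: '"USD"', rest: false },
  ],
  returnType: "Promise<Receipt>",
  async: true,
  visibility: "public",
  generics: [],
};
```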
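Grammars are lazy-loaded and cached. If 10 TypeScript files appear in a diff, the WASM grammar is loaded exactly once. An in-flight deduplication map also prevents a thundering herd: when several files request the same grammar concurrently, they all await the single pending load instead of each starting their own.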
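A minimal sketch of lazy loading with in-flight deduplication: the cache stores the pending Promise itself, so concurrent requests for the same grammar share one load. The loader is injected and generic; in the real mapper it would presumably wrap web-tree-sitter's WASM grammar loading, but `GrammarCache` and `loadFn` are hypothetical names.

```typescript
// Hypothetical class; generic over the grammar type to avoid a web-tree-sitter dependency.
class GrammarCache<G> {
  // Holds the pending or settled Promise for each grammar file.
  private readonly inFlight = new Map<string, Promise<G>>();
  private readonly loadFn: (wasmFile: string) => Promise<G>;

  constructor(loadFn: (wasmFile: string) => Promise<G>) {
    this.loadFn = loadFn;
  }

  get(wasmFile: string): Promise<G> {
    const cached = this.inFlight.get(wasmFile);
    if (cached !== undefined) return cached; // shared by every concurrent caller

    const pending = this.loadFn(wasmFile);
    this.inFlight.set(wasmFile, pending);
    // Evict a failed load (only if it is still the cached entry) so a later call can retry.
    pending.catch(() => {
      if (this.inFlight.get(wasmFile) === pending) this.inFlight.delete(wasmFile);
    });
    return pending;
  }
}
```

Caching the Promise rather than the resolved grammar is what closes the race: the second caller finds an entry even though the first load has not finished.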
Phase 3: Classifier Engine
The classifier receives one ParseResult per file, containing the old and new signature maps. It iterates over the union of keys from both maps and classifies each symbol:
- Symbol deleted — key exists in old but not new. Always breaking.
- Symbol added — key exists in new but not old. Always safe.
- Symbol changed — key exists in both. Run through the rule engine.
For changed symbols, the engine first compares the old and new signatures with a deepStrictEqual check. If they are identical, no rules run. This shortcut skips rule evaluation entirely for files where only the implementation, not the API surface, changed.
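The per-symbol loop can be sketched as follows. It uses Node's `util.isDeepStrictEqual` for the identical-signature shortcut; the `ParseResult` shape (two `Map`s keyed by symbol name), the `Classification` type, and the injected `runRules` callback are assumptions for illustration.

```typescript
import { isDeepStrictEqual } from "node:util";

// Hypothetical shapes; not the project's actual data contracts.
type Severity = "breaking" | "warning" | "safe";

interface ParseResult<S> {
  file: string;
  oldSignatures: Map<string, S>;
  newSignatures: Map<string, S>;
}

interface Classification {
  file: string;
  symbol: string;
  severity: Severity;
  reason: string;
}

function classifyFile<S>(
  result: ParseResult<S>,
  runRules: (oldSig: S, newSig: S) => { severity: Severity; reason: string },
): Classification[] {
  const out: Classification[] = [];
  // Union of keys from both sides, so deletions and additions are both visited.
  const keys = new Set([...result.oldSignatures.keys(), ...result.newSignatures.keys()]);
  for (const symbol of keys) {
    const oldSig = result.oldSignatures.get(symbol);
    const newSig = result.newSignatures.get(symbol);
    if (newSig === undefined) {
      out.push({ file: result.file, symbol, severity: "breaking", reason: "symbol deleted" });
    } else if (oldSig === undefined) {
      out.push({ file: result.file, symbol, severity: "safe", reason: "symbol added" });
    } else if (!isDeepStrictEqual(oldSig, newSig)) {
      // Only signatures that actually differ reach the rule engine.
      out.push({ file: result.file, symbol, ...runRules(oldSig, newSig) });
    }
    // Identical signatures (implementation-only edits) produce no entry at all.
  }
  return out;
}
```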
If the signatures differ, the engine routes them to pre-computed rule buckets keyed by symbol type: function signatures run through the function rules, interface signatures through the interface rules, and so on. Choosing a bucket is a single O(1) lookup by symbol type rather than O(n) filtering of the whole rule list for every symbol.
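Bucketed routing could be sketched as a rule table built once and keyed by signature kind, so each changed symbol is checked only against its own bucket. The simplified `Sig` shapes and the two example rules are hypothetical and much smaller than the real rule set; treating a change of kind (for example, function to enum) as breaking is likewise an assumption.

```typescript
// Hypothetical, simplified signatures and rules for illustration only.
type Severity = "breaking" | "warning" | "safe";

type Sig =
  | { kind: "function"; name: string; requiredParams: number }
  | { kind: "enum"; name: string; members: string[] };

interface Verdict { severity: Severity; message: string; }
interface Finding extends Verdict { rule: string; }
interface Rule { id: string; check: (oldSig: Sig, newSig: Sig) => Verdict | null; }

// Built once at startup: one bucket per signature kind.
const RULE_BUCKETS: Record<Sig["kind"], Rule[]> = {
  function: [
    {
      id: "function-required-param-added",
      check: (o, n) =>
        o.kind === "function" && n.kind === "function" && n.requiredParams > o.requiredParams
          ? { severity: "breaking", message: `${n.name} now requires ${n.requiredParams} arguments` }
          : null,
    },
  ],
  enum: [
    {
      id: "enum-member-removed",
      check: (o, n) => {
        if (o.kind !== "enum" || n.kind !== "enum") return null;
        const removed = o.members.filter((m) => !n.members.includes(m));
        return removed.length > 0
          ? { severity: "breaking", message: `${n.name} lost ${removed.join(", ")}` }
          : null;
      },
    },
  ],
};

function classifyChange(oldSig: Sig, newSig: Sig): Finding[] {
  // Assumption: a symbol that changes kind is always breaking.
  if (oldSig.kind !== newSig.kind) {
    return [{ rule: "kind-changed", severity: "breaking", message: `${newSig.name} changed from ${oldSig.kind} to ${newSig.kind}` }];
  }
  const findings: Finding[] = [];
  // O(1) bucket lookup; rules for other kinds are never consulted.
  for (const rule of RULE_BUCKETS[newSig.kind]) {
    const verdict = rule.check(oldSig, newSig);
    if (verdict !== null) findings.push({ rule: rule.id, ...verdict });
  }
  return findings;
}
```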
See dg rules for the full list of classification rules.
Phase 4: Call-Site Tracer
The tracer is the most expensive phase — and it only runs when breaking changes exist. This is the "lazy" part of the Lazy Graph Engine.
For each breaking change, the tracer performs two sub-phases:
Scanner (Phase 4a)
The JIT Scanner finds every file that imports the broken symbol. It uses `git grep` for initial candidate discovery, then AST-parses import statements to confirm actual usage. It handles:
- Named imports: `import { processPayment } from './api'`
- Default imports: `import processPayment from './api'`
- Aliased imports: `import { processPayment as pay } from './api'`
- Barrel re-exports: follows `index.ts` chains up to 10 levels deep
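A simplified sketch of the two scanner steps: `git grep -l -w -e <symbol>` lists candidate files that mention the symbol as a whole word, and a confirmation step extracts the local name each import binds it to. For brevity the confirmation uses a regular expression in place of the AST parsing the scanner actually performs, and it covers only the named, default, and aliased forms (not barrel re-exports). `findCandidates`, `localBindings`, and `Binding` are hypothetical names.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical names; regex stands in for real AST parsing.
interface Binding {
  local: string; // name used inside the importing file
  source: string; // module specifier, e.g. "./api"
}

// Step 1: cheap text search. `git grep` exits with status 1 when nothing matches.
function findCandidates(symbol: string): string[] {
  try {
    const out = execFileSync("git", ["grep", "-l", "-w", "-e", symbol], { encoding: "utf8" });
    return out.split("\n").filter((line) => line !== "");
  } catch (err) {
    if ((err as { status?: number }).status === 1) return [];
    throw err;
  }
}

// Matches `import Default from "x"`, `import { a, b as c } from "x"`, or both combined.
const IMPORT_RE = /import\s+(?:(\w+)\s*,?\s*)?(?:\{([^}]*)\})?\s*from\s*["']([^"']+)["']/g;

// Step 2: confirm which imports actually bind the symbol, and under which local name.
// Default imports are reported only when the symbol is a default export; the caller
// must still check that `source` resolves to the defining module.
function localBindings(code: string, symbol: string, isDefaultExport: boolean): Binding[] {
  const bindings: Binding[] = [];
  for (const m of code.matchAll(IMPORT_RE)) {
    const [, defaultName, named, source] = m;
    if (defaultName !== undefined && isDefaultExport) {
      bindings.push({ local: defaultName, source });
    }
    for (const spec of named?.split(",") ?? []) {
      const [imported, alias] = spec.trim().split(/\s+as\s+/);
      if (imported === symbol) bindings.push({ local: alias ?? imported, source });
    }
  }
  return bindings;
}
```

Returning the local name matters for the next step: an aliased import means the tracer must look for calls to `pay(...)`, not `processPayment(...)`.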
Tracer (Phase 4b)
For each importer file, the tracer AST-parses the file and locates every call expression of the broken symbol. Depending on the symbol type:
- Functions: counts the arguments at each call site and compares the count against the new signature's required and total parameter counts. Each call site is reported as broken, fixed, or indeterminate (a spread argument hides the real count from static analysis).
- Enums: finds `EnumName.MemberName` access patterns and checks whether the accessed member was removed or had its value changed.
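Both checks can be sketched as pure functions over information the tracer has already pulled from the AST. The names are hypothetical, and "fixed" is interpreted here as a call site whose argument count already satisfies the new signature; that reading is an assumption.

```typescript
// Hypothetical names; a sketch of the comparison logic only.
interface Param {
  name: string;
  optional: boolean;
  hasDefault: boolean;
  rest: boolean;
}

type CallStatus = "broken" | "fixed" | "indeterminate";

// required: params the caller must supply; total: max accepted (Infinity with a rest param).
function arity(params: Param[]): { required: number; total: number } {
  const required = params.filter((p) => !p.optional && !p.hasDefault && !p.rest).length;
  const total = params.some((p) => p.rest) ? Infinity : params.length;
  return { required, total };
}

function checkCallSite(argCount: number, hasSpread: boolean, newParams: Param[]): CallStatus {
  // f(...args) hides the real argument count from static analysis.
  if (hasSpread) return "indeterminate";
  const { required, total } = arity(newParams);
  return argCount >= required && argCount <= total ? "fixed" : "broken";
}

type EnumMembers = Map<string, string | number>;

// Checks one `EnumName.MemberName` access against the old and new enum definitions.
function checkEnumAccess(
  member: string,
  oldMembers: EnumMembers,
  newMembers: EnumMembers,
): "removed" | "value-changed" | "ok" {
  if (!newMembers.has(member)) return "removed";
  if (oldMembers.has(member) && oldMembers.get(member) !== newMembers.get(member)) {
    return "value-changed";
  }
  return "ok";
}
```

For a new signature `processPayment(amount, currency, note?)`, a call `processPayment(100)` would be broken, while `processPayment(100, "EUR")` would count as fixed.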
Reporters
After the pipeline completes, one of three reporters renders the output:
- Terminal Reporter — colorized output for local CLI usage and git hooks
- GitHub Reporter — posts a structured comment on the PR via the GitHub API
- JSON Reporter — writes structured JSON to stdout or a file for programmatic consumption
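As an illustration of the reporter boundary, a JSON reporter could serialize the pipeline's results and write them to a file when a path is given, or to stdout otherwise. The `Report` shape, the `summary` field, and the `Reporter`/`JsonReporter` names are assumptions, not Diff Guardian's documented output schema.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical report shape; not the tool's documented schema.
interface ReportEntry {
  file: string;
  symbol: string;
  severity: "breaking" | "warning" | "safe";
  brokenCallers: string[];
}

interface Report {
  base: string;
  head: string;
  entries: ReportEntry[];
}

interface Reporter {
  render(report: Report): void;
}

// Pure formatting step, separated from I/O so it can be tested directly.
function formatJson(report: Report): string {
  const count = (s: ReportEntry["severity"]) =>
    report.entries.filter((e) => e.severity === s).length;
  const summary = { breaking: count("breaking"), warning: count("warning"), safe: count("safe") };
  return JSON.stringify({ ...report, summary }, null, 2);
}

class JsonReporter implements Reporter {
  private readonly outFile?: string;

  constructor(outFile?: string) {
    this.outFile = outFile;
  }

  render(report: Report): void {
    const json = formatJson(report) + "\n";
    if (this.outFile !== undefined) writeFileSync(this.outFile, json, "utf8");
    else process.stdout.write(json);
  }
}
```

Sharing one `Reporter` interface lets the pipeline stay unaware of whether its output ends up in a terminal, a PR comment, or a JSON file.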
Next steps
- Architecture Deep Dive — source tree map and data contracts
- Classification Rules — all rules explained with examples
- dg trace — standalone symbol tracing