Architecture
A deep dive into the source tree, data contracts, and design decisions behind Diff Guardian.
Source tree
src/
cli.ts # Entry point — command router
pipeline.ts # Orchestrates the 4-phase pipeline
config.ts # dg.config.json loader
core/
types.ts # All TypeScript interfaces and type aliases
constants.ts # Extension maps, supported languages
parsers/
git-diff.ts # Phase 1: git diff → FileDiff[]
ast-mapper.ts # Phase 2: FileDiff[] → ParseResult[]
translators/
typescript.ts # TS/JS/TSX/JSX translator
python.ts # Python translator
go.ts # Go translator
java.ts # Java translator
rust.ts # Rust translator
classifier/
engine.ts # Phase 3: ParseResult[] → FunctionChange[]
types.ts # Rule contract interface
rules/
R01_param_removed.ts # ...through R28_exported.ts
index.ts # Barrel export of all rules
tracer/
index.ts # Phase 4: JIT Scanner + Call-Site Tracer
reporter/
types.ts # ReporterConfig interface
terminal.ts # Terminal reporter (colorized output)
github.ts # GitHub PR comment reporter
index.ts # Reporter factory
grammars/ # WASM files (tree-sitter-*.wasm)
.husky/ # Git hook scripts
Data contracts
Data flows through the pipeline as strongly-typed TypeScript interfaces. Each phase consumes the output of the previous phase:
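As a rough illustration of that hand-off, the sketch below chains four phase functions so that each one receives exactly what the previous one returned. The interfaces are trimmed-down stand-ins for the contracts in this section, and the `Phases` and `runPipeline` names are assumptions made for the example, not the project's actual API. It is written synchronously for brevity; a real pipeline would likely be async.

```typescript
// Trimmed-down, hypothetical stand-ins for the contracts in this section.
interface FileDiff { path: string; oldSource: string; newSource: string }
interface ParseResult { file: string; skipped: boolean }
interface FunctionChange { name: string; file: string; callers: string[] }

// One function per phase. These names are assumptions, not the real API.
interface Phases {
  readDiff(baseRef: string, headRef: string): FileDiff[]; // Phase 1
  mapAst(diffs: FileDiff[]): ParseResult[];               // Phase 2
  classify(results: ParseResult[]): FunctionChange[];     // Phase 3
  trace(changes: FunctionChange[]): FunctionChange[];     // Phase 4
}

// Each phase consumes the value returned by the phase before it.
function runPipeline(phases: Phases, baseRef: string, headRef: string): FunctionChange[] {
  const diffs = phases.readDiff(baseRef, headRef);
  const parsed = phases.mapAst(diffs);
  const changes = phases.classify(parsed);
  return phases.trace(changes);
}
```

Because every boundary is typed, swapping one phase for a stub in a test only requires matching its input and output types.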
FileDiff (Phase 1 output)
interface FileDiff {
  path: string; // Relative file path
  language: string; // File extension (e.g., "ts", "py")
  oldSource: string; // Full source from base ref
  newSource: string; // Full source from head ref
}
ParseResult (Phase 2 output)
interface ParseResult {
  file: string;
  language: Language;
  oldSigs: Map<string, AnySignature>; // Base signatures
  newSigs: Map<string, AnySignature>; // Head signatures
  skipped: boolean;
  skipReason?: string;
}
FunctionChange (Phase 3 output)
interface FunctionChange {
  id: string;
  name: string;
  file: string;
  lineStart: number;
  lineEnd: number;
  language: Language;
  symbolType: 'function' | 'interface' | 'enum' | 'type_alias';
  severity: 'breaking' | 'warning' | 'safe';
  changeType: ChangeType;
  breaking: boolean;
  message: string;
  before: AnySignature | null;
  after: AnySignature | null;
  callers: CallerInfo[];
}
Classifier design
The classifier engine uses a technique called bucketed rule routing. Instead of iterating all 28 rules for every changed symbol, it pre-computes four rule buckets at startup — one per symbol type (function, interface, enum, type_alias). When a symbol change is detected, only the rules in the matching bucket are executed:
// Pre-computed once at startup
const buckets = {
  function: rules.filter(r => r.target === 'function'),
  interface: rules.filter(r => r.target === 'interface'),
  enum: rules.filter(r => r.target === 'enum'),
  type_alias: rules.filter(r => r.target === 'type_alias'),
};
// Per-symbol: O(1) lookup + iterate only relevant rules
if (key.startsWith('interface:')) {
  rulesToRun = buckets.interface;
}
WASM grammar lifecycle
The AST Mapper manages WASM grammar loading with three guarantees:
- Lazy loading: grammars are loaded only when needed. If a diff contains only TypeScript files, the Go, Python, Java, and Rust grammars are never loaded.
- Deduplication: if 10 .ts files appear in a diff, the TypeScript grammar is loaded exactly once. An in-flight promise map prevents a thundering herd of duplicate loads.
- Memory safety: every parsed tree is freed in a finally block via tree.delete(), even if the translator throws. This prevents WASM heap leaks.
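The three guarantees above can be sketched roughly as follows. This is a minimal illustration, not the AST Mapper's actual code: `Grammar`, `Tree`, `GrammarCache`, and `withTree` are hypothetical names, and the loader is injected so the example runs without any WASM files.

```typescript
// Hypothetical stand-ins for a loaded grammar and a parsed syntax tree.
interface Grammar { language: string }
interface Tree { delete(): void }

type GrammarLoader = (language: string) => Promise<Grammar>;

class GrammarCache {
  // Stores promises, not resolved grammars, so callers that arrive while a
  // load is still in flight share that same promise.
  private readonly pending = new Map<string, Promise<Grammar>>();
  private readonly load: GrammarLoader;

  constructor(load: GrammarLoader) {
    this.load = load;
  }

  // Lazy: nothing is loaded until a language is first requested.
  get(language: string): Promise<Grammar> {
    let grammar = this.pending.get(language);
    if (grammar === undefined) {
      grammar = this.load(language);
      this.pending.set(language, grammar);
    }
    return grammar;
  }
}

// Runs a translator against a tree and always frees the tree afterwards.
function withTree<T>(tree: Tree, translate: (tree: Tree) => T): T {
  try {
    return translate(tree);
  } finally {
    tree.delete(); // runs on both the success path and the throw path
  }
}
```

Caching the promise rather than the resolved grammar is what makes the deduplication work: ten concurrent requests for the same language find the same map entry before the first load has finished.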
Sequential parsing
Files are parsed sequentially, not concurrently. This is intentional:
- Tree-sitter parses at approximately 100,000 lines per second — concurrent parsing adds negligible throughput improvement
- Sequential processing keeps WASM heap usage flat and deterministic
- Error traces are cleaner when failures happen in a known order
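A sequential loop of this kind might look like the sketch below. The `parseSequentially` helper and its trimmed-down types are assumptions for illustration, as is the choice to record a failed file as skipped instead of aborting the run.

```typescript
// Hypothetical minimal shapes; the real contracts carry more fields.
interface FileDiff { path: string; newSource: string }
interface ParseResult { file: string; skipped: boolean; skipReason?: string }

type ParseOne = (diff: FileDiff) => Promise<ParseResult>;

// Awaits each file before starting the next, so at most one parse is active.
async function parseSequentially(
  diffs: FileDiff[],
  parseOne: ParseOne,
): Promise<ParseResult[]> {
  const results: ParseResult[] = [];
  for (const diff of diffs) {
    try {
      results.push(await parseOne(diff));
    } catch (error) {
      // Assumption: a failed file is recorded as skipped, not fatal.
      const reason = error instanceof Error ? error.message : String(error);
      results.push({ file: diff.path, skipped: true, skipReason: reason });
    }
  }
  return results;
}
```

By contrast, `Promise.all(diffs.map(parseOne))` would start every parse at once, so several trees could be alive on the WASM heap at the same time and failures could surface in any order.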
Tracer architecture
The call-site tracer is split into two components:
JIT Scanner
Uses git grep to discover candidate files in a single pass whose cost scales with repository size, then AST-parses their import statements to confirm actual usage. Handles barrel re-exports by following index.ts chains recursively.
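The recursive barrel-following step could be sketched as below. It assumes the re-export edges have already been extracted into a map (in practice, by AST-parsing `export ... from` statements); `ReExportMap` and `reaches` are hypothetical names, and the cycle guard is an assumption about how circular barrels would be handled.

```typescript
// Hypothetical input: for each barrel file, the modules it re-exports from.
type ReExportMap = Map<string, string[]>;

// Returns true if importing `entry` can reach `target`, either directly or
// through a chain of barrel re-exports.
function reaches(
  entry: string,
  target: string,
  reExports: ReExportMap,
  seen: Set<string> = new Set(),
): boolean {
  if (entry === target) return true;
  if (seen.has(entry)) return false; // guard against circular barrels
  seen.add(entry);
  const sources = reExports.get(entry) ?? [];
  return sources.some((source) => reaches(source, target, reExports, seen));
}
```

A candidate file counts as a confirmed importer when any module it imports reaches the file that defines the changed symbol.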
Call-Site Counter
For each confirmed importer, parses the file and locates call expressions of the broken symbol. Counts arguments and compares against the new signature to determine "broken", "fixed", or "indeterminate" status.
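The argument-count comparison might be sketched like this. The `Arity` and `CallSite` shapes are hypothetical, and so is the reading of the three statuses: here "fixed" means the call already satisfies the new signature, and "indeterminate" means the argument count cannot be known statically, for example because of a spread argument.

```typescript
type CallStatus = "broken" | "fixed" | "indeterminate";

// Hypothetical summary of the new signature's arity.
interface Arity {
  required: number;  // parameters with no default and no optional marker
  total: number;     // all declared parameters
  variadic: boolean; // true if the signature ends in a rest parameter
}

// Hypothetical summary of one call expression found in an importer.
interface CallSite {
  argCount: number;
  hasSpread: boolean; // e.g. fn(...args): the real count is unknown statically
}

function classifyCall(call: CallSite, next: Arity): CallStatus {
  if (call.hasSpread) return "indeterminate";
  if (call.argCount < next.required) return "broken";
  if (!next.variadic && call.argCount > next.total) return "broken";
  return "fixed";
}
```

Counting arguments catches added or removed parameters but not type changes, so a call classified as "fixed" by this sketch can still fail type checking.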
Related
- How It Works — high-level pipeline overview
- Language Support — per-language translator details
- Classification Rules — all 28 rules