Sunday, April 5, 2026

Where Specs Work and Where They Don't

The litmus test for when golden values earn their keep — and when conventions are enough

After building spec-driven verification into two production Rails apps — one for compliance, one for accounting — I have a clear sense of where the pattern belongs and where it doesn't. The honest answer is about 30% of a typical application. But it's the 30% where bugs cost the most.

The Litmus Test

The full pattern — dimensions, golden values, reference calculator, three-way completeness check — requires four properties:

Deterministic. Same inputs, same outputs. A payroll calculation, a tax estimate, an invoice total. No randomness, no external dependencies that change the result.

Calculable. A reference calculator can independently derive the expected output. Pure Ruby, no ActiveRecord, just inputs and rules. If you can't write a standalone function that produces the answer, you can't verify the answer independently.

Enumerable. The dimensions have finite, discrete values. Transaction types, account categories, invoice states. Not continuous ranges, not infinite possibilities.

Objective. Correctness is provable, not a matter of taste. The IRS agrees the tax calculation is right, or it doesn't. The ledger balances, or it doesn't.

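When any of these is missing, the pattern degrades: without a reference calculator there is nothing independent to check against, and without enumerable dimensions there is no finite matrix to complete.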
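To make the four pieces concrete, here is a minimal sketch in pure Ruby. Every name in it (the dimensions, the 8% rate, the method names) is a hypothetical stand-in, not code from either production app, and the completeness check shown is only one reading of the three-way check: the dimension matrix, the golden values, and the reference calculator all have to agree.

```ruby
# Hypothetical dimensions: finite, discrete values only.
DIMENSIONS = {
  transaction_type: %i[sale refund],
  account_category: %i[taxable exempt]
}.freeze

# Golden values: expected tax in cents on a 10_000-cent transaction,
# keyed by one value from each dimension. Illustrative numbers only.
GOLDEN_VALUES = {
  %i[sale taxable]   => 800,
  %i[sale exempt]    => 0,
  %i[refund taxable] => -800,
  %i[refund exempt]  => 0
}.freeze

TAX_RATE_BASIS_POINTS = 800 # assumed 8% rate, for illustration

# Reference calculator: inputs and rules, no ActiveRecord.
def reference_tax_cents(transaction_type, account_category, amount_cents)
  return 0 if account_category == :exempt

  tax = amount_cents * TAX_RATE_BASIS_POINTS / 10_000
  transaction_type == :refund ? -tax : tax
end

# Every combination of dimension values (the full matrix).
def all_combinations(dimensions)
  first, *rest = dimensions.values
  first.product(*rest)
end

# Compare the matrix, the golden values, and the calculator.
def completeness_report(dimensions, golden_values, amount_cents)
  combos = all_combinations(dimensions)
  {
    missing: combos - golden_values.keys,   # in the matrix, no golden value
    unknown: golden_values.keys - combos,   # golden value outside the matrix
    mismatched: golden_values.select do |combo, expected|
      combos.include?(combo) &&
        reference_tax_cents(*combo, amount_cents) != expected
    end.keys
  }
end

report = completeness_report(DIMENSIONS, GOLDEN_VALUES, 10_000)
```

If all three lists in the report come back empty, the spec is complete for these dimensions. Deleting one golden value makes that combination show up under `:missing`, which is the point: gaps become visible instead of silent.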

Strong Fit

Compliance calculations. Labor law, tax code, accounting standards. The rules come from outside the software. The reference calculator IS the rules. The golden values ARE the regulatory expectations.

Financial math. Billing, invoicing, amortization. Deterministic arithmetic with edge cases at thresholds — the 1099 reporting threshold at $600, quarterly tax estimates, prepaid expenses spanning year boundaries.
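Threshold edge cases are easy to enumerate once the threshold is a named value. The sketch below probes one cent below, exactly at, and one cent above a reporting threshold. It uses the $600 figure from the text as a parameter and assumes "at or above" triggers reporting; both the figure and the comparison should be confirmed against the current rules before use. All names are hypothetical.

```ruby
# The $600 figure cited above, held in integer cents to avoid float rounding.
# Treat it as a parameter to confirm, not as an authoritative value.
REPORTING_THRESHOLD_CENTS = 60_000

# Assumption: totals at or above the threshold are reportable.
def reportable?(total_paid_cents, threshold_cents = REPORTING_THRESHOLD_CENTS)
  total_paid_cents >= threshold_cents
end

# The three inputs where an off-by-one mistake would show up.
def boundary_probes(threshold_cents)
  [threshold_cents - 1, threshold_cents, threshold_cents + 1]
end

probe_results = boundary_probes(REPORTING_THRESHOLD_CENTS).map do |cents|
  [cents, reportable?(cents)]
end
# probe_results => [[59999, false], [60000, true], [60001, true]]
```

Each of those three pairs is a candidate golden value: the expected answer is written down once, next to the rule it comes from.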

State machines. Possibly the strongest fit. A workflow has finite states, finite transitions, and every invalid transition is provably wrong. The dimensional matrix maps states to transitions to outcomes. Coverage is visible. Gaps are obvious. Invoice lifecycle — draft, sent, partial, paid, overdue, written off — maps directly.
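As a sketch of that matrix, the invoice lifecycle above can be expanded into every possible (from, to) pair. The six states come from the text; the specific transition rules below are hypothetical and would be replaced by the real workflow.

```ruby
STATES = %i[draft sent partial paid overdue written_off].freeze

# Hypothetical transition rules, for illustration only.
ALLOWED_TRANSITIONS = {
  draft:       %i[sent],
  sent:        %i[partial paid overdue],
  partial:     %i[paid overdue],
  overdue:     %i[partial paid written_off],
  paid:        [],
  written_off: []
}.freeze

def transition_allowed?(from, to)
  ALLOWED_TRANSITIONS.fetch(from).include?(to)
end

# Every (from, to) pair mapped to an explicit outcome: 6 x 6 = 36 cells.
def transition_matrix
  STATES.product(STATES).to_h do |from, to|
    [[from, to], transition_allowed?(from, to)]
  end
end

# States that have no row in the rules table would be a gap in the spec.
states_without_rules = STATES - ALLOWED_TRANSITIONS.keys

matrix = transition_matrix
matrix[%i[draft sent]] # => true
matrix[%i[paid draft]] # => false
```

Because `fetch` raises on an unknown state, a state added to `STATES` without a matching rules row fails loudly rather than defaulting to "not allowed".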

E-commerce totals. Cart plus payment plus discount plus shipping plus tax equals one correct number. The reference calculator derives it.
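A reference calculator for an order total can be a single function over integer cents. The ordering of operations here (discount applied before tax, shipping untaxed, tax rounded half up) is an assumption made for the sketch; real rules vary by jurisdiction and business, and the method name is hypothetical.

```ruby
# Assumed rules: discount reduces the subtotal (never below zero),
# tax applies to the discounted subtotal only, shipping is untaxed.
def reference_order_total_cents(item_cents:, discount_cents:, shipping_cents:, tax_basis_points:)
  subtotal   = item_cents.sum
  discounted = [subtotal - discount_cents, 0].max
  # Integer round-half-up of discounted * rate, valid for non-negative amounts.
  tax = (discounted * tax_basis_points + 5_000) / 10_000
  discounted + shipping_cents + tax
end

total = reference_order_total_cents(
  item_cents: [1_999, 501],  # $19.99 + $5.01
  discount_cents: 500,       # $5.00 off
  shipping_cents: 499,       # $4.99
  tax_basis_points: 825      # 8.25%
)
# total => 2664
```

Working the example by hand: subtotal 2,500, discounted 2,000, tax 165, plus 499 shipping gives 2,664 cents. That hand-derived number is the golden value the production code has to match.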

Partial Fit

Authorization. Dimensions are clear — roles, resources, actions. The matrix identifies every access decision. But the golden value is binary: allow or deny. No complex calculation. The value is in matrix completeness — did we define every combination? — not the three-way check.
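The completeness question can be asked mechanically. In this sketch the roles, resources, actions, and policy entries are all hypothetical; one combination is deliberately left out to show what the check surfaces.

```ruby
ROLES     = %i[admin accountant viewer].freeze
RESOURCES = %i[invoice ledger].freeze
ACTIONS   = %i[read write].freeze

# Hypothetical policy: one explicit decision per combination.
# [:viewer, :ledger, :write] is intentionally missing.
POLICY = {
  %i[admin invoice read]       => :allow,
  %i[admin invoice write]      => :allow,
  %i[admin ledger read]        => :allow,
  %i[admin ledger write]       => :allow,
  %i[accountant invoice read]  => :allow,
  %i[accountant invoice write] => :allow,
  %i[accountant ledger read]   => :allow,
  %i[accountant ledger write]  => :deny,
  %i[viewer invoice read]      => :allow,
  %i[viewer invoice write]     => :deny,
  %i[viewer ledger read]       => :deny
}.freeze

# Combinations in the matrix that nobody made a decision about.
def undefined_decisions(policy)
  ROLES.product(RESOURCES, ACTIONS) - policy.keys
end

# Falls back to deny at runtime, which is safe but hides the gap.
def authorized?(policy, role, resource, action)
  policy.fetch([role, resource, action], :deny) == :allow
end

gaps = undefined_decisions(POLICY)
# gaps => [[:viewer, :ledger, :write]]
```

The runtime default keeps the application safe, while `undefined_decisions` keeps the spec honest: a denied request and an undecided one look identical to the user, but only one of them was a choice.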

API contracts. Status codes and response shapes are deterministic, but response bodies depend on database state. The pattern works for the contract layer, not the data layer.
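A contract-layer check can verify the status code and the shape of the body while ignoring the values. The endpoint, keys, and types below are hypothetical; only the standard library `json` module is used.

```ruby
require "json"

# Hypothetical contract for a "show invoice" endpoint.
INVOICE_CONTRACT = {
  status: 200,
  shape: { "id" => Integer, "state" => String, "total_cents" => Integer }
}.freeze

# Returns a list of human-readable violations; empty means the contract holds.
# Only status, key presence, and value types are checked, never the values.
def contract_violations(status, body_json, contract)
  violations = []
  unless status == contract[:status]
    violations << "status #{status}, expected #{contract[:status]}"
  end

  body = JSON.parse(body_json)
  contract[:shape].each do |key, type|
    if !body.key?(key)
      violations << "missing key #{key}"
    elsif !body[key].is_a?(type)
      violations << "#{key} is #{body[key].class}, expected #{type}"
    end
  end
  violations
end

ok  = contract_violations(200, '{"id":1,"state":"paid","total_cents":2664}', INVOICE_CONTRACT)
bad = contract_violations(404, '{"id":"1","state":"paid"}', INVOICE_CONTRACT)
```

The first call returns an empty list. The second reports the wrong status, the wrong type for `id`, and the missing `total_cents` key, all without knowing anything about what is in the database.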

Doesn't Fit

Authentication. Use a gem. Use the Rails 8 generator. Specifying it with golden values adds overhead to something that already has a correct implementation.

UI and UX. "Does this look right?" isn't calculable. System tests verify the right screen appears. Golden values can't verify the design is good.

Pure CRUD. No transformation to verify. Rails already does this correctly.

Performance. Continuous metrics, not discrete values.

The Practical Boundary

The pattern belongs in the parts of your app that are uniquely yours — the business rules no gem implements because they're specific to your domain. The accounting math. The compliance engine. The pricing logic. The eligibility rules. The workflow state machine.

Everything else — auth, file uploads, background jobs, CRUD, UI — gets built from Rails conventions and gems. Standard tools for standard problems.

Strong fit              Partial fit           Doesn't fit
(full pattern)          (dimensions only)     (use conventions)

Compliance math         Authorization         Authentication
Financial calculations  API contracts         UI/UX
State machines                                Pure CRUD
E-commerce totals                             Performance
Insurance claims                              Creative tools
Tax computation

The 30% that passes the litmus test is where miscalculations create legal liability, where edge cases cost real money, where "it worked when I tested it manually" isn't acceptable. Those are the parts worth golden values, reference calculators, and completeness checks.

The other 70% is fine with regular tests and good conventions. Knowing which parts need the full pattern — and which don't — is what makes it sustainable instead of exhausting.

This is part of a series on spec-driven development with AI agents. Previous: Bugs Are Missing Scenarios. Start: Agentic Engineering Is Pattern Engineering.
