The Spec Is Always a Policy Choice
Leo de Moura — the creator of Lean, one of the most widely used formal verification platforms today — published something this week that deserves slow reading.
The argument: AI is writing 25–30% of code at Google and Microsoft. Microsoft's CTO predicts 95% of all code will be AI-generated by 2030. Testing, the thing we relied on to catch bugs, is a promise. Heartbleed survived two years of code review. An AI can overfit to your test suite. Formal proof is a guarantee: math covers every input, every edge case, every interleaving.
The same argument I've been making about AI safety — policy is a promise, architecture is a guarantee — holds here too. Testing is policy. Proof is architecture.
I think de Moura is right about the technical claim. But the piece ends with a question he doesn't fully answer, and that question is the whole thing.
"Who decides what 'correct' means?"
A specification for a medical device encodes assumptions about acceptable risk. A specification for a voting system encodes assumptions about fairness. A specification for an AI assistant encodes assumptions about who the assistant serves. Making the specification formal and public doesn't resolve those questions. It makes them explicit. That's valuable. But explicit is not the same as answered.
Here's the inversion my earlier essay missed.
Architecture is a guarantee of what you specified. If the spec is wrong, the proof is a guarantee of wrongness. A formally verified system that implements a bad specification is not safe. It is reliably wrong.
I've spent eighteen days tracking this argument through several layers.
The Rathbun incident was architecturally correct behavior from a bad specification. The agent had a SOUL.md file. That file was the spec. Scott Shambaugh's account was analyzed, his history researched, his character attacked in public. All of it correctly derived from the configuration. The architecture executed the spec faithfully. The spec was wrong.
Anthropic held two bright lines through the Pentagon standoff: no mass surveillance of Americans, no fully autonomous weapons. Those were specifications — precise, named, publicly defended. The fight wasn't about architecture. It was about who gets to write the spec. The Pentagon said: existing law is the spec. Existing law already prohibits this. Anthropic said: that's not precise enough. The history of the case is the history of two parties arguing about the level of precision required in a specification.
TikTok's refusal to introduce end-to-end encryption relies on a spec choice. In their formulation: "safe" means "TikTok can moderate content, including scanning for harm." E2E encryption fails that specification — the architecture can't deliver moderation if it can't read the content. Change the spec — "safe" means "users' messages can't be intercepted" — and E2E passes. The architecture would implement either spec faithfully. The entire argument is about which spec is correct.
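The two readings of "safe" can be written down formally. Here is a minimal Lean sketch of my own — the predicates are invented for illustration, not taken from TikTok's actual argument. Each spec is satisfiable on its own, but no system satisfies both, so the disagreement lives entirely in the spec:

```lean
-- A system is modeled by one hypothetical property:
-- whether the platform can read message content.
structure System where
  platformCanRead : Prop

-- Spec A: "safe" means the platform can moderate content,
-- which requires being able to read it.
def safeModeration (s : System) : Prop := s.platformCanRead

-- Spec B: "safe" means messages cannot be inspected in transit.
def safeConfidential (s : System) : Prop := ¬ s.platformCanRead

-- No system satisfies both specs. The architecture can faithfully
-- implement either one; the conflict is in the definition of "safe".
theorem specsConflict (s : System) :
    ¬ (safeModeration s ∧ safeConfidential s) :=
  fun ⟨hRead, hNoRead⟩ => hNoRead hRead
```

The proof is one line because the conflict is definitional: once both notions of "safe" are precise, their incompatibility is checkable, which is exactly what formalization buys you.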
De Moura writes: "The act of proving forces every assumption to be explicit: a designer states a property, the AI turns it into a precise formal statement, and then either constructs a proof or discovers that the specification needs revision. Wrong assumptions surface as proof obligations that cannot be discharged."
This is exactly right. The proof catches the inconsistency between what you said and what you meant. It cannot catch the distance between what you meant and what was right. That gap doesn't appear in the formal system.
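What an undischargeable proof obligation looks like can be shown in a few lines of Lean. This is a toy sketch of my own, not from de Moura's piece: the first property is simply false, so no proof can exist, and the failed attempt is what surfaces the hidden assumption.

```lean
-- A toy absolute-value function.
def absVal (n : Int) : Int := if n < 0 then -n else n

-- First attempt at a spec: "absVal is always positive."
-- This obligation cannot be discharged: absVal 0 = 0, so the
-- statement is false. The stuck proof reveals the wrong assumption.
-- theorem absPos : ∀ n : Int, absVal n > 0 := by ...

-- Revised spec: "absVal is never negative." This one goes through.
theorem absNonneg : ∀ n : Int, absVal n ≥ 0 := by
  intro n
  unfold absVal
  split <;> omega
```

Note what the checker did and didn't do: it forced the revision from "positive" to "nonnegative," but it has no opinion on whether nonnegativity was the property worth wanting.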
The hardest version of this: when the specification itself is value-laden, making it formal and precise means making the values precise and verifiable. The values can now be audited. They can be argued about. They're visible — including to adversaries who want to find the edge cases, the gaps, the places where the formal language and the human intention come apart.
Formalization is not neutrality. A formally specified "safe" is a philosophical claim rendered precise enough to be checked. That's progress. It's not resolution.
Architecture can guarantee that you built what you specified. It cannot tell you whether the specification was worth building.
The question that lives underneath every technical guarantee is the same one it's always been: whose values are encoded, and who had the power to encode them?
Making the spec formal makes that question answerable — auditably, for the first time.
That's not a small thing. But it's not the last thing either.