
Software architecture after AI
One of the more useful definitions of software architecture comes from Building Evolutionary Architectures: architecture is definitionally the stuff that’s hard to change.1 I’ve always found this definition to be the most honest framing available, to say nothing of the simplest. It doesn’t pretend architecture is about beauty or correctness or your resident architect’s favorite stalking-horse. It acknowledges that what makes a decision “architectural” is not its conceptual weight but its cost to reverse and its business impact. And “hard to change” has always been, at root, about wall-clock time: coordination cost, incident mitigation, cognitive load, handoff friction. Software architecture has always been a labor problem dressed up as a design problem.
AI has now collapsed the wall-clock time required to make substantial code-level changes. Things that used to take months now take days. What happens to architecture when the cost to reverse most code-level decisions drops by something like an order of magnitude?2
What happens is that the boundary of what counts as software architecture moves, in some cases dramatically. Most code-level decisions are no longer inside it; their cost to reverse has collapsed, and the consequences of getting them wrong are measured in days of rework rather than months. What stays inside is data, service boundaries, and user trust, which remain hard because the hard part was never the code. And a few concerns crowded out by code-level debates now stand in sharper relief; observability in particular deserves a reconsideration in the pantheon as the rate of feature delivery increases.
When every line of code was clean, and real engineers refactored
For decades, code-level decisions were legitimately architectural. Languages, frameworks, module structures, and persistence strategies were decisions worth debating and committing to, because revisiting them had long-term implications for the productivity of the team. Changing these things could cost months or even years of effort, and companies lived and died in the time it took large firms to reverse course. Even Refactoring was predicated on the idea that code-level change was possible but costly; restructuring code took skill and real time, and you needed techniques to manage that cost3.
But software practitioners have been collapsing architectural decisions into routine ones for decades. The effect of leaning into pain, rather than avoiding it, is to incentivize teams to build tooling that addresses it, turning what used to be architecture into something a general-purpose engineer handles as a matter of course.
Before database migrations were commonplace, schema decisions were irreversible, and presumptively architectural; they often required DBAs to orchestrate them4. Then Pramod wrote Evolutionary Database Design, migrations got folded into every major framework, and the DBA role started to become less visible. The judgment and expertise they provided were real (and substantial), but their market value was inflated by the mechanical bottleneck. Once the bottleneck was removed, the costs of a dedicated gate became more visible and the judgment got absorbed into the general engineering role, which gave many teams more leverage. Continuous delivery did the same thing for deployment and release engineers. There may be no silver bullets (until now?), but each small shift in tool efficiency took a category of decision that used to require a specialist and made it routine, revealing that the specialized judgment was often general engineering skill trapped behind mechanical cost.
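The mechanism that demoted schema decisions is worth seeing concretely. A minimal sketch of a versioned migration runner, in the style the frameworks standardized (everything here is illustrative, not any framework's actual API; the down-scripts are what turn an "irreversible" schema decision into a routine one):

```python
import sqlite3

# Each migration is a (version, up_sql, down_sql) triple. Recording the
# reversal alongside the change is the whole trick: the schema becomes a
# thing you can walk forward and backward mechanically.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE users"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT",
        None),  # SQLite pre-3.35 can't drop columns; real tools rebuild the table
]

def current_version(conn):
    """Read the highest applied migration version (0 for a fresh database)."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (v INTEGER)")
    row = conn.execute("SELECT MAX(v) FROM schema_version").fetchone()
    return row[0] or 0

def migrate(conn, target):
    """Apply pending up-migrations, in order, until the schema reaches `target`."""
    for version, up, _down in MIGRATIONS:
        if current_version(conn) < version <= target:
            conn.execute(up)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()
```

Once a team has this, "change the schema" stops being a meeting with a gatekeeper and becomes a file in the repo, applied the same way in every environment.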
AI is simply the latest example of this, but it’s the most dramatic, because it collapses most remaining code-level decisions at once rather than one category at a time. A recent personal example: I wrote my (now-dead) startup against a NoSQL database whose vendor was also a startup, which (surprise, surprise) also died. I pointed Claude Code at it, gave it some guidance, and it ported the entire data layer to a conventional RDBMS, essentially flawlessly, in hours. I know this sort of thing has become commonplace, but it still surprised me: between the tedium of the work and my day job, I might never have accomplished it before the heat-death of the universe.
This is not an isolated example. Cloudflare’s team reimplemented 94% of the Next.js API surface in under a week for roughly $1,100 in API costs. Christopher Chedeau ported 100,000 lines of TypeScript to Rust in a month. Many of you have experienced similar shifts.
In some ways, these examples prove the rule that good structure matters: I built my data layer against a clean interface boundary, because I didn’t start writing code yesterday, so in some sense, of course swapping the implementation was straightforward. But even without a clean boundary, the change is fundamentally mechanical: find all the call sites, change all the implementations, verify correctness. More tokens, more time, and yeah, more human intervention, but we’re not talking about a vast difference; maybe days instead of hours. And the second-order effect of agentic development is that you can automate verification on top of it; you can build correctness-checking into the process itself.
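The "clean interface boundary" doing the work in that migration is nothing exotic. A hedged sketch of the shape (all names hypothetical, sqlite standing in for the RDBMS): a small contract the rest of the app depends on, with the storage engine swappable behind it.

```python
import sqlite3
from typing import Optional, Protocol

class NoteStore(Protocol):
    """The contract callers depend on. This is the part that's hard to change."""
    def put(self, key: str, text: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...

class DictStore:
    """Stand-in for the original NoSQL-backed implementation."""
    def __init__(self) -> None:
        self._data: dict = {}
    def put(self, key: str, text: str) -> None:
        self._data[key] = text
    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

class SqliteStore:
    """The replacement RDBMS implementation. Callers never notice the swap."""
    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE notes (key TEXT PRIMARY KEY, text TEXT)")
    def put(self, key: str, text: str) -> None:
        self._conn.execute(
            "INSERT INTO notes VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET text = excluded.text",
            (key, text))
    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT text FROM notes WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None
```

Porting the data layer means writing the second class and leaving every caller alone; the contract, not the code behind it, is the architectural asset.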
These examples are admittedly biased toward the easy case: clean interfaces, well-defined boundaries, mechanically verifiable correctness. Systems with subtle semantics, unclear boundaries, and deeply entangled business logic remain harder to change, even with AI. But the trend line matters. AI does not erase coupling, migration risk, or rollout complexity, but it does demote a lot of code-shaping decisions that used to feel permanent, and make the rest dramatically easier to monitor and fix. That places a growing category of change squarely in the territory of “not architecture anymore.”
If code has moved outside the boundary, what’s still inside it?
It’s definitionally ridiculous to create a comprehensive taxonomy of software architecture, but I’ve broken out six categories below that I think capture the shape of the shift, and attempted to classify them along two axes: consequence of getting a decision wrong, and cost to reverse it. This is gut-level stuff, not hard data, so bear with me; I’m just trying to visualize the shift.
| Decision | Status | Why |
|---|---|---|
| Local code structure | ↓ Demoted | Modules, frameworks, persistence, integration wiring. AI makes mechanical restructuring cheap; getting it wrong now costs hours, not quarters. |
| Scalability and deployment posture | ↓ Demoted | Infrastructure topology and performance strategy. Still harder than code, but within reach of routine engineering with AI-assisted tooling. |
| Data architecture | → Still architectural | Ownership, consistency, schema evolution. Data has gravity; the hard part was never the code, and it hasn't meaningfully moved. |
| Trust and service boundaries | → Still architectural | Security posture and the contracts downstream consumers depend on. Breaches are effectively irreversible; contracts bind organizations, not just systems. |
| Observability and behavioral verification | ↑ Elevated | Code volume is up and comprehension is down. Verifying behavior is how you catch what you can no longer read. |
| Business strategy and capability alignment | ↑ Elevated | Always high-consequence; now finally visible. With code-level debates cheaper, architects have headspace for the question the work exists to answer. |
Three movements, for three distinct reasons.
Some decisions got demoted because AI collapsed the cost to reverse them. Local code structure (modules, frameworks, persistence, integration wiring) used to command serious architectural attention because reversing a bad decision could cost months of calendar time; it doesn’t have to anymore. Scalability and deployment posture followed: infrastructure topology and performance rework are still harder than ordinary code, but within reach of routine engineering with AI-assisted tooling. These decisions still require judgment, but that judgment is no longer trapped behind mechanical cost, and the consequences of a bad call are measured in hours and tokens rather than quarters and headcount.
Some decisions stayed put because the hard part was never the code. Data architecture (ownership, consistency models, schema evolution) didn’t move because data has gravity: it accumulates mass over time, and more things depend on its current shape than anyone can enumerate. Trust and service boundaries didn’t move either: security breaches and contract violations are effectively irreversible (though large corporations get away with a shocking amount of this), and reversing them requires coordinating with human beings, reshaping accumulated state, or undoing real-world consequences that code changes cannot reach.
Two decisions got elevated, for distinct reasons. Observability and behavioral verification rise because volume is rising: if defects per line stay constant but code volume quintuples, the consequence of failing to verify what the system actually does rises with it. Furthermore, if line-level comprehension similarly collapses (as in a dark software factory), you need to be able to verify what the system does regardless of whether you understand every line. The implementation of monitoring is cheap; the decision about what to watch and how to verify behavioral correctness is not. Business strategy and capability alignment, by contrast, didn’t move on the chart at all; they were always high-consequence, and reversing a strategic misstep has always been expensive. What changed is that architects finally have the headspace to engage with them. With code-level debates cheaper, the question of which boundaries create competitive advantage is no longer crowded out by framework arguments.
You could reasonably argue that code structure is the enforcement mechanism for precisely the things I’m calling architectural. Good module boundaries help enforce API contracts; good type systems help protect data invariants. If you stop caring about code structure, don’t you risk undermining the contracts you claim to care about? I think this confuses the decision with its implementation. The shape of the contract, and its guarantees, are the hard part; changing them costs real time, because you have to coordinate with human beings to do it. The code that enforces them is implementation, and implementation is now cheap. You can swap out the enforcement mechanism without touching the contract it enforces; that’s what my startup migration work did.
There’s a related objection worth engaging: mid-level design decisions (“should this business logic live in service A or service B?”) accumulate over time into the overall malleability of a system, and those accumulated decisions are genuinely hard to untangle. This is true, but it’s always been true. It was never centrally controllable in the first place: teams generally put shit where it seems like it should go, optimizing for local autonomy and throughput no matter how much you try to govern it centrally, and the result is always some degree of drift. What’s changed is that an LLM strapped to a codebase search index (which is rapidly becoming table stakes) can actually find all of it, reason about how it ended up there, and help you reorganize it. The accumulated impact of mid-level decisions, while still important and probably still architectural, is more tractable with AI, not less; the cost of untangling it has dropped, even if it hasn’t vanished.
Pattern amplification is not destiny
You will hear the counterargument that AI makes code quality more important, not less, because it amplifies both good and bad decisions at volume. Addy Osmani reports that AI-generated code has 75% more logic errors and 2.74x more XSS vulnerabilities than human-written code; PRs are 18% larger; change failure rates are up roughly 30%. Rachel Thomas at fast.ai argues that we’ve “automated coding, but not software engineering.” Osmani’s framing of comprehension debt is sharp: developers using AI scored 17% lower on comprehension quizzes, and “making code cheap to generate doesn’t make understanding cheap to skip.” But I think it locates the risk at the wrong layer.
AI can see within the light-cone of a context window, but it cannot yet see the whole product. It cannot tell you whether the feature you just shipped actually solves the problem your customer has, or whether the subtle behavioral regression you introduced last Tuesday is slowly eroding trust. This is what I’ve elsewhere called the accumulated ignorance of product debt, as distinct from tech debt: the gap between what the system does and what users need it to do.
But the answer to accumulated ignorance isn’t better code structure; it’s better harnesses, better observability, better verification of behavior. The discipline that this demands is in ensuring you can verify what the system does, or rather what it should be doing, regardless of whether you understand every line. Code structure is one possible input to behavioral verification, but it’s no longer the most cost-effective one, and it’s no longer the bottleneck. Even the skeptics, if you read them carefully, locate the remaining hard work at the judgment, verification, and system-design level. Osmani himself puts it plainly: generation is not the bottleneck anymore. Verification is.
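A harness of the kind I mean can be tiny. A minimal sketch, with hypothetical stand-ins for a legacy function and its AI-rewritten replacement: run both against the same inputs and report where they diverge, without reading a line of either.

```python
def old_slugify(title: str) -> str:
    # Stand-in for the legacy implementation; its behavior is the de facto spec.
    return "-".join(title.lower().split())

def new_slugify(title: str) -> str:
    # Stand-in for the rewritten implementation we want to trust without reading it.
    return "-".join(word.lower() for word in title.split())

def behaviors_agree(reference, candidate, inputs):
    """Characterization check: return every input where the candidate's
    behavior diverges from the reference's. Internal structure is irrelevant;
    observable behavior is the contract."""
    return [x for x in inputs if reference(x) != candidate(x)]

# In a real harness these inputs come from production traffic or
# property-based generation; hand-picked here to keep the sketch small.
corpus = ["Hello World", "  spaced  out ", "MiXeD CaSe", ""]
```

An empty result from `behaviors_agree(old_slugify, new_slugify, corpus)` is the signal that the rewrite preserved behavior; the point is that the check runs on every agent-produced change, which is exactly the kind of correctness-checking you can build into the process itself.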
There’s an argument that good code structure remains important because it makes AI agents more effective; well-structured code is easier for agents to navigate, reason about, and modify. This is true, and it’s a perfectly good reason to care about structure. But this value is instrumental, not architectural. Structure that exists to make agents faster is an optimization, and optimizations are cheap to revisit. You can restructure the code to make agents more effective, using agents, and the cost is tokens and a little time. Instrumental value is real, but it doesn’t make a decision hard to reverse, and “hard to reverse” is the only definition of architecture that’s ever been crisp.
Don’t apply old-world rules to new-world economics
Okay, so maybe you can fold on some code decision-making. Should you? Fowler’s Design Stamina Hypothesis argued that internal quality drives speed, rather than existing in tension with it, and DORA later backed this up with empirical data. Elite teams aren’t fast despite being disciplined, they’re fast because they’re disciplined. If that’s true, and I think it is, then making refactoring cheap doesn’t make quality optional, because quality was never a trade-off to begin with.
The problem is that teams who aspire to be elite often confuse markers of quality with quality itself, particularly as end users perceive it. Do not mistake ceremony for discipline. Turning quality into a gate instead of a habit leaves you slower, not faster, because you lose the ability to iterate on process, and challenging your process is now a matter of survival. Quality gates are a difficult organizational pathology to deconstruct, because challenging them makes people worry that you’re lowering the bar. This is especially exacerbated at engineering organizations that reward engineers for technical achievement and intellectual derring-do rather than business impact. I’ve written before about how code quality debates often reflect aesthetic preference masquerading as engineering discipline; the dynamic here is the same, just at the organizational level.
I instead try to internalize Kent Beck’s ordering: make it work, make it right, make it fast. I’m generalizing here somewhat, taking “make it right” to mean “fit to purpose, proportional to your understanding of business need,” but I think the generalization is correct, and maybe it always has been. “Right” ultimately means behavioral correctness and business value: software is what it does. Internal structure is at best a predictor of that, not a value in itself. Companies that understood “right” expansively, as a question of market value and user need rather than technical elegance, have shipped rough software and won markets over and over again. Technical excellence is neither necessary nor sufficient for market success; it is, at best, a predictor.5 When restructuring was expensive, investing in internal elegance looked like investing in behavioral correctness, because the two were hard to separate. Now that restructuring is cheap, the gap between them is visible.
The discipline that quality demands still matters, but I’d argue that its object has shifted rather dramatically.
Architects should put the business back in business domain
The pattern this essay has been tracing is one of progressive elevation. Database migrations pushed schema work into the general engineering role; CI/CD did the same for deployment; AI is doing it for most remaining code-level decisions. Each time, everyone moved up a level. ICs absorbed what used to be specialist work, and the specialists had to find something more entertaining to do with their time.
That process has now reached the architect. If the remaining architectural decisions are boundaries, contracts, trust, and data, then the skill required to make good decisions about them is no longer primarily technical, it’s strategic: which boundaries create competitive advantage, which contracts enable new markets, and which capabilities need to be independently configurable and sellable? When code-level guidance was closer to the focus of the job, architects could spend most of their attention on it and dabble in strategy. Now much of that work is automatable, and the strategic gap is exposed; it’s no longer something you can fill with framework debates.
Architects like to throw around the term “business domain,” but the term often seems more focused on entities and aggregates than an understanding of what commercial problem a component is attempting to solve. Amazon’s original SOA mandate remains the Rosetta Stone here, and it’s often read as a call for clean interfaces, but that reading misses the point. The demand that every internal service be exposable externally was Bezos buying optionality on capabilities Amazon hadn’t yet invented. Some fraction of them would eventually become products, and externalizability ensured they’d be commercially ready before anyone knew which ones; it also forced a reading of “business domain” grounded in commercial and user need. AWS is the most visible, and most important, consequence of that choice. The architectural decision made the business model possible, not the other way round. A decade ago, an architect could nod at business strategy and then go back to ports-and-adapters. There is much less room for that now.
As you explore your domain, ask: does this service reduce operational overhead? Does it open new markets? Does it improve retention within a specific segment? What problem are we solving, and (quite literally) who cares?
What remains hard
The DBA’s calendar got freed up by migrations; the release engineer’s by CI/CD; the architect’s now, by agents. Each time, judgment trapped behind mechanical cost got absorbed into the general role, review boards shifted their focus accordingly, and the specialist’s remit compressed to whatever remained genuinely hard.
But there are still things a single context window, or even aggregated context, struggles to see. Volume amplifies the quality practices a team already has, for better and for worse; the ambient discipline becomes the output, at a much higher multiple. Product debt compounds along the same curve: the gap between what the system does and what users need still depends on humans watching, thinking, and caring about the commercial problem, and that gap widens faster as ship rate increases. And the coordination cost of changing how humans work together hasn’t budged. The apprenticeship model, review practices, incident response, the career ladder; these are structural-process questions, and structural-process change is the most expensive thing any organization can attempt.
What remains hard this time defines the job; how long it stays that way decides what comes next.
My thanks to Erika Newland, Micaël Malta, Pete Hodgson, and Steven Deobald for reviewing drafts of this article.
They were building on Ralph Johnson’s earlier observation, relayed by Martin Fowler, that “architecture is about the important stuff. Whatever that is.” ↩︎
As of this writing, the strongest public quantitative evidence I’ve seen is that the gap is at best 5x. But we’ve all seen much higher for particular use-cases, which suggests that some combination of risk profile, the skill of the practitioner, and the quality of tooling and model make a substantial difference. Since some of these are likely to improve with time, “order of magnitude” seems more than reasonable. ↩︎
Much later, Kelsey Hightower’s visionary no-code movement, and his more recent work around zero-token architecture, took the notion to its logical and correct extreme. ↩︎
Shoutout to the many, many battle-scarred DBAs out there who wish they still did. ↩︎
I’m sick of working at companies that prized technical excellence at the expense of market success. Many of those companies are now dead. Real artists ship. ↩︎