Sobering up on "AI": A Year of Designing With LLMs (Part 3)

A loose cluster of clear glass spheres drifts and settles against a blue-to-white gradient, their edges refracting light into faint chromatic rainbows, an abstract figure for tools whose surface stays transparent while their internal state remains hard to read. — Figure 00 What closed in 2025.

The Hype Curve Has Already Bent

It’s been more than a year since I started this series, and a lot has happened. The air around “AI” was thick with unease then. It has thinned considerably since, in some ways, and in other ways it’s become more opaque than ever. Every product page still has its colorful gradient button (somebody, somewhere, is still trying), but the drumbeat about AI ending the world has gone quiet. Some of that is the technology earning a more sober assessment (pun intended): it failed to deliver the doomsday version of itself. Most of it, I suspect, is the dust simply settling. People are using these systems as tools now, in two broad ways. Most people reach for the first, to write and to ask things, to draft the email and look up the fact. The second, the one this piece is about, is to build with them. Either way, their veneer of “cognition” is getting a subjective-experience refit. Like a shirt the labs convinced us was tailored for us. It never was. They’re seen as tools now. Sometimes useful yet frustrating, sometimes brilliant, frequently wrong. Tools.

In hindsight, I was writing diagnostics from inside the fog, unable to see beyond the boundaries of my knowledge kiddie pool. What I named was a kind of tool whose current state you cannot read by looking at it, and the way your sense of what you are doing shifts when you use it. The tool is no longer a thing you hold in your hand, like my metaphors of rocks and sticks. It is a thing you talk to. In retrospect, all of it was a claim about how our intuitions about the tool force the shape of the work. “The LLM seems to behave like this, so the thinking has to be like that.” A kind of dyadic dance, the fundamentally human creativity of shaping toys out of clay.

The year between those pieces and this one was not a pause. I spent it watching the conditions, with a growing sense that I didn’t want to sit on the sidelines pontificating about the nature of these tools. I wanted to get to a point where I felt I knew the difference between OpenAI’s models, Anthropic’s, Google’s Gemini, and any local [insert-name-here] model.

My training is in HCI. My discipline asks what a person needs and what it costs them when they get it wrong, and it asks before anyone builds the thing. Though over the last decade, with the role diluted and relegated to just the safety industry, we’ve not had the luxury of a seat at the decision-making table. But making product after product for my own personal scaffolding gave me, undeniably, the high I’m sure software engineers feel when they say “I can build that myself, why pay for this tool.” And I mean it: I truly could make the things that felt missing from my professional space. So I’m back to write this piece having built tools that scratch that N=1 itch, tools that genuinely let me experiment with interaction patterns no paying client would sign off on. That is the seat I am writing from.

For context, I work as the only designer and researcher on a product team. That means no design peers, and a lot of freedom, though the permission to use it has to come from inside. Designers do not usually get to touch code (I don’t know if it’s the boogeyman inside the code that gives designers kerning panic), and through a series of unfortunate events I started supporting frontend. It was a practical reason rather than an ideological one: there were deadlines, the work needed doing, and waiting for a design to be marked up and handed off and built was not the fastest way through.

Being able to carry the UX to the finish with precision directly closed an old gap. The design-to-code handoff has been a problem for as long as designers have been sharing mockups, from Photoshop to Figma dev mode. You see, handing over design screens and prototypes is a complex communication handoff between two fundamentally different problem-solving headspaces. A designer like myself is solving for cognition, for an approximation of the operator’s mental model. The developer is looking at that screen and doing an abstraction that is unique to programming. So somewhere in the process design intent gets rounded off, the edge cases get dropped or new ones show up too late for design to react to or research to loop on, and what ships is a near copy of what was designed. Mind you, I’m not even arguing there are bad actors in the transaction, I’m merely pointing out the nature of HCI vs SWE as siblings that don’t often get along, purely from a problem-domain and job-description mismatch. But oh lordy-lord, when I get to touch frontend, as the designer building, that drift has nowhere to hide. Though I become the victim of the same process too, because now I have nowhere to hide either.

Designing by Building

Doing it myself meant learning the tools. I bought a GitHub Copilot subscription (the first widely-used AI coding assistant), then dropped it for Claude Code (a similar tool from Anthropic) once I wanted to push further. The models underneath (Claude Sonnet, GPT-4.5, and the others) turn out to be rock solid for surgical interface work, as long as I am the one deciding what the operator actually needs. Though some engineers still tell me my job is temporary, and the deeper I go the more I spiral into despair and have to claw my way back from the pit.

Here is the analogy that comes to mind. The “AI” hype was sold to us as clothing replacement, the garment arriving already stitched and the maker no longer needed. I would put it another way. What actually showed up was the jump from the needle to the sewing machine. It brought the floor of code creation down to almost nothing, which meant I could finally have tools tailored to me. It sort of unlocks the potential of the personal computer, in a way that makes the last 25 years of OS improvement feel like the climb from bone needles to stainless steel ones. Incredible engineering goes into a stainless steel needle, and the sewing machine is still so much more than that. The battle-hardened programmers will tell you that stitching by hand keeps a cohesive understanding of both the code fabric and its tolerances, while the vibe-coding evangelists are saying the new sewing machine makes you a fashion designer overnight. Except the gates to fashion school are still, mostly, something you have to push open yourself. And everyone’s mileage, passion, and interest will vary, because some people just don’t like fashion. (I’m stuck in this analogy because I don’t get fashion at all, but it worked so well that I am hoist with my own petard.)

A triptych. On the left, three carved bone sewing needles on black. In the middle, a stainless steel needle threaded with red thread, lit close. On the right, the foot and needle of a sewing machine stitching blue fabric. — Figure 01 Bone needle to steel needle to sewing machine. The same task, three floors of effort apart.

Notice who is not in this argument at all. The labs claim they have built the end of the world. I have spent a year using it to stitch interfaces together. Either their framing is wrong, or the apocalypse has a minimum competence requirement I have not yet met.

A year on, I am still doing it. The hype curve has bent past its peak and is rushing toward the trough of disillusionment. My practice has not gone away, even as we are all, collectively, caught like deer in the headlights of change, seeing the oncoming car with perfect clarity in the instant before it hits. The disillusionment is real, and it is aimed at a tool that is in fact fundamentally reshaping the way we want to think about work. The rest of the series walks the consequences.

What surprised me was how gratifying the control is. When you hold the vision and the build both, the thing you pictured is the thing that ships. This is not about a designer who controls everything. It is about what becomes reachable. Once the designer can build the interaction, the business cannot file good UX under “we will figure it out later” anymore. The case for close-to-stellar interaction design stops being something anyone can argue away, and the bar rises, undeniably.

The same control is reaching the engineers. The holdouts are starting to accept the assist, and they are working faster for it, which raises a change-management question I am not going to open here. So I will only name it. Human collaboration has never mattered more than it does in an LLM-driven workflow, and from my vantage it is also the least examined and most quietly buried problem we have, the one my training in industrial-organizational psychology keeps screaming about. That is a later piece in this series.

The Doubling Claim, Audited

There was a claim I made with my chest puffed out, the way you do when you are standing too close to a thing to see its actual size. I said AI would not shrink the design field’s workload, it would double it. Twice the work: one set of design for the human using the product, and a second set for the AI sitting inside that product, doing its level best impression of a button.

I was confident. On a generous audit, I was half right. (The ungenerous audit calls it half wrong, and the ungenerous auditor is usually me.)

Looking back kindly on my past self’s AI-weather forecasting, I was certain the industry’s bet was AI-as-a-feature, stitched into the seams of everything that already existed. Every app sprouts a chat bar. Every page grows a “summarize this” button nobody remembers asking for. Every feature gets an AI-flavored twin sitting beside it. Designing all of that, the human half and the pretending-to-be-a-button half both, genuinely would have doubled the work. For the weather I was bracing against, my forecast was sound.

The weather simply blew the other way. I spent 2025 watching the AI-button-in-every-app bet grow up. It matured, technically speaking. It works. It ships, in narrow and well-fenced contexts. I can dump a scattering of my git commit messages into Jira and Rovo will shape them into the appropriate ticket. But it never became how most people actually reach for AI on a Tuesday afternoon; even Rovo gets underutilized in my experience. So I spent a year not designing for the future I had spent the year before that learning to design for, which is a peculiar species of regret I cannot recommend. What quietly won instead was the thing I had filed under interesting, secondary: AI as the assistant perched on the engineer’s shoulder, helping write the code, rather than AI as a button buried somewhere in the shipped product.

The winning practice got its name twenty-four days after I published part two of this series. Andrej Karpathy fired off what he would later downgrade to a “shower of thoughts” tweet, and called it vibe coding. Roughly four and a half million people looked at it. By December, Collins Dictionary had made it Word of the Year. The discourse, as ever, sprints while the writing strolls.

So the doubling claim is not dead. It is alive and well anywhere someone is still trying to bolt a button onto every app, and I would defend it there without flinching. But it turns out the unit economics just don’t work in the model where you wrap a product around an LLM, and if someone should try it with an open-source model or a hyper-custom in-house one (hint hint, nudge nudge, cough Apple Intelligence-Siri cough), you get my original doubling problem right back. I’m not currently contending with those, which is why the use cases I see dissolving are all in the universe of software engineering. What remains is an “AI” that can write code a human developer could actually read. That is a humbler sentence than the one I wrote in prior parts. It is also the one that matched the year I got. Though I’m still eagerly waiting for the Apples of the world to come into the sandbox and play.

The Two Halves

The chatbot assistant I talk to is not one thing but two. There is the model, frozen on its training date, describing with total composure a software library the maintainers retired a year ago, until I invoke the web-search tool to pull the latest documentation into the chat. And there is the software wrapped around it, reading files and running commands and fetching pages, all of it scrolling past in Claude Code as it happens. The split matters because it divides the work. The software half is verifiable: I can open the file it claims it opened and rerun the command it says it ran. The model half, the judgments and the prose and the guesses about what to do next, is not. There I have to read what it produced, hold it against what I asked for, and decide whether it landed. I would say that’s still easier to do in frontend work, because wrong is visible.
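To make “rerun the command it says it ran” concrete, here is a minimal sketch of that check. Everything in it is my own illustration: the `ClaimedCommand` record and the `recheck` helper are hypothetical names, and the transcript format is an assumption, not anything Claude Code exposes. It also only makes sense for commands whose output is the same on every run.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ClaimedCommand:
    """One entry from a hypothetical harness transcript (format assumed)."""
    argv: list[str]        # the command the assistant says it ran
    claimed_stdout: str    # the output it reported back to me


def recheck(claim: ClaimedCommand) -> bool:
    """Rerun the command and compare the fresh output with the reported one."""
    result = subprocess.run(
        claim.argv, capture_output=True, text=True, timeout=30
    )
    return (
        result.returncode == 0
        and result.stdout.strip() == claim.claimed_stdout.strip()
    )


if __name__ == "__main__":
    # A trivially repeatable command, standing in for a real transcript entry.
    claim = ClaimedCommand([sys.executable, "-c", "print(2 + 2)"], "4")
    print("claim holds" if recheck(claim) else "claim does not hold")
```

Nothing comparable exists for the model half. There is no command to rerun against a judgment, which is why that half still costs a human read.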

Still, those coming out of the neverland of “Agentic-workflow-psychosis” talk about very real issues, and over the last year I have collected my own mixed bag of them.

Cognitive debt: letting my thinking go slack because the model will do it for me, or because I have built enough SKILL.md files to scaffold my thinking.
Reviewer fatigue: waving outputs through faster than I can read them.
Rubber-stamping: the performance of review over work that is sailing past unexamined.

Mind you, I am saying I experience this without ever having worked as a developer, just as a prompter, or cosplaying as one. The trap underneath all three is the same: the model pays out on a variable-ratio schedule, the slot-machine reward that sometimes nails the task and sometimes hands back something quietly, expensively broken, and its floor is cheap enough that the looking stops right when the broken thing slips by. None of that is the structural story. It is an operator who never redesigned their attention to match the tool. The structural story is what the rest of this series is for. The problem genuinely requires us to slow down, and there is a deep pit in my stomach, drums of despair beating, that the tech sector of humanity is not going to accept a collective slow-down.

An aerial view of arctic ice floes, white slabs of ice separated by channels of dark teal water, so a single frozen surface reads as broken into distinct pieces. — Figure 02 Not one thing, but two.

None of this landed clean. The same tools I spent a year learning to build with have been cited in layoff announcements since 2023, and I am aware that I am writing this from the side of the ledger that still has a desk.

What the Claude Leak Showed

I had already spent a year quietly working around the thing the leak walked out and confirmed in public.

On 31 March 2026, Anthropic, the company behind Claude, shipped the source code to its own coding tool by accident. The cause was a single missing line in a configuration file. I would give a great deal to know whether the human forgot that line or whether an AI did, and I have no earthly way to find out, so I am left holding nothing but the schadenfreude. For two years the labs have run a very particular story, the “your job is going away” story, and they have mostly pointed it at software engineers. There is a second barrel aimed at visual designers, the generative-UI track (Claude Design, Google Stitch, and a scatter of smaller hopefuls). The HCI specialist’s job is to stand in the room before anyone writes a line and ask what happens when, eventually, somebody forgets one, the failure we then colloquially call “human error.” The leak happened because somebody forgot one.

For about a day, anyone who installed the tool through the front door got the whole thing, unobfuscated and fully readable, riding along with the download. People grabbed copies, and a rapid DMCA takedown request followed. Anthropic confirmed the obvious. The contents are a matter of public record now, which is a polite way of saying the curtain came down mid-trick and stayed down.

And what the curtain showed is precisely what anyone using these tools for serious work had already guessed. The thing you address as your AI assistant is, for the most part, not the AI. It is a very large and very “deterministic” piece of software that phones the AI for a handful of specific decisions and handles everything else itself: reading files, searching the codebase, running commands, choosing the next move, tidying the answer into shape. The AI is summoned only for the parts that genuinely need a guess or a judgment. The rest is done the way software has always done its work, by following rules.

The people building products on top of these models are desperate for predictable behavior, for the plain reason that predictability is the thing you can actually sell. Nobody ships a paying customer a tool that works Monday through Friday and serves up nonsense on Saturday. The leak is something like half a million lines of evidence that the labs have learned this in their bones and are spending their engineering hours acting on it. The whole industry is leaning the same way: shrink the AI’s share of the job, do not grow it. Each new release takes some decision the AI used to be trusted with and hands it to a rule the surrounding software now follows on its own. The race, in the end, is to squeeze the AI’s real responsibility down to a single sentence, which of these specific options should we use right now, because that is the part it is honestly good at. Everything else can be made dependable with plain, unglamorous engineering.
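Here is a minimal sketch of that division of labor, to show how small the model’s share can get. It is not taken from the leaked source. The `choose` and `run_step` functions, the handler names, and the `ModelFn` stand-in for a model call are all hypothetical, and a real harness would be far larger. The model answers one question, which of these options, and plain code checks the answer and does everything else.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a model call: prompt text in, free text out.
ModelFn = Callable[[str], str]


def choose(options: Sequence[str], question: str, model: ModelFn, default: str) -> str:
    """Ask the model to pick one option; rule-following code validates the pick."""
    if default not in options:
        raise ValueError("default must be one of the options")
    prompt = f"{question}\nAnswer with exactly one of: {', '.join(options)}"
    answer = model(prompt).strip().lower()
    # Deterministic guard: anything outside the allowed set falls back to default.
    lookup = {option.lower(): option for option in options}
    return lookup.get(answer, default)


def run_step(task: str, model: ModelFn) -> str:
    """One turn of an assumed harness: the model picks, ordinary code acts."""
    handlers = {
        "read_file": lambda: f"read file(s) for: {task}",
        "search": lambda: f"searched codebase for: {task}",
        "run_tests": lambda: f"ran tests for: {task}",
    }
    picked = choose(list(handlers), f"Next step for: {task}?", model, default="search")
    return handlers[picked]()


if __name__ == "__main__":
    # A fake model that answers off-script, to show the fallback path.
    print(run_step("fix the login button", lambda prompt: "delete everything"))
```

The fake model’s answer is not on the list, so the harness falls back to `search`. The probabilistic part can say anything it likes, and the surrounding rules decide what is allowed to happen.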

Two large circular looms on a factory floor, rings of shuttles weaving tubes of polypropylene fabric that rise toward the ceiling. — Figure 03 The loom of determinism. Rules in, fabric out.

Which is the quietly sobering reality at the heart of the field. The companies selling the world on the magic of AI are spending the bulk of their engineering hours making certain the AI is asked to perform magic as seldom as humanly possible.

There is a second reason the hours flow this direction, and it is the dull one, the one with a spreadsheet attached. The bill is arriving. The free tokens and the gently unreal prices that venture money covered through 2024 and 2025 are starting to show up as red ink in someone’s ledger. What you pay for serious use keeps climbing. A $200 tier now sits where $20 was the ceiling, and ads have been stitched into the free plan, even as the raw per-token price keeps falling under competition. The labs are being asked, first politely and then less so, whether the unit economics actually close. To justify the next price band they have to ship something dependable, and probabilistic output that falls over every sixth day will not justify a cent of it, while deterministic engineering wrapped snugly around that probabilistic core just might. The playbook is not new. Uber and Netflix wrote it: subsidize the seat until the rider cannot imagine standing back up, then charge what the seat was always going to cost. The seat is being charged for now, and whatever arrives at the new price had better be steadier than what we were handed for free.

You can watch the story bend to fit the ledger in real time. In May 2026, the same month OpenAI was reported to be quietly filing for its IPO, Sam Altman stood on a stage at a bank conference in Sydney and walked back his own “jobs apocalypse,” granting that his team had been “pretty wrong” about the economic damage it once predicted. Dario Amodei softened a nearly identical prophecy the same week. A forecast of doom is a fine thing to raise money on and an awkward thing to carry into a public listing, so the doom gets revised on schedule. The narrative, it turns out, was always downstream of the financing.

That, in the end, is what I read in the leak, and it was never the file count or the technical particulars that made the headlines. It was the half a million lines of patient machinery built up around the model, thickening every quarter, thickening for the least magical reason there is: the people writing it have rent to make, and the people signing their checks have investors to answer to. Still, do not mistake my sobering for dismissal. This is an amazing piece of technology, worth its place in the collective output of human creative brilliance and knowledge. But I do feel that none of us were left unscathed by the psychological ramifications of “AI-hype.”

What the Year Produced

A year is a long time to carry something heavy. It is also enough time to build something.

This site is one of the things that the year produced. A year ago I was still handing this work to a sitebuilder, eating the design compromises those platforms demand from their customers. That constraint is gone now, and so is the excuse.

The year produced far more than one essay can hold. “We will explore further” turned out to be a heavier promise than I understood when I made it. The fog of AI hype left us all with an eerie sense of disorientation, collective human intelligence doing what it does best, describing the shape of a thing it cannot yet see (six blind men, one elephant, and at least they agreed on the elephant). The fog has thinned enough now to work with a steady hand, and the pieces ahead go to work on what the clearing reveals.

There is more to say, and the pieces ahead say it.

Citations

Andrej Karpathy, vibe coding coinage: post on X, 2 February 2025. Karpathy’s one-year retrospective characterised the original as “a shower of thoughts throwaway tweet that I just fired off”. The term was named Collins English Dictionary Word of the Year for 2025.

The labs’ civilization-scale risk framing: Dario Amodei, The Adolescence of Technology (January 2026), argues the world is “considerably closer to real danger in 2026 than we were in 2023” and lays out civilization-level AI risks. Coverage: Axios, Futurism.

OpenAI and Anthropic walk back the “jobs apocalypse” (May 2026): At a Commonwealth Bank conference in Sydney on 26 May 2026, Sam Altman said an AI “jobs apocalypse” was unlikely and that OpenAI had been “pretty wrong” about the social and economic impact it once predicted: Euronews, TIME. Fortune noted Dario Amodei walked back a near-identical prophecy the same week as both companies eyed IPOs: Fortune.

Claude Code source leak via npm source map (31 March 2026). Anthropic shipped the full Claude Code CLI TypeScript source in npm package version 2.1.88 after a missing .npmignore entry let the Bun build’s source map file ship with the package. Primary reporting and Anthropic confirmation: Bleeping Computer, InfoQ.

Figure 02 (arctic ice): Photograph by Annie Spratt via Unsplash, under the Unsplash License. Resized and converted to WebP.

Figure 03 (circular loom): Photograph by Zhangzj cet, Plastic Woven Bag Production Line Pic 2, via Wikimedia Commons, licensed CC BY-SA 4.0. Resized and converted to WebP.