Build & run a production AI product, solo

I want to tell you exactly how a one-person company ships and operates a real AI product. Not the keynote version — the working version, with the parts that break left in. If you’re a founder or a small team trying to build something real with AI, my hope is that you finish this and think: I could run my work like that. Because you can. The method is the point, and I’ll give it away.

The setup

Here is the honest starting position. I’m one person. I’m not a traditional engineer — I didn’t come up through a computer-science program or a decade of writing production code by hand. I rely on AI agents to write most of the code. And despite all of that, I build and operate a real, production AI product: Journal Genie, a private, source-grounded AI workspace.

Stated plainly like that, it sounds like it shouldn’t work. The common assumption is that “AI writes the code” means “the code is a house of cards.” For a long time, and for a lot of people, that assumption has been correct. What makes the difference isn’t a smarter model or a secret prompt. It’s an operating system around the model: a set of rules, roles, and habits that turn a fast, confident, occasionally wrong assistant into something you can actually run a business on.

This essay is that operating system.

Why “vibe coding” wasn’t enough

The first thing I had to unlearn was trusting plausible code.

Modern models are extraordinary at producing code that looks right. It reads cleanly, it uses the right libraries, it compiles, and it often runs. That feeling — watching something work on the first try — is intoxicating, and it’s exactly where solo AI building quietly falls apart. Because plausible code is not the same as trustworthy code, and the distance between the two is where real products live or die.

“It ran” is a very low bar. “It ran on my machine, once, with the inputs I happened to try” is lower still. The bar I need is: I would stake the product on this. I’d run it against a real person’s private data and sleep fine. Vibe coding (accepting whatever the model produces because it seems fine) never clears that bar. It can’t, because nothing in the loop is actually checking. The model is confident either way. If I’m also just going on vibes, then no one in the entire process has verified anything. That’s not a workflow. That’s a hope.

So the question that organizes everything I do is simple: how do I close the gap between code that looks right and code I can trust? Not occasionally. Every day, on every change, without it depending on my mood or my attention span.

The operating system

The answer is to stop treating the agent as a clever autocomplete and start treating it as a team member who follows a written constitution.

A written constitution. The most important artifact in my company is a document the agents read before they do anything. It states how I work: the standards, the non-negotiables, what “done” means, what’s out of scope, how to handle uncertainty. It’s deliberately short and deliberately strict. The agents follow it. When I want to change how the work happens, I change the constitution — not a one-off instruction I’ll forget I gave. The important decisions live in a source of truth, not in my head and not in a chat history that scrolls away.

Named roles. I don’t have one agent doing everything. The work is split the way a small, disciplined company splits it. An engineering agent owns the code — it writes, refactors, and tests, and it’s accountable for the codebase staying coherent. An operations agent owns everything else: documentation, release discipline, the run log, the boring connective tissue that keeps a company from drifting. Each role has a clear remit, so neither one quietly wanders into work it isn’t responsible for. The separation isn’t bureaucracy. It’s how you keep an agent’s attention bounded, which is most of the battle.

A docs tree that is the source of truth. My memory is not the system of record. A documentation tree is. Decisions, conventions, the current state of the product, the reasons I did things one way and not another — they live in files, version-controlled, where an agent can read them and where I can audit them. This is the difference between a company and a pile of context that only exists while I’m paying attention. If it isn’t written down, it isn’t real, and it certainly isn’t something an agent can act on reliably tomorrow.

None of this is exotic. It’s the operating discipline a good small team already has — just made explicit enough that an agent can follow it.

The evidence habit

If I could keep only one discipline, it would be this one, because it’s the single highest-leverage thing I do: every claim is backed by evidence, and “done” means proven, not plausible.

When the engineering agent says a feature is finished, “finished” is not a vibe and not a summary. It means there is something I can point to: a command that ran and passed, a test that exercises the behavior, a file that demonstrably exists and contains what it should. “I implemented X” is not acceptable on its own. “I implemented X; here is the test that proves X works, and here is its passing output” is.
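For readers who want that definition of done in mechanical form, here is a minimal Python sketch of an evidence gate. It assumes evidence takes one of the two forms named above: a command that exits successfully, or a file that exists and contains an expected string. The names `Evidence`, `run_check`, `file_check`, `pytest_command`, and `is_done` are hypothetical, chosen for illustration; this is not the tooling behind Journal Genie.

```python
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Evidence:
    """One piece of proof attached to a "done" claim."""
    description: str
    passed: bool
    output: str


def run_check(description: str, command: list[str]) -> Evidence:
    # Run the command and keep everything it printed; exit code 0 counts as a pass.
    result = subprocess.run(command, capture_output=True, text=True)
    return Evidence(description, result.returncode == 0, result.stdout + result.stderr)


def file_check(description: str, path: Path, must_contain: str) -> Evidence:
    # Prove the file exists and holds the expected text instead of trusting a summary.
    if not path.is_file():
        return Evidence(description, False, f"missing: {path}")
    found = must_contain in path.read_text()
    return Evidence(description, found, f"{path} contains {must_contain!r}: {found}")


def pytest_command(test_path: str) -> list[str]:
    # Assumes pytest; uses the current interpreter so the check runs in the same environment.
    return [sys.executable, "-m", "pytest", "-q", test_path]


def is_done(evidence: list[Evidence]) -> bool:
    # Done needs at least one piece of evidence, and every piece must pass.
    return bool(evidence) and all(e.passed for e in evidence)
```

A claim would then be reported as done only when `is_done` returns `True`, with each `Evidence.output` pasted alongside it. The `pytest_command` helper assumes a pytest project; swap in whatever test runner you actually use.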

It looks like this in practice:

Not done:  "Added input validation to the contact form."
Done:      "Added input validation to the contact form.
            Ran the handler test suite — all cases pass, output below.
            Includes the rejects-malformed-email and honeypot-drop cases."

This one habit fixes the trust gap from the second section, because now the checking is built into the definition of the work instead of depending on me catching mistakes after the fact. The agent can be as confident as it likes; the evidence is what I’m reading. Plausible output doesn’t get to call itself done. It has to prove it.
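The two test cases named in the contact-form example can be made concrete. The sketch below is hypothetical throughout: the handler `handle_contact`, the hidden honeypot field name `website`, and the deliberately loose email pattern are assumptions for illustration, written as plain pytest-style test functions.

```python
import re
from dataclasses import dataclass

# Deliberately loose check: something@something.tld with no spaces (an assumption, not RFC 5322).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class Outcome:
    accepted: bool
    reason: str


def handle_contact(form: dict[str, str]) -> Outcome:
    # Honeypot: a hidden field real visitors never see; bots tend to fill it, so drop those.
    if form.get("website", "").strip():
        return Outcome(False, "honeypot")
    email = form.get("email", "").strip()
    if not EMAIL_PATTERN.match(email):
        return Outcome(False, "malformed-email")
    if not form.get("message", "").strip():
        return Outcome(False, "empty-message")
    return Outcome(True, "ok")


def test_rejects_malformed_email():
    assert handle_contact({"email": "not-an-email", "message": "hi"}).reason == "malformed-email"


def test_honeypot_drop():
    form = {"email": "a@example.com", "message": "hi", "website": "spam.example"}
    assert handle_contact(form) == Outcome(False, "honeypot")


def test_accepts_valid_submission():
    assert handle_contact({"email": "a@example.com", "message": "hi"}).accepted
```

Running `python -m pytest -q` over a file like this produces exactly the kind of output that gets attached to the "done" claim. The email check is intentionally simple; real address validation is messier than one regular expression.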

Journal Genie runs on exactly this. It’s a real, production AI product built and operated by one person, with agents — and it works because nothing reaches it that hasn’t cleared the evidence bar first. The method isn’t a talk about how I’d like to build. It’s how the thing in front of you got built.

What actually works

With that structure in place, agents become genuinely, surprisingly reliable — in specific places.

They’re excellent at well-specified, bounded work: take this clearly defined task, here are the conventions, here’s what done looks like, go. They’re excellent at the things that exhaust a human into carelessness: writing the test, updating the documentation to match the code, applying a convention consistently across a whole codebase, doing the third near-identical change as carefully as the first. They’re tireless in exactly the way humans aren’t, and fatigue is where quality usually leaks out of small teams.

What the operating system adds is multiplication. A capable agent with no constitution is a fast way to make a mess. The same agent, pointed at a clear source of truth, held to an evidence standard, and kept inside a bounded role, compounds. Each change leaves the system more documented and more tested than it found it, because the rules make that the path of least resistance. The leverage isn’t the model alone. It’s the model inside a system that makes its good behavior the default and its bad behavior expensive.

What breaks

I promised the honest version, so here are the failure modes I actually watch for. They are real, they recur, and naming them is half of handling them.

Silent confidence. The most dangerous failure isn’t the agent being wrong. It’s the agent being wrong with total composure — a clean, assured explanation for something that doesn’t hold up. There’s no tremor in the output to warn you. The guardrail is the evidence habit: I don’t argue with the confidence, I ask for the proof. Confidence is not a signal. A passing test is.

Scope drift. Ask for a small change and you can get a large, “helpful” one — an agent refactoring three things you didn’t ask about, each individually reasonable, collectively a mess. The guardrails are bounded roles and a constitution that says small means small: do the task, not the task plus everything nearby that caught your eye.
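A rule like "small means small" is easier to keep when something checks it. Here is one hedged way to do that in Python: compare the files a change touches against the acting role's remit and a per-task file budget. The role definitions, path prefixes, and the budget of five files are hypothetical placeholders, not the author's actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    allowed_prefixes: tuple[str, ...]  # paths this role may change
    max_files: int                     # hypothetical per-task budget for "small means small"


# Hypothetical remits; adjust to your own repository layout.
ENGINEERING = Role("engineering", ("src/", "tests/"), max_files=5)
OPERATIONS = Role("operations", ("docs/", "CHANGELOG.md", "runlog/"), max_files=5)


def scope_violations(role: Role, changed_files: list[str]) -> list[str]:
    """Return reasons a change exceeds the role's remit; an empty list means in scope."""
    # str.startswith accepts a tuple, so one call checks every allowed prefix.
    problems = [
        f"{path}: outside {role.name} remit"
        for path in changed_files
        if not path.startswith(role.allowed_prefixes)
    ]
    if len(changed_files) > role.max_files:
        problems.append(f"{len(changed_files)} files changed; budget is {role.max_files}")
    return problems
```

The list of changed files could come from `git diff --name-only`. A non-empty result is treated here as a prompt to split or shrink the change, not as an automatic rejection.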

Lost context. Agents don’t carry memory between sessions the way a teammate does. Left to chance, the same ground gets relitigated and yesterday’s decision evaporates. This is the whole reason the docs tree is the source of truth and not my recollection: context that matters gets written down, so the next session starts from the decision instead of from zero.

Notice that none of these are fixed by a better model. They’re fixed by the system around it. That’s the reframe that makes solo AI building actually work: you are not trying to find an agent that never errs. You are building an operating system that catches the errors that are coming.

What this means for small teams

Here’s the part I most want you to take away. If the discipline is built in — a written constitution, bounded roles, a real source of truth, and an evidence habit — then a very small team can build and operate real software. Not a toy. Not a demo. Something people can depend on.

The leverage of AI agents is real, but it is conditional. Without the operating system, agents make you faster at producing things you can’t trust, which is not actually faster. With it, a single person can hold the surface area that used to need a team, because the system is doing the remembering, the checking, and the enforcing that a larger team would spread across people.

That’s the quiet thesis under all of this: the operating system is the product behind the product. Journal Genie is what I sell. The way it’s built and run is what makes building it possible at this size. The method travels — it isn’t specific to one product or one founder. It’s just discipline, made explicit enough that agents can carry it.

If you want help

If you’re building with AI and you want this kind of structure around your own work, that’s a large part of what I do. I help founders and teams set up the operating system — the constitution, the roles, the evidence habit, the guardrails — so their agents produce work they can actually stand behind.

You can see how I work with people on the services page. And if this resonated, the best compliment is to go run your own work like it.