Field Notes · Build in public
One founder, seventeen AI teammates, and one quiet phone
How Storyland — a site that turns the books you love into real travel itineraries — is built, shipped, and marketed by a team where every role except one is an AI. Written for readers who have never merged a pull request.
Part 1: Three places where work happens
Most companies organize people. Storyland organizes environments. There are exactly three, and everything about the setup follows from what each one can and cannot touch.
- Olga: the founder
  - The only human, usually on a phone. She doesn't write the code or run the deploys; she makes the calls that money and users depend on, mostly by dragging cards and clicking one merge button.
- The Cowork AI team: a sealed sandbox
  - Two AI departments, each role with a written job description: a product side (product manager, design lead, staff engineer, engineering lead, QA, release engineer, analysts) and, new this week, a marketing side. It can write code, draft content, and update the boards, but it cannot deploy, merge anything risky, or publish without Olga's sign-off.
- The Local Runner: Claude Code on Olga's Mac
  - The hands. It has what the sandbox doesn't: real internet, a browser, GitHub credentials, and the key to the production server. It executes, and only inside pre-approved lanes. It never merges or deploys on its own decision.
The sandbox team is deliberately cut off from the real world; the runner deliberately can't decide anything on its own. Only together — with Olga's stamps in between — can a feature reach users.
Why split it this way? Because it makes the scary failure modes structurally impossible instead of merely discouraged. The AI that writes code physically cannot touch the production server. The AI that touches the production server only acts on work a human has explicitly approved. Neither has to be trusted to "remember the rules" — the rules are the walls of the rooms they work in.
Part 2: The board is the boss
All product coordination happens on one Linear board (Linear is a project-tracking app — think of a wall of sticky notes in columns). Every piece of work is a card like MYS-161: Redesign the Destinations page, and the column a card sits in is its status. Nobody messages anybody; they read the wall.
A card travels left to right through nine columns. Two of the moves can only be made by Olga — those are her approval gates. Three happen automatically when code events fire. The rest are done by whichever teammate finished the work.
Part 3: The four stamps only a human can give
Everything in this system is autonomous except four decisions. They're called gates, numbered like customs checkpoints, and each exists because the step after it is expensive, public, or hard to undo.
| Gate | The question | How Olga grants it |
|---|---|---|
| G1 | Should we build this at all? Ideas are cheap; engineering time isn't. Olga reviews the spec and design mock first — her standing rule is "I want to see design first." | Drags the card Ready → Todo |
| G2 | Is this code good enough to accept? Once merged, a change is woven into everything built after it. The one gate she partly delegates: the AI engineering lead may merge small, clean, routine changes — but anything touching security, money, or user data waits for her. | Clicks Merge on the PR — or lets the delegation rule handle routine ones |
| G4 | May we spend real money on this test? Quality evaluations call paid AI APIs; each run costs actual dollars. | Stamps gate_g4_spend: APPROVED on the request |
| G5 | Does this go live for users, now? Deploys are the one step visitors can feel. | Drags the card Merged → Ready to ship |
Notice what's not gated: writing code, drawing designs, opening PRs, updating the board, verifying the live site, building marketing assets — even merging a small, clean, routine change. The AI team does all of that continuously, without asking. The gates sit exactly at the points of no return — irreversibility, security, money, and anything users can feel — and nowhere else.
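For readers who do think in code, the G2 delegation rule can be sketched as one small function. This is an illustration under stated assumptions, not Storyland's actual implementation: the `Change` fields, the area tag names, and the 50-line cutoff for "small" are all invented for the example, since the article gives no concrete thresholds.

```python
from dataclasses import dataclass, field

# Areas that always wait for Olga. The tag names are hypothetical.
SENSITIVE_AREAS = {"security", "money", "user-data"}

# Hypothetical cutoff for "small"; the source gives no number.
MAX_DELEGATED_LINES = 50


@dataclass
class Change:
    card: str                       # a card id in the style of "MYS-161"
    lines_changed: int
    checks_passed: bool             # stand-in for "clean"
    areas: set[str] = field(default_factory=set)


def g2_decision(change: Change) -> str:
    """Return who may merge this change: 'olga' or 'engineering-lead'."""
    if change.areas & SENSITIVE_AREAS:
        return "olga"               # security, money, user data: never delegated
    if not change.checks_passed:
        return "olga"               # not clean
    if change.lines_changed > MAX_DELEGATED_LINES:
        return "olga"               # not small
    return "engineering-lead"
```

The ordering is the point: the sensitive-area check comes first, so no amount of "small and clean" can route a risky change around the human.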
Part 4: Handoff files: how a sandboxed team gets things done anyway
Here's the puzzle at the heart of the setup. The Cowork team writes the code — but it lives in a sandbox with no real internet, no browser, no production server. So how does its work ever become a pull request, a rendered design, or a live feature?
It writes a letter.
When a sandboxed teammate finishes something that needs real-world hands, it drops a small text file — a handoff — into a shared folder on Olga's Mac. The file says what kind of job it is, which card it belongs to, and exactly what to run. There are six inboxes, one per job type, each with hard limits the runner may never cross:
| Inbox | What it asks for | Hard limit |
|---|---|---|
| staff-engineer/ | "The branch is pushed; open the pull request." Or: "run this build the sandbox couldn't." | never merges |
| release/ | "This is approved to deploy." Cross-checked against the board: the card must really be in Ready to ship. | G5-gated |
| eval/ | "Run this small quality test on the AI's itinerary output and report the scores." | G4-gated, costs capped |
| qa/ | "Check the live site actually does X": real clicks on the real product, written back to the card. | never merges or deploys |
| design/ | "Render this HTML mock into a screenshot and attach both to the card so Olga can review it." | render only, never edits |
| grow/ | "Build these marketing assets": fetches photos the sandbox can't reach, composites carousels and Reels. | never posts publicly |
Finished jobs get a written ## RESULT receipt appended and move to a shared done/ folder — an audit trail of every real-world action ever taken.
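As a rough sketch of how lane limits and receipts could work in code: the inbox names, the hard limits, and the `## RESULT` and done/ conventions come from the table and paragraph above, but the action labels (`"merge"`, `"deploy"`, `"edit"`, `"post"`), both function names, and the file layout are assumptions made for illustration. The real handoff format is not shown in this article.

```python
from pathlib import Path
import shutil

INBOXES = {"staff-engineer", "release", "eval", "qa", "design", "grow"}

# Hard limits from the table. The action labels are hypothetical.
FORBIDDEN_ACTIONS = {
    "staff-engineer": {"merge"},
    "qa": {"merge", "deploy"},
    "design": {"edit"},
    "grow": {"post"},
}

# Lanes that only run once Olga has granted the matching gate.
REQUIRED_GATE = {"release": "G5", "eval": "G4"}


def allowed(inbox: str, action: str, approved_gates: set[str]) -> bool:
    """True if the runner may perform `action` for a handoff in `inbox`."""
    if inbox not in INBOXES:
        return False                      # unknown lane: refuse by default
    if action in FORBIDDEN_ACTIONS.get(inbox, set()):
        return False
    gate = REQUIRED_GATE.get(inbox)
    return gate is None or gate in approved_gates


def file_receipt(handoff: Path, done_dir: Path, result: str) -> Path:
    """Append a ## RESULT receipt, then move the handoff into done/."""
    with handoff.open("a", encoding="utf-8") as f:
        f.write(f"\n## RESULT\n{result}\n")
    done_dir.mkdir(parents=True, exist_ok=True)
    target = done_dir / handoff.name
    shutil.move(str(handoff), str(target))
    return target
```

Refusing unknown inboxes by default mirrors the article's framing: a lane is something the runner is explicitly given, not something it infers.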
Part 5: Nine minutes past every hour
The Local Runner wakes on a schedule, at nine minutes past each hour, with no human watching. An unattended AI with production keys sounds alarming until you see how narrow its script is.
First, it checks its own footing. Is GitHub access alive? Is the live site responding? What's actually on the board right now? It re-reads Linear fresh every run rather than trusting local notes, because notes go stale and the board is the boss.
Then it empties the six inboxes, each according to its lane rules — opening PRs, rendering mocks, running approved checks. Then it looks at the deploy queue: every card in Ready to ship is a standing instruction from Olga. For each one it confirms the code is really merged, deploys it, and then — this part matters — proves it worked by loading the live site in a headless browser and checking the page genuinely rendered, because a server can happily say "200 OK" while serving a blank white screen. (That exact failure happened once. It's now a permanent checklist item.)
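The "200 OK but blank screen" lesson can be sketched as two small checks. This is a simplified stand-in, not the runner's real code: it assumes the page's HTML has already been captured after rendering (by whatever headless browser is in use) and only inspects that string. The function names and the `must_contain` parameter are hypothetical.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text that sits outside <script>, <style>, and <title>."""

    _SKIPPED = ("script", "style", "title")

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def should_deploy(column: str, is_merged: bool) -> bool:
    """A card deploys only if it sits in Ready to ship and its code is merged."""
    return column == "Ready to ship" and is_merged


def page_rendered(status: int, rendered_html: str, must_contain: str) -> bool:
    """A 200 is necessary but not sufficient: expected text must be visible."""
    if status != 200:
        return False
    parser = _VisibleText()
    parser.feed(rendered_html)
    parser.close()
    return must_contain in " ".join(parser.chunks)
```

A page whose only content is an empty root element plus a script tag fails this check even with a 200 status, which is the failure the checklist item exists to catch.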
Then it reports — quietly. The card moves to Shipped with a signed comment, anything that needs Olga goes on one short pending list, and her phone stays silent: the runner is only allowed to notify her for a genuine production emergency. The quiet rule is deliberate — early versions pinged her after every run, a dozen times a day, so the rule flipped: silence is the default, and a notification now means something.
Part 6: One feature, end to end
Take a real card — MYS-161, a redesign of the Destinations page Olga requested in chat. The product manager specs it and the design lead attaches a mock; Olga reads both and drags it Ready → Todo (G1). The staff engineer builds it and pushes a branch, dropping a staff-engineer/ handoff; the runner opens the PR, which auto-moves the card to In Review. Olga (or the delegation rule) merges it (G2), QA verifies, and she drags it Merged → Ready to ship (G5). The runner deploys, proves the page rendered, and marks it Shipped. Count the human touches: one sentence of intent and three gestures. Everything else happened without her.
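The walkthrough doubles as a tiny permissions table. The sketch below models only the three moves the text attributes to a named actor; the board has nine columns, and the rest are left out rather than guessed. The actor labels `"olga"` and `"runner"` and the function name are assumptions for illustration.

```python
# (from_column, to_column) -> the only actor allowed to make that move.
ALLOWED_MOVES = {
    ("Ready", "Todo"): "olga",                # G1: build this at all?
    ("Merged", "Ready to ship"): "olga",      # G5: go live now?
    ("Ready to ship", "Shipped"): "runner",   # after a verified deploy
}


def may_move(actor: str, src: str, dst: str) -> bool:
    """True only if this exact move is reserved for this exact actor."""
    return ALLOWED_MOVES.get((src, dst)) == actor
```

For example, `may_move("runner", "Merged", "Ready to ship")` is false: the runner can act on a card in Ready to ship, but it cannot put one there.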
Part 7: The marketing department, hired this week
Until this week, all of Storyland's marketing was one AI role that planned, wrote, designed, published, and measured. It worked, then stopped scaling — one context juggling five jobs drops balls. So the team did what human companies do: it reorganized. The one role became a five-role department — a Marketing Director (plans the week and sends Olga one daily digest), a Content Creator & Publisher (drafts, designs in Canva, and hits publish), an Editorial Reviewer (may only comment or bounce a draft — never edit, publish, or approve), a Community Manager (the one role trusted to reply and repost publicly on its own, in Olga's voice, with receipts), and a Marketing Analyst (reads the numbers weekly and files new idea cards).
The crew has its own Linear wall, a Content Pipeline with columns that fit content instead of code, but reuses every structural idea from the engineering side: a board as the only truth, narrow lanes with hard limits, and approval as a single human gesture placed exactly where things become public. Two house rules give it character. First, every post carries a real photograph, never text-on-a-gradient filler; that requirement loops right back through the grow/ handoff, because the runner has to fetch the images the sandbox can't reach. Second, Olga's phone stays quiet, with only a handful of allowed notification types.
Part 8: The machinery underneath
For a system with this much process, the physical footprint is almost comically small. The code lives in five GitHub repositories — storyland-web (what you see), storyland-services (accounts, saving, search), storyland-ai (the itinerary brain), storyland-e2e (automated browser tests), and storyland-infrastructure (the deploy recipes) — and production is a single small cloud server. A deploy is the runner copying fresh code to that box over SSH and rebuilding the right container, then loading the live site in a real browser to prove it renders.
Part 9: The part nobody plans for: remembering
The unglamorous secret of running an AI team is that the system has to learn from its own incidents, or it repeats them on schedule. Storyland keeps two kinds of institutional memory. First, written job descriptions and runbooks: every role has a charter, every runner lane a routine file, and every rule ("never trust a 200 response — render the page") traces back to a specific day something broke. The rules read like scar tissue, because they are. Second, a curated memory the AI itself maintains — a couple dozen short notes on footguns, preferences, and standing decisions, pruned regularly so stale facts don't masquerade as current ones. An AI team's memory, it turns out, needs gardening exactly like a wiki does.
Part 10: Why this shape works
Strip away the specifics and three design choices carry the whole thing. The board is the only truth — a card's column is its state, and every actor re-reads the board before acting, so anyone (human or AI) can be dropped in cold and know exactly where things stand. Approval is a gesture, not a meeting — Olga's entire management overhead is dragging cards and clicking merge, each gesture unambiguous, logged, and placed precisely where irreversibility begins. And capability is separated from authority — the team that can write code can't ship it; the runner that can ship can't decide to. Safety lives in the architecture, not in anyone's good behavior — the only kind of safety that survives 3 a.m. cron runs.
The result: a one-person company where the human does perhaps fifteen minutes of gestures a day, and wakes up to rendered design mocks, opened pull requests, deployed features, and a quiet phone — because silence, here, is engineered to mean "all is well."
Follow the build
New field notes land on the My Storyland Substack first.