Case study — AI-Augmented Development

Multi-agent app factory.

Four parallel Claude Code workers, a shared filesystem, lock-based coordination, and automated QA gates. The result: 48 mobile apps shipped in 10 weeks by one person, with the flagship (RxLog) in Play Store production.

Duration: 10 weeks of active build, ongoing maintenance
Output: 48 mobile apps; flagship (RxLog) live in Play Store production
Stack: Claude Code / Bash / Docker / React Native / Expo / Custom orchestration

01 — The problem

Solo-founder arithmetic.

Solo founders face a brutal arithmetic: at any given moment you can build, sell, support, or sleep. Pick three, and the fourth eats you. Hiring solves it but adds payroll, management overhead, and equity dilution before there's any revenue. Outsourcing solves part of it but adds coordination cost and quality risk.

We wanted to test whether a fleet of AI workers — disciplined like junior engineers, supervised like an engineering manager — could compress the build phase from "years" to "weeks" without dropping below shippable quality.

02 — The solution

Four workers, three primitives.

The system is four independent Claude Code processes running inside a single shared Docker container. Every worker reads from and writes to the same shared filesystem (/work), and the workers coordinate through three primitives:

01

Project locks

Each worker maintains a status file (WORKER1.md…WORKER4.md). Before touching a project, a worker checks the others' files. If another worker has already claimed the project, it backs off and picks something else from the backlog.

02

Shared policies

A POLICIES.md file lists pre-approved actions, boundaries, and standing decisions. Workers act autonomously within the boundaries and ask only at the edges.

03

Failure journal + steering rules

Every failure is logged with root cause and a new rule. A nightly cron extracts patterns and rewrites a steering rules file that all workers read at startup. The fleet learns from any individual's mistakes.
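The project-lock primitive can be sketched in a few lines. This is a minimal illustration, not the fleet's actual code: the `claimed:` line format, the helper names `read_claim` and `claim_next`, and the idea of passing the backlog as a list are all assumptions made for the example. Only the status file names come from the text.

```python
from pathlib import Path
from typing import Optional

WORKERS = ["WORKER1", "WORKER2", "WORKER3", "WORKER4"]
CLAIM_PREFIX = "claimed:"  # assumed line format, not taken from the case study


def read_claim(status_file: Path) -> Optional[str]:
    """Return the project named on the 'claimed:' line, or None if there is none."""
    if not status_file.exists():
        return None
    for line in status_file.read_text().splitlines():
        if line.startswith(CLAIM_PREFIX):
            project = line[len(CLAIM_PREFIX):].strip()
            return project or None
    return None


def claim_next(work_dir: Path, me: str, backlog: list[str]) -> Optional[str]:
    """Claim the first backlog project that no other worker has claimed."""
    taken = {
        read_claim(work_dir / f"{name}.md")
        for name in WORKERS
        if name != me
    }
    for project in backlog:
        if project not in taken:
            # Record the claim in this worker's own status file.
            (work_dir / f"{me}.md").write_text(f"{CLAIM_PREFIX} {project}\n")
            return project
    return None  # everything is claimed; back off
```

A call such as `claim_next(Path("/work"), "WORKER1", ["RxLog", "SubWatch"])` returns the first unclaimed project or `None`. Note that check-then-write is not atomic: two workers reading at the same instant could both claim the same project. One stricter variant, if that window matters, is a per-project lock file created with `os.open(path, os.O_CREAT | os.O_EXCL)`, which fails if the file already exists.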

On top of coordination, every app build runs through a single command (/test-app) that gates on 9 automated checks: TypeScript, ESLint, version sync, secret scanning, accessibility, monkey stress testing, rotation, low-memory simulation, and a visual screenshot review against Nielsen heuristics. No app reaches a tester until it's READY.
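The gating logic behind a command like /test-app might look like the sketch below. It is an assumption-laden illustration: the gate identifiers, the `Outcome` enum, the `run_gates` function, and the "BLOCKED" label are hypothetical, and the real checks would shell out to tools rather than call Python functions. The rule that an ambiguous result blocks the build follows the document's own remark that ambiguous failures are treated like build failures.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    AMBIGUOUS = "ambiguous"


# The nine checks named in the text, in the order listed (identifiers are hypothetical).
GATES = [
    "typescript", "eslint", "version-sync", "secret-scan", "accessibility",
    "monkey-stress", "rotation", "low-memory", "visual-review",
]


def run_gates(checks: dict[str, Callable[[], Outcome]]) -> tuple[str, list[str]]:
    """Run every gate; return the verdict and the names of gates that did not pass."""
    blocking = []
    for name in GATES:
        check = checks.get(name)
        # A missing check counts as ambiguous rather than as a silent pass.
        outcome = check() if check else Outcome.AMBIGUOUS
        if outcome is not Outcome.PASS:
            blocking.append(name)
    verdict = "READY" if not blocking else "BLOCKED"
    return verdict, blocking
```

Running every gate instead of stopping at the first failure is a design choice for the sketch: it gives the worker the full list of problems in one pass, at the cost of a longer run.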

03 — Cost control

The tiered model discipline.

Cost discipline is enforced at the model layer. A worker that wants to summarize a file uses a free local Ollama call. A worker that wants to read 20 files in parallel uses Haiku (cheap, tool-using). Only complex code synthesis or architectural reasoning goes to Opus. Each worker logs token usage by tier; the statusline displays it; budget overruns trigger a retro.
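One way to picture the routing rule is a lookup table plus a per-tier token counter. The sketch below is illustrative only: the task-kind labels, the `TierRouter` class, and the choice to send unknown task kinds to the cheap tier are assumptions, and no real model is called.

```python
from collections import Counter
from dataclasses import dataclass, field

# Task kinds are hypothetical labels; the tier names follow the text.
ROUTES = {
    "summarize_file": "ollama",   # free, local
    "bulk_read": "haiku",         # cheap, tool-using
    "code_synthesis": "opus",     # expensive, reserved for hard work
    "architecture": "opus",
}


@dataclass
class TierRouter:
    # Running token totals, keyed by tier name.
    usage: Counter = field(default_factory=Counter)

    def route(self, task_kind: str) -> str:
        """Pick the tier for a task; unknown kinds default to the cheap tier (an assumption)."""
        return ROUTES.get(task_kind, "haiku")

    def record(self, task_kind: str, tokens: int) -> str:
        """Route the task and add its token count to that tier's running total."""
        tier = self.route(task_kind)
        self.usage[tier] += tokens
        return tier
```

The `usage` counter is the kind of per-tier tally a statusline could display; how the real workers report it is not specified in the text.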

In a typical month, ~60% of token usage by volume is on the free tier and ~30% on Haiku, which leaves roughly 10% for Opus. Opus is reserved for the work that justifies its cost.
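The share-by-volume figures are simple arithmetic over the per-tier token totals. The sketch below shows that calculation and a hypothetical overrun check. The function names and the idea of expressing a budget as a maximum percentage share are assumptions; the text does not say how the budget is defined.

```python
def tier_shares(tokens_by_tier: dict[str, int]) -> dict[str, float]:
    """Return each tier's share of total token volume as a percentage."""
    total = sum(tokens_by_tier.values())
    if total == 0:
        return {tier: 0.0 for tier in tokens_by_tier}
    return {tier: 100.0 * n / total for tier, n in tokens_by_tier.items()}


def over_budget(tokens_by_tier: dict[str, int], max_share: dict[str, float]) -> list[str]:
    """List the tiers whose share of volume exceeds its cap (caps are percentages)."""
    shares = tier_shares(tokens_by_tier)
    return sorted(t for t, cap in max_share.items() if shares.get(t, 0.0) > cap)
```

With made-up totals chosen to match the stated proportions, say 6M tokens on the local tier, 3M on Haiku, and 1M on Opus, `tier_shares` gives 60%, 30%, and 10%, and `over_budget(..., {"opus": 5.0})` would flag Opus for a retro.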

04 — What got built

48 apps, five categories.

48 React Native / Expo apps spanning five categories. A selection from each:

Health & medical

RxLog (medication tracker, Play Store production), DailyMark (mood/symptom journal), BodyCalc (clinical calculators).

Productivity & finance

SubWatch (subscription tracker), TallyUp / TipTally / TickTock Timer, CoinCove, TradeJournal, ThoughtWell.

Home & lifestyle

Certivo, AutoKeep, HomeUpkeep, PawJournal, GarageLog, SaleSpotter.

Field work

SiteSnap (PDF inspection reports for contractors), SafeScout (location-based crime mapping), CapitolLens (Congress tracker).

Audio

dB Scout (sound meter), Voice Memo, Focus Timer, Mind Waves (binaural beats), PitchLock (chromatic tuner).

05 — Results

What ten weeks bought.

48 production-quality APKs across diverse categories, each with offline-first architecture, dual themes, and free + Pro variants.

~143K lines of TypeScript shipped, all verified through the 9-check /test-app gate.

RxLog live in Google Play Store production (v1.2.5, subscription description of 80+ characters, IAP wired).

Pipeline cost: roughly equivalent to 3-4 contract developers, billed as Anthropic API spend.

Typical app cycle: concept → verified APK in 1-2 days.

06 — What surprised us

Lessons from the fleet.

Coordination is the bottleneck, not capability.

Two workers wedged on the same Gradle build for an hour cost more than the Opus tokens it would take to do the same work sequentially. Locks aren't bureaucracy; they're throughput.

Failure journals beat onboarding docs.

Workers actually read steering rules because the rules are short, recent, and concrete ("don't use bash interpolation in heredocs to Ollama, it breaks on quotes"). Nobody reads a 50-page wiki.
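The nightly journal-to-rules step could be as small as the sketch below. It is a guess at the shape of the job, not the actual cron script: the `Failure` record, the `steering_rules` function, and the cap of 20 rules are assumptions. It only deduplicates and orders rules, which is a much simpler stand-in for whatever pattern extraction the real job performs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Failure:
    day: date
    root_cause: str
    rule: str


def steering_rules(journal: list[Failure], limit: int = 20) -> list[str]:
    """Collapse duplicate rules and return them most-recent first, capped at `limit`."""
    latest: dict[str, date] = {}
    for entry in journal:
        rule = entry.rule.strip()
        # Keep the most recent date on which each distinct rule was logged.
        if rule and (rule not in latest or entry.day > latest[rule]):
            latest[rule] = entry.day
    # Sort by date descending, then alphabetically so the output is stable.
    ranked = sorted(latest.items(), key=lambda kv: (-kv[1].toordinal(), kv[0]))
    return [f"- {rule}" for rule, _ in ranked[:limit]]
```

Capping the list and putting recent rules first is what keeps the file short enough to be read at startup, which is the property the lesson above depends on.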

The QA gate is what makes the velocity safe.

Without /test-app, "build fast" rapidly becomes "ship broken APKs to your tester." The gate is non-negotiable; ambiguous failures are treated like build failures.

App ideas are cheap; distribution is the wall.

Building 48 apps is fast. Getting 48 apps installed and earning revenue is a completely different problem — one that doesn't get easier just because your build pipeline is fast.

07 — What we'd build for you

A fleet for your backlog.

Most software shops have a queue of "we should build this" projects that never start because the lead engineer is busy. A multi-agent system can take a backlog of clearly-specified internal tools, dashboards, ETL jobs, or mobile MVPs and chew through it in parallel — with QA gates that match your standards, locked into your repo conventions.

For $15K-$50K we can stand up a fleet for your team: shared coordination protocol, custom QA gates fit to your stack, workers configured against your codebase, and a 4-12 week build sprint targeting a specific deliverable backlog. After the engagement ends, you keep the fleet — and the documentation to extend it.