Case study — AI Cost Optimization

Local content pipeline.

A scheduled content pipeline that publishes weekly blog posts to four niche sites, running entirely on local hardware with automated fact-checking and quality gates. Cloud spend dropped from $200+/mo to $0 with no measurable quality regression.

Duration: 4 months in production
Cloud spend: $0/mo (was $200+/mo cloud-only)
Stack: Ollama (llama3.3:70b) / Bash / Cron / Cloudflare Pages / Astro / Custom analysis patterns

01 — The problem

Paying per article.

We run four niche content sites. Each needs a steady cadence of well-researched, SEO-friendly posts to grow organic search. The default playbook is to pay roughly $0.50 per article for GPT-4 through OpenAI's API, or $100+ per post to a content agency. At our publishing pace that came to roughly $200/month, with two unpleasant side effects:

Every prompt and every draft passes through a cloud provider with terms of service that change quarterly.
Cloud-generated content tends to converge on the same patterns, hurting differentiation.

We wanted local control: our prompts, our fact-checks, our editorial voice — and zero per-article marginal cost.

02 — The solution

Cron, bash, and a 70B model.

The pipeline is a small set of Bash scripts run from cron, driving a self-hosted Ollama server with llama3.3:70b at 6-bit quantization (~57 GB VRAM across 3× V100 GPUs). Each weekly run works through five stages:

01

Topic selection

A topic queue script proposes candidates from search-trend data, recent news in the niche, and a Google Search Console feed showing what we already rank for. The model picks the highest-leverage topic.
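
A minimal sketch of how the candidate list might be prepared before the model makes its choice. The `Candidate` fields, the 0-1 trend score, and the "skip anything already ranking in the top 3" rule are illustrative assumptions, not the production script.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    topic: str
    trend_score: float           # assumed 0-1 search-trend signal
    in_recent_news: bool         # topic appeared in recent niche news
    current_rank: Optional[int]  # best GSC position, None if we do not rank


def shortlist(candidates: list[Candidate], top_rank: int = 3) -> list[Candidate]:
    """Drop topics we already rank well for, then sort the rest by trend."""
    open_topics = [
        c for c in candidates
        if c.current_rank is None or c.current_rank > top_rank
    ]
    return sorted(
        open_topics,
        key=lambda c: (c.trend_score, c.in_recent_news),
        reverse=True,
    )


def build_selection_prompt(candidates: list[Candidate]) -> str:
    """Format the shortlist so the model can answer with a single number."""
    lines = ["Pick the single highest-leverage topic. Reply with its number only.", ""]
    for i, c in enumerate(candidates, start=1):
        rank = "unranked" if c.current_rank is None else f"rank {c.current_rank}"
        news = "in the news" if c.in_recent_news else "not in the news"
        lines.append(f"{i}. {c.topic} (trend {c.trend_score:.2f}, {news}, {rank})")
    return "\n".join(lines)
```

Keeping the filter in plain code and leaving only the final pick to the model means the model never sees topics the site already ranks well for.
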
02

Outline + draft

Generation runs in two passes: first an outline with H2/H3 sections and key claims, then a full draft written to follow that outline.
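
The two passes can be sketched as two calls to Ollama's local HTTP API. This assumes a server on Ollama's default port (11434); the prompt wording and the function names are illustrative, not the prompts used in production.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Callable

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.3:70b"


def ollama_generate(prompt: str) -> str:
    """Send one non-streaming generation request to the local Ollama server."""
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def two_pass_draft(
    topic: str, generate: Callable[[str], str] = ollama_generate
) -> tuple[str, str]:
    """Pass 1 produces the outline; pass 2 writes the draft against it."""
    outline = generate(
        f"Write an outline for a blog post on '{topic}'. "
        "Use H2/H3 headings and list the key claims under each section."
    )
    draft = generate(
        "Write the full post. Follow this outline section by section and do not "
        f"add claims it does not list.\n\nOUTLINE:\n{outline}"
    )
    return outline, draft
```

Passing `generate` as a parameter keeps the two-pass logic testable without a GPU: a stub function can stand in for the model.
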
03

Fact-checking gate

A "perishable facts audit" script flags claims that have an expiration date — tax credit percentages, year-specific deadlines, legislation references, pricing. Anything flagged must be verified before publish. (This was added after a real incident where the model cited an out-of-date Solar ITC rate.)
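
A perishable-facts audit of this kind can be approximated with a handful of regular expressions. The sketch below is an assumption about how such a script might work, not the audit script itself; the four pattern categories mirror the examples above, and the regexes are deliberately loose so that they over-flag rather than miss.

```python
from __future__ import annotations

import re

# Assumed categories; a real audit would be tuned per niche.
PERISHABLE_PATTERNS = {
    "percentage": re.compile(r"\d+(?:\.\d+)?\s?%"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "price": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    "legislation": re.compile(r"\b(?:Act|Bill|Section \d+)\b"),
}


def audit(draft: str) -> list[dict]:
    """Return every sentence containing a claim that may go out of date."""
    findings = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", draft.strip()):
        kinds = sorted(
            name for name, pattern in PERISHABLE_PATTERNS.items()
            if pattern.search(sentence)
        )
        if kinds:
            findings.append({"sentence": sentence, "kinds": kinds})
    return findings


if __name__ == "__main__":
    # Made-up sample text, used only to exercise the patterns.
    sample = (
        "Panels convert sunlight into power. "
        "The credit is 26% for systems installed in 2021. "
        "A typical system costs $15,000."
    )
    for finding in audit(sample):
        print(finding["kinds"], "->", finding["sentence"])
```

The script only flags; it does not decide whether a claim is correct. That check stays with whoever verifies the flagged sentences before publish.
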
04

Quality gate

Reusable analysis patterns (extract_wisdom, analyze_claims, review_writing) score the draft. A draft below the threshold is rejected and regenerated; one that clears it is published.
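
The reject-and-retry loop can be sketched as follows. The 0-1 score scale, the 0.8 threshold, the three-attempt budget, and the rule that every pattern must clear the threshold are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Callable, Optional

# Assumed pattern names and score scale; the real scoring format may differ.
GATE_PATTERNS = ("extract_wisdom", "analyze_claims", "review_writing")


def passes_gate(scores: dict[str, float], threshold: float) -> bool:
    """A draft passes only if every pattern scores at or above the threshold."""
    return all(scores.get(name, 0.0) >= threshold for name in GATE_PATTERNS)


def draft_until_pass(
    generate: Callable[[], str],
    score: Callable[[str], dict[str, float]],
    threshold: float = 0.8,  # hypothetical cut-off
    max_attempts: int = 3,   # hypothetical retry budget
) -> Optional[str]:
    """Return the first draft that passes, or None if every attempt is rejected."""
    for _ in range(max_attempts):
        draft = generate()
        if passes_gate(score(draft), threshold):
            return draft
    return None
```

A missing score counts as a failure here, so a pattern that crashes or returns nothing blocks publication instead of silently passing.
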
05

Deploy

Astro builds the site, Wrangler pushes it to Cloudflare Pages, the cache is purged, and GSC is pinged. The post is live within ~3 minutes of generation completing.
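
A rough sketch of the build-and-upload half of this stage, assuming Astro's default `dist` output directory and a hypothetical Cloudflare Pages project name. The cache purge and GSC ping are left out because they depend on account-specific credentials.

```python
from __future__ import annotations

import subprocess


def deploy_commands(project: str, out_dir: str = "dist") -> list[list[str]]:
    """Build the site with Astro, then upload the output with Wrangler."""
    return [
        ["npx", "astro", "build"],
        ["npx", "wrangler", "pages", "deploy", out_dir, "--project-name", project],
    ]


def deploy(project: str, dry_run: bool = True) -> list[str]:
    """Run each command in order; a failing command stops the deploy."""
    executed = []
    for command in deploy_commands(project):
        if not dry_run:
            subprocess.run(command, check=True)  # raises CalledProcessError on failure
        executed.append(" ".join(command))
    return executed


if __name__ == "__main__":
    # "my-niche-site" is a placeholder project name.
    for line in deploy("my-niche-site", dry_run=True):
        print(line)
```

Defaulting to `dry_run=True` makes it safe to check the command list from a cron test run before anything is published.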

03 — The patterns

The 14 reusable analysis patterns.

The pipeline shares a library of 14 reusable analysis patterns — small Markdown prompts that turn the local model into specialized roles: extract_wisdom, analyze_claims, review_code, improve_writing, threat_model, summarize_paper, etc. Each pattern is one file, one role, well-tested. A new content site doesn't require a new model — it requires one new prompt that wires the existing patterns into a fresh editorial voice.
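
As an illustration of "one file, one role", a pattern can be loaded from disk and combined with a site's editorial voice in a few lines. The directory layout, the `.md` naming, and the section headers in the composed prompt are assumptions, not the library's actual format.

```python
from __future__ import annotations

from pathlib import Path


def load_pattern(name: str, patterns_dir: Path) -> str:
    """Read one pattern file, e.g. patterns/analyze_claims.md."""
    return (patterns_dir / f"{name}.md").read_text(encoding="utf-8").strip()


def compose_prompt(pattern: str, voice: str, text: str) -> str:
    """Wire a shared pattern to one site's editorial voice and one input text."""
    return f"{pattern}\n\n# EDITORIAL VOICE\n{voice}\n\n# INPUT\n{text}"
```

Under this layout, adding a site means writing one new voice description; the pattern files themselves stay unchanged.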

04 — Results

What the numbers say.

Cloud spend

$0 vs. $200+/mo prior. Marginal cost per article: $0 plus electricity.

Cadence

1 high-quality post per site per week (4 posts/week total) sustained for 4 months.

Quality

Posts that pass the perishable-fact gate and the analysis patterns show median impressions in Google Search Console that match our pre-pipeline, manually written posts.

Privacy

No drafts, no source notes, no editorial decisions ever leave the network.

Reliability

99% of weekly runs complete without intervention. Failures are logged, retried once, and pushed to a human review queue if the retry also fails.
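
The log, retry-once, then escalate policy can be sketched as below. The in-memory list standing in for the human review queue and the function names are illustrative assumptions.

```python
from __future__ import annotations

import logging
from typing import Callable

log = logging.getLogger("pipeline")


def run_site(site: str, job: Callable[[str], None], review_queue: list[str]) -> bool:
    """Run one site's weekly job; retry once, then hand off to human review."""
    for attempt in (1, 2):  # first try plus a single retry
        try:
            job(site)
            return True
        except Exception as exc:  # broad on purpose: every failure gets logged
            log.warning("run failed for %s (attempt %d): %s", site, attempt, exc)
    review_queue.append(site)
    return False
```

Capping the loop at one retry keeps a persistently failing run from tying up the GPUs; anything that fails twice waits for a person.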

05 — What surprised us

Lessons from four months.

Local 70B is genuinely good enough for long-form content.

The gap to GPT-4 isn't zero, but it's also not visible to readers in finished posts. Where the gap matters — fact accuracy on perishable claims — we don't fix it with a better model; we fix it with a verification gate.

The pipeline pays back its build cost in ~3 months at our publishing rate.

If you're publishing 4+ articles a week and spending $50+/post on cloud APIs or freelance writing, the math is favorable.
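
To run this math on your own numbers, a small break-even calculation is enough. The formula below is a plain payback estimate; the build cost in the usage example is a placeholder, not our own build cost, and electricity is modeled as an optional monthly running cost.

```python
from __future__ import annotations

WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def monthly_spend(articles_per_week: float, cost_per_article: float) -> float:
    """Current monthly spend on per-article APIs or freelancers."""
    return articles_per_week * WEEKS_PER_MONTH * cost_per_article


def payback_months(
    build_cost: float,
    articles_per_week: float,
    cost_per_article: float,
    monthly_running_cost: float = 0.0,  # e.g. electricity for the GPU host
) -> float:
    """Months until the avoided spend covers the one-time build cost."""
    savings = monthly_spend(articles_per_week, cost_per_article) - monthly_running_cost
    if savings <= 0:
        raise ValueError("no monthly savings, so the build cost is never recovered")
    return build_cost / savings


if __name__ == "__main__":
    # Placeholder inputs: 4 articles/week at $50 each, hypothetical $5,000 build.
    print(round(payback_months(5_000, 4, 50), 1))
```

The estimate ignores hardware purchase and maintenance time, so treat it as a lower bound on the payback period.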

Quality gates are doing more work than the model.

The win isn't "70B writes great posts." It's "70B drafts a post, and three reusable patterns check it for clarity, factual risk, and SEO before it ships." The gate is the IP.

SEO discipline beats prompt engineering.

An OK post on a topic with real search demand outperforms a brilliant post on a topic nobody searches for. Half the pipeline's effectiveness is the topic-selection script, not the writing.

06 — What we'd build for you

The same pipeline, your voice.

If you're a marketing team, agency, or solo operator publishing more than ~4 long-form articles a week and paying cloud APIs (or freelancers) per article, a local pipeline pays back fast. We can build:

Local LLM inference setup tuned to your hardware (or our hardware on retainer).
Custom topic-selection scripts wired to your SEO tooling (GSC, Ahrefs, etc.).
Fact-checking gates calibrated to your niche (legal, financial, health all need different perishable-fact patterns).
Deploy automation to your existing publishing platform (WordPress, Ghost, Astro, custom CMS).
A library of analysis patterns for your editorial voice.

Typical engagement: $5K-$15K for setup plus a runbook your team can extend. An optional retainer covers ongoing prompt tuning and pattern additions as your editorial needs grow.