
AI Cost Optimization

Local Content Pipeline

A scheduled content pipeline that publishes weekly blog posts to four niche sites, entirely on local hardware, with automated fact-checking and quality gates. Cloud spend dropped from $200+/mo to $0 with no measurable quality regression.

Duration

4 months in production

Cloud spend

$0/mo (was $200+/mo cloud-only)

Stack

Ollama (llama3.3:70b) · Bash · Cron · Cloudflare Pages · Astro · Custom analysis patterns

The Problem

We run four niche content sites. Each needs a steady cadence of well-researched, SEO-friendly posts to grow organic search. The default playbook (pay roughly $0.50 per article for GPT-4 via OpenAI's API, or pay a content agency $100+ per post) costs roughly $200/month at our publishing pace, with two unpleasant side effects:

  • Every prompt and every draft passes through a cloud provider with terms of service that change quarterly.
  • Cloud-generated content tends to converge on the same patterns, hurting differentiation.

We wanted local control: our prompts, our fact-checks, our editorial voice — and zero per-article marginal cost.

The Solution

The pipeline is a small set of Bash scripts run from cron, driving a self-hosted Ollama server with llama3.3:70b at 6-bit quantization (~57 GB VRAM across 3× V100 GPUs). Each weekly run has five steps:

  1. Topic selection. A topic queue script proposes candidates from search-trend data, recent news in the niche, and a Google Search Console feed showing what we already rank for. The model picks the highest-leverage topic.
  2. Outline + draft. A two-pass generation: first an outline with H2/H3 sections + key claims; second a full draft tracked against the outline.
  3. Fact-checking gate. A "perishable facts audit" script flags claims that have an expiration date — tax credit percentages, year-specific deadlines, legislation references, pricing. Anything flagged must be verified before publish. (This was added after a real incident where the model cited an out-of-date Solar ITC rate.)
  4. Quality gate. Reusable analysis patterns (extract_wisdom, analyze_claims, review_writing) score the draft. A draft that scores below threshold is rejected and regenerated; one that clears the threshold is published.
  5. Deploy. Astro builds the site, Wrangler pushes it to Cloudflare Pages, the cache is purged, and Google Search Console is pinged. The post is live within ~3 minutes of generation completing.
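To make step 3 concrete, here is a minimal sketch of what a perishable-facts audit can look like in Bash. It is not our production script: the function names and the regex list are illustrative assumptions, and a real rule set would be tuned per niche. The idea is simply to flag any line containing a percentage, a four-digit year, a dollar amount, or a policy keyword, and to block publishing until a human has verified the flagged lines.

```shell
#!/usr/bin/env bash
# Sketch of a perishable-facts audit. The patterns below are illustrative
# assumptions, not the production rule set.

# Read a draft on stdin; print "line_number:text" for every line that
# contains a claim likely to go stale.
audit_perishable() {
  grep -niE \
    -e '[0-9]+(\.[0-9]+)?%' \
    -e '(^|[^0-9])(19|20)[0-9]{2}([^0-9]|$)' \
    -e '\$[0-9]' \
    -e '(tax credit|deadline|legislation)' \
    || true   # grep exits 1 on "no match"; that is not an error here
}

# Return 0 when the draft is clean, 1 when something needs verification.
# Flagged lines are written to stderr for the reviewer.
perishable_gate() {
  local flagged
  flagged="$(audit_perishable)"
  if [ -n "$flagged" ]; then
    printf '%s\n' "$flagged" >&2
    return 1
  fi
  return 0
}
```

A run would pipe the draft through the gate, for example `perishable_gate < draft.md || exit 1`, where `draft.md` is a placeholder file name. The patterns are deliberately broad: a false positive costs a reviewer a few seconds, while a missed stale rate costs credibility.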

The 14 Reusable Analysis Patterns

The pipeline shares a library of 14 reusable analysis patterns — small Markdown prompts that turn the local model into specialized roles: extract_wisdom, analyze_claims, review_code, improve_writing, threat_model, summarize_paper, etc. Each pattern is one file, one role, well-tested. A new content site doesn't require a new model — it requires one new prompt that wires the existing patterns into a fresh editorial voice.
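As a rough illustration of "one file, one role", the sketch below applies a single pattern to a draft. The `patterns/` directory, the `.md` file naming, the `# INPUT` delimiter, and the `PATTERN_DIR` and `MODEL` variables are assumptions made for this example. The only external call is `ollama run`, which reads the prompt from stdin when input is piped.

```shell
#!/usr/bin/env bash
# Sketch: apply one Markdown pattern to a piece of text.
# PATTERN_DIR, the file layout, and the "# INPUT" delimiter are assumptions.

PATTERN_DIR="${PATTERN_DIR:-./patterns}"
MODEL="${MODEL:-llama3.3:70b}"

# build_prompt NAME
# Print the pattern file, a delimiter, then the text read from stdin.
build_prompt() {
  local pattern_file="$PATTERN_DIR/$1.md"
  if [ ! -r "$pattern_file" ]; then
    echo "unknown pattern: $1" >&2
    return 1
  fi
  cat "$pattern_file"
  printf '\n\n# INPUT\n\n'
  cat   # pass the draft through from stdin
}

# run_pattern NAME
# Assemble the prompt and pipe it to the local model.
run_pattern() {
  local prompt
  prompt="$(build_prompt "$1")" || return 1
  printf '%s\n' "$prompt" | ollama run "$MODEL"
}
```

Usage would look like `run_pattern analyze_claims < draft.md`. Because each pattern is just a file, wiring up a new site mostly means writing a new top-level prompt and deciding which existing patterns to call.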

Results

  • Cloud spend: $0 vs. $200+/mo prior. Marginal cost per article: $0 plus electricity.
  • Cadence: 1 high-quality post per site per week (4 posts/week total) sustained for 4 months.
  • Quality: Posts that pass the perishable-facts gate and the analysis patterns show median Google Search Console impressions matching our pre-pipeline, manually written posts.
  • Privacy: No drafts, no source notes, no editorial decisions ever leave the network.
  • Reliability: 99% of weekly runs complete without intervention. Failures are logged, retried once, and pushed to a human review queue if the retry also fails.
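The "logged, retried once, then handed to a human" behavior in the reliability bullet can be sketched as a small wrapper. The paths `REVIEW_QUEUE` and `LOG_FILE`, the `.todo` marker files, and the function names are hypothetical; the shape of the logic is the point.

```shell
#!/usr/bin/env bash
# Sketch of "retry once, then hand off to a human".
# REVIEW_QUEUE and LOG_FILE are hypothetical paths.

REVIEW_QUEUE="${REVIEW_QUEUE:-./review-queue}"
LOG_FILE="${LOG_FILE:-./pipeline.log}"

# Append a UTC-timestamped message to the log file.
log() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$LOG_FILE"
}

# run_with_retry SITE CMD [ARGS...]
# Returns 0 if CMD succeeds on the first or second attempt. Otherwise it
# drops a marker file in the review queue and returns 1.
run_with_retry() {
  local site="$1"; shift
  local attempt
  for attempt in 1 2; do
    if "$@"; then
      log "$site: ok on attempt $attempt"
      return 0
    fi
    log "$site: attempt $attempt failed"
  done
  mkdir -p "$REVIEW_QUEUE"
  printf 'needs human review: %s\n' "$*" > "$REVIEW_QUEUE/$site.todo"
  log "$site: queued for review"
  return 1
}
```

A cron-driven run could then call something like `run_with_retry site-a ./publish.sh site-a`, where `publish.sh` stands in for whatever script runs the weekly steps for one site.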

What Surprised Us

Local 70B is genuinely good enough for long-form content. The gap to GPT-4 isn't zero, but it's also not visible to readers in finished posts. Where the gap matters — fact accuracy on perishable claims — we don't fix it with a better model; we fix it with a verification gate.

The pipeline pays back its build cost in ~3 months at our publishing rate. If you're publishing 4+ articles a week and spending $50+/post on cloud APIs or freelance writing, the math is favorable.
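If you want to run that math for your own situation, a back-of-the-envelope version fits in a few lines of Bash. This is a sketch under simplifying assumptions: it ignores electricity and hardware depreciation, uses integer dollars, and the inputs below are the illustrative figures quoted on this page ($50/post, 4 posts/week, a $5K-$15K setup), not our own build cost.

```shell
#!/usr/bin/env bash
# payback_months SETUP_USD POSTS_PER_WEEK COST_PER_POST_USD
# Whole months until avoided per-post spend covers the one-time setup cost.
# Assumes all three arguments are positive integers.
payback_months() {
  local setup="$1" posts_per_week="$2" cost_per_post="$3"
  local yearly_savings=$(( posts_per_week * 52 * cost_per_post ))
  # ceil(setup / (yearly_savings / 12)) using integer arithmetic only
  echo $(( (setup * 12 + yearly_savings - 1) / yearly_savings ))
}

payback_months 5000 4 50    # prints 6
payback_months 15000 4 50   # prints 18
```

Higher volume or a higher per-post cost shortens the window; at lower volume the setup cost dominates and the case gets weaker.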

Quality gates are doing more work than the model. The win isn't "70B writes great posts." It's "70B drafts a post, and three reusable patterns check it for clarity, factual risk, and SEO before it ships." The gate is the IP.
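The gate decision itself can be very small. The sketch below assumes each pattern's output has already been reduced to one line of the form `pattern_name score` with an integer score; that format, the function name, and the threshold value are assumptions for illustration, not a description of our scoring code. A draft is published only if every pattern clears the threshold, and a draft with no scores at all is rejected.

```shell
#!/usr/bin/env bash
# gate_decision THRESHOLD
# Reads "pattern_name score" lines on stdin (an assumed format) and prints
# "publish" only if at least one score was seen and none is below THRESHOLD.
gate_decision() {
  local threshold="$1"
  awk -v t="$threshold" '
    NF >= 2 { seen = 1; if ($2 + 0 < t + 0) fail = 1 }
    END {
      if (seen && !fail) print "publish"
      else print "reject"
    }
  '
}
```

For example, `printf 'improve_writing 82\nanalyze_claims 60\n' | gate_decision 75` prints `reject`, because one pattern scored under the threshold. Requiring every pattern to pass, instead of averaging, keeps a strong writing score from masking a weak factual one.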

SEO discipline beats prompt engineering. An OK post on a topic with real search demand outperforms a brilliant post on a topic nobody searches for. Half the pipeline's effectiveness is the topic-selection script, not the writing.

What We'd Build For You

If you're a marketing team, agency, or solo operator publishing more than ~4 long-form articles a week and paying cloud APIs (or freelancers) per article, a local pipeline pays back fast. We can build:

  • Local LLM inference setup tuned to your hardware (or our hardware on retainer).
  • Custom topic-selection scripts wired to your SEO tooling (GSC, Ahrefs, etc.).
  • Fact-checking gates calibrated to your niche (legal, financial, health all need different perishable-fact patterns).
  • Deploy automation to your existing publishing platform (WordPress, Ghost, Astro, custom CMS).
  • A library of analysis patterns for your editorial voice.

Typical engagement: $5K-$15K for setup plus a runbook your team can extend. An optional retainer covers ongoing prompt tuning and pattern additions as your editorial needs grow.

Interested?

See the services page for engagement options, or email [email protected] with your current publishing volume and cost.