AI Cost Optimization
Local Content Pipeline
A scheduled content pipeline that publishes weekly blog posts to four niche sites, entirely on local hardware, with automated fact-checking and quality gates. Cloud spend dropped from $200+/mo to $0 with no measurable quality regression.
Duration
4 months in production
Cloud spend
$0/mo (was $200+/mo cloud-only)
Stack
Ollama (llama3.3:70b) · Bash · Cron · Cloudflare Pages · Astro · Custom analysis patterns
The Problem
We run four niche content sites. Each needs a steady cadence of well-researched, SEO-friendly posts to grow organic search. The default playbook (pay ~$0.50 per article for GPT-4 via OpenAI's API, or pay a content agency $100+ per post) costs roughly $200/month at our publishing pace, with two unpleasant side effects:
- Every prompt and every draft passes through a cloud provider with terms of service that change quarterly.
- Cloud-generated content tends to converge on the same patterns, hurting differentiation.
We wanted local control: our prompts, our fact-checks, our editorial voice — and zero per-article marginal cost.
The Solution
The pipeline is a small set of Bash and cron scripts driving a self-hosted Ollama server that runs llama3.3:70b at 6-bit quantization (~57 GB VRAM across 3× V100 GPUs). Each weekly run performs five steps:
- Topic selection. A topic queue script proposes candidates from search-trend data, recent news in the niche, and a Google Search Console feed showing what we already rank for. The model picks the highest-leverage topic.
- Outline + draft. A two-pass generation: first an outline with H2/H3 sections + key claims; second a full draft tracked against the outline.
- Fact-checking gate. A "perishable facts audit" script flags claims that have an expiration date — tax credit percentages, year-specific deadlines, legislation references, pricing. Anything flagged must be verified before publish. (This was added after a real incident where the model cited an out-of-date Solar ITC rate.)
- Quality gate. Reusable analysis patterns (extract_wisdom, analyze_claims, review_writing) score the draft. A draft below the threshold is rejected and regenerated; a draft above it is published.
- Deploy. Astro builds the site, Wrangler pushes it to Cloudflare Pages, the cache is purged, and the pipeline pings Google Search Console (GSC). The post is live within ~3 minutes of generation completing.
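The fact-checking gate in the steps above can be sketched as a simple pattern scan. This is a minimal illustration in Python, not the production script (which is Bash): the category names, the regexes, and the `audit` and `ready_to_publish` helpers are all assumptions made for the example, and the sample draft text is invented.

```python
import re
from dataclasses import dataclass

# Hypothetical rules: each regex marks a kind of claim that can go stale.
PERISHABLE_PATTERNS = {
    "percentage": re.compile(r"\b\d+(?:\.\d+)?\s?%"),
    "year": re.compile(r"\b20\d{2}\b"),
    "price": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
    "legislation": re.compile(r"\b(?:Act|Bill|Section \d+)\b"),
}


@dataclass
class Flag:
    line_no: int    # 1-based line number in the draft
    category: str   # key from PERISHABLE_PATTERNS
    match: str      # the exact text that triggered the flag
    context: str    # the full line, for the human verifier


def audit(draft: str) -> list[Flag]:
    """Return every perishable claim found in the draft, in reading order."""
    flags = []
    for line_no, line in enumerate(draft.splitlines(), start=1):
        for category, pattern in PERISHABLE_PATTERNS.items():
            for m in pattern.finditer(line):
                flags.append(Flag(line_no, category, m.group(0), line.strip()))
    return flags


def ready_to_publish(draft: str, verified: set[str]) -> bool:
    """A draft passes only when every flagged match has been verified."""
    return all(flag.match in verified for flag in audit(draft))


if __name__ == "__main__":
    sample = "The credit covers 30% of costs through 2032.\nInstalls start at $12,000."
    for flag in audit(sample):
        print(f"line {flag.line_no}: [{flag.category}] {flag.match}")
```

A scan like this is deliberately over-eager: a false positive costs a few seconds of human verification, while a missed stale figure ships to readers.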
The 14 Reusable Analysis Patterns
The pipeline shares a library of 14 reusable analysis patterns — small Markdown prompts that turn the local model into specialized roles: extract_wisdom, analyze_claims, review_code, improve_writing, threat_model, summarize_paper, etc. Each pattern is one file, one role, well-tested. A new content site doesn't require a new model — it requires one new prompt that wires the existing patterns into a fresh editorial voice.
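To make the "one file, one role" idea concrete, here is a hedged Python sketch of how a pattern file might be applied to a draft through Ollama's local HTTP API. The `patterns/analyze_claims.md` path and the `build_request` and `run_pattern` helpers are assumptions for illustration; the endpoint shown is Ollama's default local address, and the real pipeline drives this from Bash.

```python
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.3:70b"


def build_request(pattern_text: str, draft: str, model: str = MODEL) -> dict:
    """Pair one pattern (the role) with the text it should analyze."""
    return {
        "model": model,
        "system": pattern_text.strip(),  # the Markdown pattern becomes the system prompt
        "prompt": draft,
        "stream": False,                 # ask for one JSON object, not a token stream
    }


def run_pattern(pattern_path: Path, draft: str) -> str:
    """Send the draft through one pattern and return the model's text output."""
    payload = build_request(pattern_path.read_text(encoding="utf-8"), draft)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Hypothetical file layout: one Markdown file per pattern.
    print(run_pattern(Path("patterns/analyze_claims.md"), "Draft text to check."))
```

Keeping the pattern in the system prompt and the draft in the user prompt means the same pattern file can be reused across sites without edits.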
Results
- Cloud spend: $0 vs. $200+/mo prior. Marginal cost per article: $0 plus electricity.
- Cadence: 1 high-quality post per site per week (4 posts/week total) sustained for 4 months.
- Quality: Posts that pass the perishable-fact gate and the analysis patterns show median impressions in Google Search Console that match pre-pipeline manual posts.
- Privacy: No drafts, no source notes, no editorial decisions ever leave the network.
- Reliability: 99% of weekly runs complete without intervention. Failures are logged, retried once, and pushed to a human review queue if the retry also fails.
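The retry-then-escalate behavior described under Reliability, combined with the quality threshold, could look roughly like the following Python sketch. The `run_with_gate` name, the 0.8 threshold, and the callable signatures are assumptions; only the policy (log, retry once, then hand off to human review) comes from the text above.

```python
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("pipeline")


def run_with_gate(
    generate: Callable[[], str],
    score: Callable[[str], float],
    threshold: float = 0.8,  # hypothetical cut-off; the real value is not published
    max_attempts: int = 2,   # first try plus one retry
) -> Tuple[str, Optional[str], int]:
    """Return ("publish", draft, attempts) or ("review", last_draft, attempts)."""
    last_draft: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        try:
            draft = generate()
            draft_score = score(draft)
        except Exception:
            # Generation or scoring crashed: log it and use up one attempt.
            log.exception("attempt %d failed", attempt)
            continue
        last_draft = draft
        if draft_score >= threshold:
            return "publish", draft, attempt
        log.warning("attempt %d scored %.2f, below %.2f", attempt, draft_score, threshold)
    # Both attempts failed or scored too low: escalate to a human.
    return "review", last_draft, max_attempts
```

Returning the last draft alongside the "review" status gives the human reviewer something to edit rather than an empty queue entry.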
What Surprised Us
Local 70B is genuinely good enough for long-form content. The gap to GPT-4 isn't zero, but it's also not visible to readers in finished posts. Where the gap matters — fact accuracy on perishable claims — we don't fix it with a better model; we fix it with a verification gate.
The pipeline pays back its build cost in ~3 months at our publishing rate. If you're publishing 4+ articles a week and spending $50+/post on cloud APIs or freelance writing, the math is favorable.
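The payback arithmetic is simple enough to check with your own numbers. In the Python sketch below, the function names are hypothetical, and the $600 build cost is an illustrative figure chosen only so the example lines up with the ~3 month payback at $200/mo; it is not a reported cost.

```python
WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def monthly_spend(posts_per_week: float, cost_per_post: float) -> float:
    """Current per-article spend, converted to a monthly figure."""
    return posts_per_week * cost_per_post * WEEKS_PER_MONTH


def payback_months(
    build_cost: float,
    monthly_savings: float,
    monthly_running_cost: float = 0.0,  # e.g. electricity for the GPUs
) -> float:
    """Months until cumulative net savings cover the one-off build cost."""
    net = monthly_savings - monthly_running_cost
    if net <= 0:
        raise ValueError("running costs exceed savings; the build never pays back")
    return build_cost / net


if __name__ == "__main__":
    # Illustrative only: $600 is an assumed build cost, $200 is the prior monthly spend.
    print(payback_months(build_cost=600, monthly_savings=200))  # 3.0
```

Passing a non-zero `monthly_running_cost` shows how much electricity stretches the payback period.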
Quality gates are doing more work than the model. The win isn't "70B writes great posts." It's "70B drafts a post, and three reusable patterns check it for clarity, factual risk, and SEO before it ships." The gate is the IP.
SEO discipline beats prompt engineering. An OK post on a topic with real search demand outperforms a brilliant post on a topic nobody searches for. Half the pipeline's effectiveness is the topic-selection script, not the writing.
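As a rough illustration of demand-first topic selection, the Python sketch below ranks candidates before the model makes its final pick. Everything in it is an assumption: the `Candidate` fields, the `leverage` formula, and the penalty and boost weights are invented for the example and are not the pipeline's actual scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    topic: str
    monthly_searches: int         # from search-trend data
    already_ranking: bool         # from the Google Search Console feed
    in_recent_news: bool = False  # from the niche news scan


def leverage(c: Candidate, ranking_penalty: float = 0.25, news_boost: float = 1.5) -> float:
    """Hypothetical score: search demand, discounted where we already rank."""
    score = float(c.monthly_searches)
    if c.already_ranking:
        score *= ranking_penalty
    if c.in_recent_news:
        score *= news_boost
    return score


def shortlist(candidates: list[Candidate], k: int = 3) -> list[Candidate]:
    """Top-k candidates by leverage, to hand to the model for the final choice."""
    return sorted(candidates, key=leverage, reverse=True)[:k]


if __name__ == "__main__":
    pool = [
        Candidate("heat pump rebates", 900, already_ranking=False, in_recent_news=True),
        Candidate("solar panel cleaning", 2000, already_ranking=True),
        Candidate("history of the thermostat", 10, already_ranking=False),
    ]
    for c in shortlist(pool, k=2):
        print(c.topic, leverage(c))
```

The point of the sketch is the ordering of concerns: search demand sets the ceiling, and writing quality only matters among topics that clear it.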
What We'd Build For You
If you're a marketing team, agency, or solo operator publishing more than ~4 long-form articles a week and paying cloud APIs (or freelancers) per article, a local pipeline pays back fast. We can build:
- Local LLM inference setup tuned to your hardware (or our hardware on retainer).
- Custom topic-selection scripts wired to your SEO tooling (GSC, Ahrefs, etc.).
- Fact-checking gates calibrated to your niche (legal, financial, health all need different perishable-fact patterns).
- Deploy automation to your existing publishing platform (WordPress, Ghost, Astro, custom CMS).
- A library of analysis patterns for your editorial voice.
Typical engagement: $5K-$15K for setup plus a runbook your team can extend. An optional retainer covers ongoing prompt tuning and pattern additions as your editorial needs grow.
Interested?
See the services page for engagement options, or email [email protected] with your current publishing volume + cost.