Splitting a Terraform Monolith into Smaller States

If your Terraform plans are slow, your blast radius is too wide, or multiple teams are stepping on each other's changes, it's time to split your monolith. See The Problem with Large Terraform States for how to diagnose whether you've reached that point.

This guide walks through the process of breaking a monolithic Terraform state into smaller, focused states — and how Snap CD can manage the…

The Problem with Large Terraform States

At some point every growing Terraform project hits a wall. Plans that used to finish in seconds now take minutes. Applies feel risky because hundreds of resources share a single blast radius. Colleagues avoid running terraform plan because it hammers cloud APIs hard enough to trigger throttling. The state file itself becomes a liability — large, slow to lock, and one bad write away from…

The GitHub Actions workflow that's been failing for weeks (and how to find yours)

trpc has a scheduled workflow called "Lock Issues & PRs." Its own scorecard shows it failing on almost every run. It is still scheduled, still running, still red. trpc ships excellent software, which is exactly the point: if a project this careful has a workflow that has been red for ages, the rest of us almost certainly do too.

It is not a one-off. drizzle-orm has one ("Unpublish release").…

Why your GitHub Actions CI is slow (and how to speed it up)

Two days ago GitHub emailed me to say one of my workflows had failed. The next day it emailed me again. I saw it, told myself I would fix it tomorrow, and promptly forgot. It was my nightly database backup, quietly broken the whole time, and I only caught it because a failure-rate number nudged up.

A _failed_ run at least gets you an email. A _slow_ run gets you nothing. GitHub never pings you…

Best DevSecOps Security Tools for CI/CD Pipeline Protection

I've spent twenty-five years building and securing deployment pipelines, and the single biggest shift in that time isn't a tool — it's _where_ security lives. We used to bolt it on at the end, right before a release, when changing anything was expensive and everyone was already tired. That's backwards. DevSecOps is the correction: you move security checks left, into the pipeline, so problems…

The Empire Strikes Back: Mastering Database Backups & Disaster Recovery

The Quest Begins (The "Why")

I still remember the night our staging database went dark. A rogue migration script had wiped a critical table, and the only “backup” we had was a nightly pg_dump that we’d never actually tried to restore. When the alarm blared at 2 a.m., I felt like I was standing in front of a closed gate with no key—frantic, helpless, and wondering if we’d ever see our data…

PaperQuire Render Action — PDFs in Your CI Pipeline

Your docs should build themselves

You write your documentation in Markdown. You keep it in a Git repo. Every time someone updates a spec or runbook, someone else has to open PaperQuire (or the CLI), render the PDF, and upload it somewhere.

That manual step is now gone. The PaperQuire Render Action generates branded, print-ready PDFs directly in your GitHub Actions workflow — on every…

Docs as Code: Build a CI/CD Pipeline for Your Documentation

Your code has CI/CD. Your docs don't.

Every modern engineering team has automated builds, tests, and deployments for their code. But documentation? That's still someone manually exporting a PDF, uploading it to Confluence, and hoping it's the latest version.

This post shows you how to treat documentation like code: version-controlled Markdown in a Git repo, automatically rendered to branded…

DevSecOps Automation: A Deep Dive into SAST

Now that AI tools work alongside developers, security must be enforced as development progresses. It is tempting to treat security as an afterthought, but doing so harms the software development lifecycle. It should be development plus security.
A DevSecOps orchestration system consists of many security policies like static application security…

A Tiered Playwright E2E Strategy: From PR Smoke to Production Validation

A field write-up on a domain/feature-driven Playwright setup — the framework
configuration, the tag strategy that ties tests to a test-management system, and the
tiered run model (smoke on every PR → nightly regression → post-release production
validation). Tooling and infrastructure specifics are generalized so the
patterns are reusable anywhere.

Contents

  1. Context
  2. Framework…

CI gates for AI-generated PRs need re-derivable evidence

When a CI gate flags an AI-generated PR, the important question is not only "what did it flag?"

It is also:

"Could someone else come back later and re-derive why this finding fired?"

That is the reason I added evidence snapshots to Agent Gate v0.2.1.

What Agent Gate is

Agent Gate is a GitHub Action for AI-generated pull requests.

It does not review code with an LLM. It checks…

Add CI When the Project Can Boot, Not When It Feels Finished

I did not add CI to Knot Forget before the Django project existed. There would not have been much point: nothing meaningful to install, lint, or test.

I added it at the first moment where it could prove something useful. The project could boot, dependencies were managed, settings loaded, Ruff had rules to enforce, and pytest could run a smoke test against a minimal home view. There was still no…
