The Finding Nobody Implemented

_The DORA research said culture predicts engineering performance. Nicole Forsgren's most important finding never made it into a single commercial product._

As a computer scientist, I love data. Things feel good, things feel bad, but our biases shape those feelings, and data is what pulls the signal out. I've believed that since my first software engineering job.

At that first job, I was…

From Root CA to User Authorization in nginx+apache. Part 2: Certificate Revocation, CRL and OCSP

A follow-up to Part 1 (EN on LinkedIn · RU on Habr), where we stood up a two-tier PKI: a Root CA and three intermediate CAs — Person, Server and Code. At the end of Part 1 I promised we'd learn to revoke certificates and run OCSP. That's what we'll do here.

Like Part 1, this article is meant as a hands-on manual: for every command and extension we touch, there's an extended reference of…

Async LLM inference in CI: stop build workers blocking on slow jobs

TL;DR: Async inference through an AI gateway lets CI build workers submit a long LLM job, get an id back, and poll later, so a 30-second model call stops holding a worker hostage. Here's how I wired it with Bifrost.
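The submit, get-an-id, poll-later flow in the TL;DR can be sketched in a few lines. This is a minimal in-memory illustration and not Bifrost's API: `AsyncJobGateway`, `slow_summarise`, and `wait_for` are hypothetical names, and a thread pool plus a short sleep stand in for the gateway and the slow model call.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncJobGateway:
    """Hypothetical in-memory stand-in for an async inference gateway."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}  # job id -> Future

    def submit(self, fn, *args):
        """Start the slow call in the background and return a job id at once."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def poll(self, job_id):
        """Report status without blocking; include the result once it exists."""
        future = self._jobs[job_id]
        if not future.done():
            return {"status": "pending"}
        if future.exception() is not None:
            return {"status": "failed", "error": str(future.exception())}
        return {"status": "done", "result": future.result()}


def slow_summarise(log_text):
    """Placeholder for the slow LLM call; it sleeps instead of calling a model."""
    time.sleep(0.2)
    return f"summary of {len(log_text.splitlines())} log lines"


def wait_for(gateway, job_id, interval=0.05, timeout=5.0):
    """Poll until the job leaves the pending state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = gateway.poll(job_id)
        if state["status"] != "pending":
            return state
        time.sleep(interval)
    return {"status": "timeout"}


if __name__ == "__main__":
    gateway = AsyncJobGateway()
    job_id = gateway.submit(slow_summarise, "test_a FAILED\ntest_b FAILED")
    print(gateway.poll(job_id))      # the caller is free while the job runs
    print(wait_for(gateway, job_id))
```

The point of the design is that `submit` returns immediately, so the caller decides when, and how often, to spend time checking on the job.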

Our build workers at Buildkite were each blocked for up to 35 seconds waiting on a single LLM call that summarised failed test output. With a few hundred concurrent builds…

Your First LLM API on Kubernetes: From Model to Curl Request

Series links

  • Part 1: Everything You Know About Scaling Web Apps Breaks When You Serve an LLM
  • Part 2: The Request Is the Wrong Unit of Scale for LLMs on Kubernetes
  • Part 3: How Do You Fit a Trillion-Parameter Model Into a Kubernetes Cluster?
  • Part 4: Before the Pod Starts: GPU Node Setup for LLMs on Kubernetes
  • Part 5: OpenAI Already Told Us the Kubernetes…
Your AI Code Has 6 Secret Hits. Only 3 Ship in the npm Package.

Secrets in a published npm package are a different set from secrets in your repo. A secret scanner reads the whole git tree; npm pack ships only the paths matched by the files allowlist in package.json. leak_probe.py measures both and prints the gap. On the fixture below it found 6 hits and flagged 3 as actually shipping.

TL;DR

  • A scanner reads your git tree. The packager reads the files allowlist.…
How we connected off-page content to actual revenue with a small Calendly to HubSpot webhook

For a long time our demo bookings showed up in the CRM with a source of (direct). Someone read a Reddit answer, clicked through, booked a call, and as far as our data was concerned, fell out of the sky. We knew the off-page content was working. We could not prove which piece.

This is the writeup of the small thing we built to fix that. It is not clever. That is the point. Attribution does not…

Building haven bench in the open, and the flaky CI ghost it flushed out

A debugging story from building InferHaven in the open: how a benchmark feature flushed out a flaky-CI race that had nothing to do with it.

I shipped a tokens/sec benchmark for local models. The unit tests were green, and then CI turned red in a way that looked like my fault. It wasn't. Here's the whole hunt: a chown that raced the tide, set -e, and a zsh lock file that vanished…

Undo Enables AI Agents to Diagnose Root Cause of Application Issues

Undo today revealed that its platform for recording interactions within applications can now be accessed by artificial intelligence (AI) agents via a Model Context Protocol (MCP) server. Company CEO Greg Law said this Undo AI capability makes it simpler for any agent to discover the root cause of any issue that otherwise would have required […]

Security Education and Awareness: Because Not Everyone Is Technical

In most companies, you won't find a workforce made entirely of developers, engineers, or cybersecurity experts. You'll find salespeople, HR professionals, operations managers, customer support teams, and leadership—alongside technical staff.

This diversity is a strength for business, but it creates a critical challenge for…

Linux Backup Made Simple: Automate Incremental System Snapshots with Restic

A single Bash script, a USB drive, and 30 seconds a day. No cloud. No subscriptions. No excuses.

You have spent months tweaking your Linux environment. The perfect i3 config, the custom kernel parameters, the Docker containers you curated one by one. Then one day — a power surge fries your NVMe. A bad rm -rf. A failed dist-upgrade. And it's all gone.

Most Linux users either don't back up…

Building ArtifactX: product-ready apt/yum repos in Rust

ArtifactX is a Rust CLI for signed apt/yum repositories: import existing repos, regenerate metadata under your own key, publish atomically, serve static files, and roll back when a cutover goes wrong.

That sounds like a narrow problem. It is. But Linux package repositories are one of those narrow pieces of infrastructure where small mistakes travel far.

There is a funny split in Linux…

docker init OCIR OKE: From Empty Folder to Production in 15 Minutes

I timed myself. Starting from an empty directory with a Go application idea, how fast could I get to a running deployment on OKE? The answer was 14 minutes. docker init did more of the work than I expected.

What docker init Does

If you haven't used it, docker init is an interactive scaffolding tool built into Docker CLI. You run it in your project directory and it generates a…

Mermaid Diagrams Quickstart and Cheatsheet for Developers

Mermaid is a text-based diagramming tool for people who would rather write diagrams than drag boxes around a canvas. It uses a Markdown-like syntax to describe flowcharts, sequence diagrams, class diagrams, state machines, timelines, Gantt charts, entity relationship diagrams, and more.

For a technical blog, Mermaid is a very good default. The diagrams live next to the article, they can be…

Microsoft Brings the Azure SDK for Rust to General Availability

Microsoft has moved the Azure SDK for Rust out of beta and into general availability, giving Rust developers a stable, production-ready way to connect to core Azure services. The release covers Core, Identity, Key Vault (Secrets, Keys, and Certificates), and Storage (Blobs and Queues), built around the same design patterns already used in the .NET, Java, JavaScript, Python, Go, and C++ SDKs.

Salesforce vs Dynamics 365 CE DevOps: A Practical Comparison for Enterprise Teams

Most organizations running CRM platforms eventually face the same challenge: how to deploy changes safely, consistently, and quickly. While Salesforce and Dynamics 365 Customer Engagement (CE) support modern DevOps practices, they approach application lifecycle management differently. Understanding these differences can help teams design more effective deployment pipelines and avoid common…

Newly Appointed CloudBees CEO Charts Agentic AI Engineering Course

The newly appointed CEO of CloudBees, Mo Plassnig, says that as the agentic artificial intelligence (AI) era dawns, the time has come to reinvent software engineering in a way that moves beyond human-centric tooling. Plassnig, who earlier this month succeeded Anuj Kapur, joins CloudBees from Immuta, a provider of a data security and governance platform, […]

Dev Log: 2026-06-22 — Configurable Schedulers, Load-Test Toolkits, and an MCP Server

Some days the work spreads across a few projects instead of landing as one big feature. Today was that — three distinct threads, each with a lesson worth keeping. I'll keep things generic and teach the pattern rather than the project, but the through-line is the same: move things that were hardcoded or ephemeral into something you can configure, repeat, and trust.

Thread 1 — Make scheduled…

How to stop an AI agent from burning $47,000 in a loop nobody noticed.

A multi-agent research system sat in production for eleven days doing exactly what it was built to do. Four agents, LangChain-style, coordinating over A2A to pull market data and summarize it. Every health check passed the whole time. No crash, no 500, no timeout... from the outside the system was perfectly healthy.

Two of the four agents had quietly locked into a recursive loop, passing…

Capacity Planning Without ML: The 80/20 Approach

There's a small industry of vendors that want to sell you machine learning capacity planning. For 95% of teams, you don't need it. You need a spreadsheet, an honest growth assumption, and a buffer.

Here's the practical version of capacity planning that catches most real problems.

What you actually need to know

You need to answer three questions:

  1. When does the current setup run out?
We Keep Our Architecture Rules in the Repo. The AI and the New Hire Read the Same File.

A few weeks ago I watched an agent build a feature beautifully.

Clean code. Tests passed. Did exactly what I asked.

Three sessions later, I opened the same service and didn't recognise it. Nothing was _wrong_, exactly. Every individual decision was reasonable. Stacked together, they'd quietly walked the codebase somewhere I would never have designed it.

That's when it clicked. The problem…

We Renamed the DevOps Team 'Platform Engineering.' Nothing Changed. Here's Why.

And what it actually takes to make the transition real.

Sound Familiar?

A leadership meeting. A slide deck. A reorg announcement.

The DevOps team is now called the Platform Engineering team. The job descriptions get updated. The Confluence pages get renamed. A few engineers get "Platform Engineer" on their LinkedIn profiles.

Six months later, nothing has meaningfully changed.…

AI Anomaly Detection in Grafana: 3 Mistakes We Made

_Originally published on kuryzhev.cloud_

We replaced 200 static Prometheus threshold alerts with an AI anomaly detection model — and spent the first month making things measurably worse before we figured out why. The model fired constantly, woke people up at 3am for non-issues, then went completely silent during a real incident. This is the honest account of what went wrong and what the working…

The Silent Ledger Leak: Measuring Causality Violations in Async Payment Pipelines

I spent the last few months trying to understand why reconciliation errors keep appearing in high-throughput pipelines. Here is what I found.

In the race to process millions of transactions daily, modern fintech ecosystems have achieved a genuine miracle of scale. But beneath the surface of that velocity lies a structural problem most engineering teams aren't measuring: causality violations in…

A Tiered Playwright E2E Strategy: From PR Smoke to Production Validation

A field write-up on a domain/feature-driven Playwright setup — the framework
configuration, the tag strategy that ties tests to a test-management system, and the
tiered run model (smoke on every PR → nightly regression → post-release production
validation). Tooling and infrastructure specifics are generalized so the
patterns are reusable anywhere.

Contents

  1. Context
  2. Framework…
Vibe-Coded Infrastructure: How to Ship Fast Without Torching Production

You described what you wanted in plain English, the model wrote the Terraform, you ran apply, and it worked. No docs, no Stack Overflow, no fighting HCL syntax for an hour. That's vibe coding — building by describing intent and riding the model's output — and for infrastructure it is genuinely, addictively fast.

It's also how you accidentally terraform destroy a production VPC at 2 p.m.…

GitHub Code Quality Moves to General Availability, Bringing New Costs and Capabilities

GitHub is closing the book on the free preview period for one of its most widely adopted recent features. More than 10,000 enterprises used the GitHub Code Quality public preview to detect maintainability and reliability issues, enforce quality gates, and track code coverage. Starting July 20, 2026, that free ride ends. Code Quality becomes a […]

From Feature Delivery to Platform Engineering.

The Problem: Feature Velocity Was Creating Structural Debt

The system originally started as a simple feature delivery backend:

  • A Django API powering agricultural insights
  • Celery workers handling asynchronous processing
  • Independent endpoints for each new capability
  • A growing set of Earth Observation computations (NDVI, NDWI, etc.)

At first, it worked.

But as more features…

Homebrew to Packages: No ID, No Service

Homebrew, the unofficial but default package manager for many Apple Mac users, now has safeguards to prevent supply-chain attacks. The approach mimics how GitHub just fortified npm against attacks by establishing a set of trusted repositories to download from. “The Homebrew team is aware of the supply-side security issues with other package managers. We’ve taken […]
