Corti.com
@sascha.corti.com.ap.brid.gy
31 documents
0 likes
0 shares
Feb 2026 since
View on Bluesky
AI assisted Software Engineering: Scaffolding Your Way from One Agent to a Team

How agent-teams-scaffold turns any repository into a launchpad for Claude Code Agent Teams — and why that's the cleanest jump from Level 6 to Level 7 of AI adoption.

The wall between "an agent" and "a team of agents"

If you've spent time with coding agents, you've probably felt a ceiling. One agent, however capable, is still one context window doing one thing at a time. You hand it a task, it…

Read more →
Serving TokenTelemetry, a localhost-only app on the internet, safely using just a script: a tour of tokentelemetry-caddy

I host OpenClaw and Hermes on two different virtual machines in the cloud. I want to be able to view their token usage using TokenTelemetry, but the tool is built to run unauthenticated on localhost.

To get a way to securely publish the TokenTelemetry endpoint of http://localhost:3000 to the public internet using a custom domain name and automatic forwarding from HTTP to HTTPS, I wrote a Bash…

Read more →
Connecting OpenCode to a Self-Hosted LLM (vLLM + Nemotron 3 Super)

Coding agents like Claude Code and Codex are excellent, but both are wired to a specific vendor's API. If you run your own inference stack — for cost control, data residency, or because you have GPUs sitting idle — you want an agent you can point at your endpoint. OpenCode is the cleanest fit: it's terminal-first, open source, and talks to any OpenAI-compatible API without a translation…

Read more →
Serving Nemotron-Super-120B with a 1M token context on a 2-node DGX Spark cluster

This is a build log. We had two NVIDIA DGX Spark workstations (GB10 / SM121, 128 GB unified memory each), 200 GbE ConnectX-7 NICs, and the goal of serving NVIDIA's Nemotron-3-Super-120B-A12B-NVFP4 with the model's full 1 million token context. The path there crossed several traps that aren't documented in any one place: a missing Ray binary in the latest NGC vLLM image, environment-variable…

Read more →
Clustering Two NVIDIA DGX Sparks to Serve Qwen3-30B-Thinking with Ray + vLLM

TL;DR

We took two NVIDIA DGX Spark units, wired them together over a 200 GbE link, joined them into a single Ray cluster running inside a vLLM container, and serve Qwen/Qwen3-30B-A3B-Thinking-2507-FP8 with tensor parallelism across both boxes. One Spark holds shard 0, the other holds shard 1, Ray dispatches the work, and vLLM exposes an OpenAI-compatible endpoint on port 8000.

The trickiest…

Read more →
Borrowing Memory, Not Speed: Clustering a Mac Studio and a DGX Spark with exo

Every local-inference setup eventually hits the same wall: a model you want to run is a few gigabytes too big for the one machine you'd run it on. You have a 128 GB Mac Studio. The model wants 160 GB. You also happen to have a 128 GB DGX Spark sitting on the same network. The obvious question is whether you can staple the two together and run the thing.

You can. This post is about exactly that…

Read more →
Why Qwen3.6-35B Runs on a NVIDIA DGX Spark and gpt-oss-120B Fought Me Every Step

A field report from getting a local LLM inference endpoint working on an NVIDIA DGX Spark (GB10 / SM121, 128 GB unified memory) — including every wall I hit with gpt-oss-120B, why a smaller FP8 model sidestepped all of them, and how to expose the result safely through an nginx reverse proxy on a multihomed server.

TL;DR: On a GB10 Spark, the quantization format matters more than raw capability.…

Read more →
When the Helpdesk Becomes the Hacker: Technical Analysis of the Meta AI Account Takeover Incident And How to Prevent It

In June 2026, security researchers uncovered one of the most surprising account takeover incidents in recent memory. Attackers did not exploit a memory corruption bug, bypass cryptography, or compromise Meta's infrastructure. Instead, they simply convinced Meta's own AI-powered support system to hand over Instagram accounts. (0xsid.com)

The incident is an important case study for anyone building…

Read more →
Microsoft’s New MAI Models: A Technical Analysis

At Build 2026, Microsoft significantly expanded its in-house MAI (Microsoft AI) model family. While much of the public attention focused on Microsoft's ongoing relationship with OpenAI, the more interesting technical story is that Microsoft is increasingly developing its own foundation models across reasoning, coding, image generation, speech synthesis, and transcription.

The latest…

Read more →
Two Sparks, One Cluster: Why Stacking NVIDIA DGX Spark Units Unlocks Local Frontier-Scale Inference

The NVIDIA DGX Spark put a Grace Blackwell superchip on the desk for the price of a high-end workstation. A single unit is already a capable local-inference box — 128 GB of unified memory, FP4 tensor cores, a full NVIDIA software stack. But the feature that quietly changes the platform's ceiling is the one most people skip past at unboxing: the pair of ConnectX-7 200 GbE QSFP ports on the back.…

Read more →
Perplexity Bumblebee: Fast, Read-Only Supply-Chain Exposure Checks for Developer Machines

Modern software supply-chain incidents move fast. A malicious package version is published, copied into lockfiles, installed into developer environments, embedded into project workspaces, or exposed through editor and browser extensions. The immediate security question is rarely theoretical:

Which developer machines are exposed right now?

That is the problem Perplexity Bumblebee is designed to…

Read more →
Running GPT-OSS-120B on a Single NVIDIA DGX Spark - A Practical Guide

Note on the model name: OpenAI’s open-weight family ships as gpt-oss-20b and gpt-oss-120b. There is no 130B variant — this guide targets gpt-oss-120b, which is the one sized to fit the Spark’s unified memory.

A practical, single-node setup guide for serving gpt-oss-120b as a local coding backend on the GB10 Grace Blackwell DGX Spark, and wiring it into Claude Code.

  1. Why this model fits the…
Read more →
Tiny11: Giving an Old, Unsupported PC a Secure Second Life with a Minimal Windows 11 Installation

When Windows 10 reached end of support, many perfectly usable PCs were pushed into an uncomfortable corner. The hardware still worked. The CPU was still fast enough for web browsing, email, light office work, home automation dashboards, media playback, or workshop use. But the machine could not officially upgrade to Windows 11 because it lacked one or more of Microsoft’s hardware requirements:…

Read more →
Install the “Caveman” Skill for GitHub Copilot CLI System-Wide

Large Language Models are incredibly powerful for software engineering, but they also have a habit of being verbose. Long explanations, conversational filler, and repeated context all consume tokens, increase latency, and dilute the signal-to-noise ratio during AI-assisted engineering.

The “caveman” skill for GitHub Copilot CLI takes the opposite approach: aggressively concise communication…

Read more →
What Achieving AGI Could Mean: Beyond Bigger Models and Longer Context Windows

Artificial General Intelligence, or AGI, is one of those terms that is both overused and underdefined. Depending on who you ask, it means human-level intelligence, economically useful autonomy, recursive self-improvement, scientific superintelligence, or simply “the next thing after today’s chatbots.”

A useful working definition is this:

AGI would be an AI system that can acquire new skills…

Read more →
From Passwords to Keys: Setting Up GitHub SSH Authentication on macOS (and Never Typing Credentials Again)

If you are still cloning GitHub repositories over HTTPS and repeatedly authenticating with browser logins or tokens, switching to SSH is one of those small infrastructure improvements that pays off every day.

SSH authentication gives you:

  • Passwordless Git operations after initial setup
  • Separate identities for personal and work GitHub accounts
  • Better compatibility with terminal-first…
Read more →
LLMs Corrupt Your Documents When You Delegate

The uncomfortable gap between “can edit” and “can be trusted”

A lot of current AI enthusiasm is built around delegation.

We no longer ask language models only to answer questions. We ask them to modify source code, rewrite reports, refactor configuration files, reorganize spreadsheets, update structured records, transform diagrams, edit subtitles, and operate across entire project folders. In…

CopyFail (CVE-2026-31431): Why a Tiny Linux Kernel Bug Became a Massive Infrastructure Threat

A newly disclosed Linux kernel vulnerability dubbed CopyFail (CVE-2026-31431) has quickly become one of the most serious Linux privilege escalation flaws in recent years. The bug allows an unprivileged local user to gain full root access on a vast number of Linux systems released since 2017 — including servers, cloud workloads, Kubernetes nodes, developer workstations, and even some WSL2…

Apple Vision Pro in Switzerland: How to Use It Well in an Unsupported Country

Apple Vision Pro is portable by design, and Apple explicitly positions it as a device you can use at home, at work, and while traveling. But there is a practical difference between traveling with Vision Pro and living in a country where Apple does not officially sell or support it. Switzerland is one of those countries today, so the hardware works, but parts of the software and service experience…

Building an AI-Powered Birthday Calendar with FastAPI and Vanilla JavaScript

A full-stack self-hosted app with email reminders, AI based gift suggestions, and zero framework overhead on the frontend.

Why Build a Birthday Calendar?

I kept forgetting birthdays. Not the big ones, those are hard to miss, but the colleague whose birthday is next Tuesday, or the friend who always remembers mine but whose date I can never recall. I wanted something simple, self-hosted, and…

Working Beyond the Desk: Using the M5 Apple Vision Pro as a High-Brightness External Display that works on the Balcony on a Sunny Day

I recently upgraded from the first-generation Apple Vision Pro to the new Apple Vision Pro M5 because even if this device and MR/VR in general gets a lot of bad press, it has fundamentally changed how I think about “where work happens.”

Most coverage of spatial computing still focuses on immersive apps, entertainment, or futuristic collaboration. What’s underrepresented is a far more…

AI Agent Traps: When the Web Becomes the Attack Surface for Autonomous Agents

Autonomous AI agents are quickly moving beyond chat. They browse the web, read documents, call tools, retrieve knowledge, send messages, and increasingly act on behalf of users and organizations. That shift creates a new security problem: the environment itself can become hostile.

That is the core argument in AI Agent Traps, a 2026 paper by Matija Franklin, Nenad Tomašev, Julian Jacobs, Joel Z.…

Why Running Redis in a Local Docker Container Is a Smart Move for Developers

Modern development is increasingly service-driven. Even small apps often depend on infrastructure components like databases, caches, queues, and session stores. Redis fits naturally into that world because it is fast, simple, and broadly useful for caching, session management, and real-time analytics. Running Redis locally in Docker makes it even more attractive: you get a disposable, isolated,…

HVE Core for VS Code: Turning GitHub Copilot into a Structured Engineering System. A Practical Guide

AI-assisted engineering becomes much more valuable when it is constrained by process, standards, and reusable workflows. That is exactly where HVE Core for VS Code stands out.

Rather than treating GitHub Copilot as a generic chat assistant or code completion engine, Hypervelocity Engineering (HVE) Core turns it into a more structured engineering environment built around specialized agents,…

AI-Powered 3D Printing: From Text to STL with Meshy and OpenClaw

How I taught my AI assistant to generate 3D-printable models from simple text descriptions

The Problem

I've been 3D printing for years, but there's always been a gap in my workflow: organic shapes are hard. Sure, I can design a technical items, holders, brackets or enclosure in Shapr 3D, but when I want something sculptural—a figurine, a decorative piece, or a creative toy for my cats—I'm…

AI Is Not Converging. It Is Being Orchestrated.

For the last two years, the dominant question in AI has been deceptively simple: which model will win?

That question made sense when the market was still trying to understand whether large language models were a novelty, a feature, or a platform shift. It makes less sense now.

After a series of thoughtful conversations on AI strategy, tooling, enterprise adoption, prompting, and product…

From scanners to reasoning: how LLMs and agent harnesses can improve code security

Better models matter, but better harnesses may matter more. The future of AI-assisted security is evidence, validation, and human-guided judgment.

A year ago, a team at Microsoft explored an idea that felt promising but still a little early: using an AI agent to go beyond vulnerability scanning and perform deeper CVE analysis, including generating VEX documents. The goal was not just to detect…

Palantir’s 22-Point Manifesto, Decoded

What The Technological Republic says about software, state power, and the future of defense tech.

Palantir’s recent X post is worth reading carefully, not because it is subtle, but because it is unusually explicit. In 22 compressed points, the company distilled the argument of The Technological Republic into a public statement of ideology: Silicon Valley owes a debt to the nation, software now…

AI Hijacking via Open-Source Agent Tooling: A Five-Layer Attack Anatomy

The threat landscape for AI-assisted development environments has quietly expanded beyond the attack surfaces that traditional security tooling is designed to cover. While conventional supply chain attacks target compiled binaries or runtime dependencies, a new class of attack targets something far more subtle: the behavioral configuration layer of AI coding assistants.

This post performs a…

EvilTokens: An AI-Driven Device Code Attack Compromising Microsoft Businesses

A new class of identity attacks is rapidly scaling across enterprises: AI-augmented device code phishing, operationalized through phishing-as-a-service (PhaaS) platforms like EvilTokens. Microsoft and multiple security vendors have confirmed that these attacks are now widespread and highly effective, compromising organizations daily by abusing legitimate authentication flows rather than…

Page 1 Older →