API keys got you here. They won’t get you where you’re going. OAuth isn’t a future upgrade. It’s the foundation your agents should have been built on from the start.
This article examines how a carbon-aware DevOps strategy can cut the carbon footprint of CI/CD pipelines while lowering costs and meeting the growing demand for eco-friendly solutions.
Why micro teams and rotation reshape culture, not just throughput, in modern SRE. Most SRE leaders design teams around the systems they own. We designed ours around movement. We introduced micro teams expecting a throughput story: smaller groups, tighter scope, faster work. Some of that arrived. What we had not budgeted for was how much […]
You’ve invested in the tools. Your teams have dashboards that track cycle time, throughput, and work in progress. You’ve likely even built a sophisticated, probabilistic roadmap. Yet, despite the data, it feels like theatre. The moment the workshop ends, that roadmap becomes a static slide deck, and teams remain paralyzed, waiting for you to update […]
If your goals are accelerating flow and maximizing value in your organization, consider grabbing yourself a ticket to this year’s Flowtopia Live on Wednesday, June 24th. Flowtopia is a community of value stream practitioners, and this is our annual online jamboree where we gather to share, learn, and celebrate all things flow-related. Over 12 hours, […]
AI accelerated code delivery, but the bigger change is in who's building software, how much is being built and what it takes to trust it.
As AI speeds up software delivery, the real bottleneck isn’t scanning or CI. It’s how safely and predictably change moves across tools, teams, and companies. Something strange is happening in DevOps right now. AI copilots are writing code, generating tests, triaging incidents, and even summarizing pull requests before a human looks at them. The tooling […]
The software development tools you choose shape your CI/CD pipeline reliability in ways that only surface months later. Learn what to evaluate before adoption.
AI is opening new possibilities across a fast-changing technology landscape, and it is helping development teams produce software significantly faster than before. AI-enabled DevSecOps tools can automatically scan code, infrastructure and other configurations for security issues throughout development, accelerating the overall…
Not every infrastructure pull request deserves the same review path. A tag change in a development account and a network-policy change in production should not create identical reviewer load. When every change is treated as high risk, reviewers stop trusting the signal. In IaC review, I have seen reviewers spend too much attention on low-risk changes […]
In complex software systems, our traditional definition of operational health has always been comfortably binary. For over a decade, site reliability engineering (SRE) teams have relied on the industry-standard ‘Four Golden Signals’ — latency, traffic, errors and saturation — as the ultimate truth of platform stability. If our API-response times are hovering at sub-100 ms, […]
These days, when a developer needs a CI/CD pipeline, they don’t always dive into GitHub Actions docs or spin up Jenkins from scratch. Instead, they pull up an AI assistant and type out something like: “Create a deployment pipeline for a containerized application.” Seconds later, the AI spits out a complete workflow. It looks polished. […]
Agentic observability isn’t about removing engineers from the loop. It is about making the loop faster, better informed, and easier to operate at the scale modern systems require.
Ask any engineering team if they can build their own test automation framework, and the answer is almost always “yes.” With modern AI tools involved, that answer arrives faster and with more confidence than ever before. In 30 days, a capable team can spin up scripts, automate flows, generate test cases, and show a demo […]
For most of the past decade, the conversation around regression testing tools was fairly stable. The tools got faster, the integrations got smoother, and the underlying approach stayed largely the same: write tests, run them in CI, fix failures. The fundamental model did not change much because the problem did not change much. AI-assisted development […]
Modern DevOps practices have completely transformed how we handle compute and orchestration. Tools like Kubernetes enable engineering teams to spin up ephemeral containers in seconds and scale workloads dynamically to meet global demand. Yet the underlying network infrastructure has remained stubbornly rigid. Traditional cloud networking relies heavily on static IP addresses, rigid firewall…
Most enterprise AI projects start with retrieval. You connect Jira, Confluence, SharePoint, and Slack. Maybe a few internal databases nobody has touched in five years. You tune embeddings, optimize chunking, wire up a vector database, and convince yourself you’ve built an AI-powered knowledge system. Then the model server crashes. And suddenly, you discover the uncomfortable […]
If your team can answer the question “Did the system do the right thing?” and not just “Did the system stay up?”, you’re getting close to real observability.
You built the agent. It works in testing. Then it hits production and starts giving wrong answers, timing out or burning through your token budget, and you have no idea why. This is when developers discover that print statements and log files weren’t designed for this. LLM applications fail in ways that traditional tooling can’t see. A hallucination doesn’t throw […]
Agentic SRE is the evolution of site reliability engineering where AI agents help observe systems, reason over telemetry and take bounded operational actions under human-defined guardrails.
AI is making it easier for SaaS companies to build integrations. Give a coding agent decent API docs, some context about the systems involved, and a clear prompt, and it can get surprisingly far. It can write the logic faster than most teams could a year or two ago, saving time, reducing repetitive work, and […]
Record observability spending is driving up MTTR. Discover why tool sprawl and excessive dashboard data cause cognitive overload for on-call engineers, and how to fix it.
The emergence of agent skills — modular, reusable blocks of natural language instructions and metadata — is transforming the developer’s role.
By establishing a robust DevOps foundation now, organizations can leverage these emerging predictive capabilities to transform reactive pipelines into proactive, self-correcting release architectures.
Alert fatigue among Site Reliability Engineering (SRE) teams has reached a breaking point, with responders drowning in thousands of weekly notifications where only 3% genuinely warrant attention. This massive volume of noise—driven by fragmented monitoring tools and rigid, threshold-based alerting—stifles innovation, spikes on-call burnout, and compromises system reliability. Fortunately,…
The era of the flaky test as a simple annoyance is over. As enterprises shift from deterministic applications to agentic AI, flakiness has evolved into a structural bottleneck for traditional CI/CD pipelines reliant on rigid, binary assertions. Because AI agents produce "Y-like" rather than exact results, DevOps architecture must fundamentally change. This article explores the transition from…
There is a silent force shaping engineering culture inside every technology organization. It affects productivity, team morale, psychological safety, and long-term retention. And yet, it is rarely discussed in executive meetings or reflected in meaningful KPIs. That force is on-call. On-call is one of the most direct touchpoints engineers have with the reality of the […]
DORA metrics have been a reliable compass for engineering teams for over a decade. Deployment frequency, lead time for changes, change failure rate, mean time to recovery, and reliability give teams a shared language for talking about delivery performance. The research behind them is solid, the benchmarks are well-established, and most engineering leaders know what […]
Modern distributed hybrid enterprise environments are moving away from siloed monitoring toward AIOps platforms like Selector AI, which combine multi-domain data ingestion, domain-specific network language models, and co-development to enable autonomous, agentic network operations.
The moment you push your code, deployment fires off on its own. The pipeline kicks in, the tests sail through, and within a few minutes your app is live in production. There is no manual sign-off and no one scanning through the final changes. Everything is running on the decisions of an AI agent plugged […]