Essay

Still True

The foundations that matter more — not less — in the age of AI

By Niels Kristian Schjødt · March 2026 · ~8 min read
[Illustration: deep foundations supporting a rapidly growing structure]

The Contrast

Most of what I've been writing about lately is change. Letting go of assumptions that served us well for twenty years. Setting the old rules on fire. Shifting gears entirely on how we think about engineering.

But there's another story I haven't told yet, and it's just as important: the things that haven't changed. The practices that were always considered best practice — and that are now more true, more urgent, and more valuable than they've ever been.

This is about the boring stuff. Infrastructure. Monitoring. Logs. The things that never made anyone's keynote reel but have always separated teams that ship with confidence from teams that ship and pray.

The Speed Problem

AI lets us ship at extraordinary speed. That's not hype — it's what I see every day at AutoUncle. Features that used to take a sprint now take a day. The codebase moves faster than it ever has.

But speed without foundations is just chaos moving faster.

In Don't Just Tell It. Enforce It., I wrote about the guardrails you need around the codebase: linters, validators, automated tests that catch the AI when it drifts. That covers the path from code to deployment. This post is about what happens after you deploy. The production side. Because if the codebase guardrails are the seatbelt, production monitoring is the road itself. You can't go fast if the road is dark.

Infrastructure as Code

Terraform. Ansible. Docker Compose. CloudFormation. Pulumi. These tools have been around for years, and the pitch has always been the same: define your infrastructure in version-controlled, repeatable, deterministic files instead of clicking around in a web console.

That pitch is now twice as compelling, because AI is remarkably good at writing infrastructure code. Give it your requirements and it'll produce a Terraform config, an Ansible playbook, or a Docker setup that's solid on the first pass. The speed gain is real. But — and this is the part people miss — that speed only works if your infrastructure is code. If your production setup lives in someone's head, or in a series of manual clicks in the AWS Console, AI can't help you. It has nothing to read, nothing to modify, nothing to reason about.

Infrastructure as code was always the right call. Now it's the prerequisite for AI-assisted operations.
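As a minimal illustration of what "something to read" means, here is a hypothetical Terraform fragment (the bucket name, region, and tags are invented for the example, not a real setup). Even a snippet this small gives an agent something concrete to diff, reason about, and modify, which a history of console clicks never can:

```hcl
# Hypothetical fragment: a log bucket defined in code rather than clicked
# together in a console. Living in version control, it can be reviewed,
# diffed, and changed by an AI agent alongside the application code.
provider "aws" {
  region = "eu-west-1"
}

resource "aws_s3_bucket" "app_logs" {
  bucket = "example-app-logs"

  tags = {
    ManagedBy = "terraform"
  }
}
```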

Monitoring Is the New Bottleneck

Here's an irony I've noticed. AI-written code tends to ship with fewer obvious bugs per deployment, because the agent writes decent tests and one-shots well. So teams get comfortable. They feel confident. The deployments go smoothly — until the one that doesn't. And when that happens, the question isn't "is the AI good enough?" It's "can we see what's happening in production?"

At AutoUncle, we run New Relic for application performance monitoring, Honeybadger and Sentry for error tracking, and the ELK stack — Elasticsearch, Logstash, Kibana — for log aggregation and search. We've invested in request ID tracing so we can follow a single user action through multiple services: the web server, the background jobs, the search engine, the external APIs.
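The request ID idea is simple enough to sketch. This is an illustrative Python version, not AutoUncle's actual implementation: assign one ID at the edge of the system and inject it into every log line, so a single user action can be traced through whatever it touches. The names `request_id_var` and `handle_request` are made up for the example.

```python
import contextvars
import logging
import uuid

# One context variable holds the ID for the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Inject the current request ID into every log record."""
    def filter(self, record):
        record.request_id = request_id_var.get()
        return True

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request(action):
    # Assign the ID once at the edge; every downstream log line carries it,
    # so the whole path of one user action is searchable by a single key.
    token = request_id_var.set(uuid.uuid4().hex[:8])
    try:
        logger.info("started %s", action)
        logger.info("finished %s", action)
    finally:
        request_id_var.reset(token)
```

The same pattern extends to background jobs and outbound API calls: pass the ID along, log it everywhere, and the aggregated logs become traversable by request instead of by service.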

None of that is new. We've had most of it for years. What's new is how much it matters.

When you're shipping at AI speed, the window between "deployed" and "in front of users" shrinks. You need to know faster. You need the stack trace, the error context, the request path, the deployment diff — and you need it within minutes, not hours. Monitoring that was "nice to have" becomes "can't ship without it."

The faster you ship, the faster you need to see what's happening. Monitoring scales with velocity.

Make It Accessible to AI

Here's where it gets powerful.

Most monitoring tools now support MCPs — Model Context Protocol servers. These are lightweight interfaces that let AI agents query your monitoring data directly. At AutoUncle, we have MCPs for Elasticsearch, New Relic, Honeybadger, and Sentry. They sit alongside the MCP connections to our codebase, our project management tools, and our documentation.

What does that mean in practice? When a bug appears in production, I don't open five browser tabs. I describe the problem to the agent in my IDE. It pulls the error from Sentry, searches the logs in Elasticsearch for the relevant request IDs, checks New Relic for performance anomalies around the same timestamp, and looks at the recent deployment history — all while having the actual codebase open and editable in the same session.

The speed at which AI can investigate and fix a production bug — when the monitoring is in place and accessible — is unlike anything I've seen in twenty years of engineering. What used to be a morning of context-switching across tools becomes a focused five-minute conversation with an agent that has all the context at once.

Before MCPs: open Sentry, search logs in Kibana, check New Relic, read the code, correlate everything manually.

With MCPs: one agent, all sources, investigate and fix in the same session.

The Seat Tax

A practical note. Many monitoring tools charge per seat. New Relic, Datadog, Sentry's team plans — they all scale with the number of people who need access. When your developers interact with production data through MCPs in their IDE instead of individual dashboards, the math changes. You don't necessarily need 15 seats across three monitoring platforms if the AI agent can query and retrieve data for anyone who asks. It doesn't eliminate the need for the tools. It reduces how many humans need direct access to the UI.

I'm not saying you should cancel your monitoring subscriptions. I'm saying the per-seat model is going to come under pressure, and MCPs are why.

What's Next: AI on Watch

The step we haven't fully taken yet — but can see clearly — is AI proactively monitoring production. Not just responding when you ask it to investigate. Watching. Noticing patterns. Correlating a slow database query with a spike in error rates. Flagging an anomaly in request latency before it becomes an incident.

The pieces are all there. The MCPs exist. The monitoring data exists. The models are capable of pattern recognition across structured data. What's missing is the orchestration — the always-on agent that watches your production environment the way a senior SRE would, except it never sleeps and it can read every log line.
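The detection half of that watcher is not exotic. Here is a deliberately simple sketch, a rolling z-score check over request latency; the window size and threshold are illustrative, and a real watcher would pull live metrics through the monitoring MCPs and hand anomalies to an agent for investigation.

```python
from collections import deque
import statistics

class LatencyWatcher:
    """Toy anomaly check: flag a latency sample that sits far outside
    the recent rolling window. Illustrative parameters only."""

    def __init__(self, window=60, threshold=3.0):
        self.samples = deque(maxlen=window)  # recent latency samples (ms)
        self.threshold = threshold           # z-score that counts as anomalous

    def observe(self, latency_ms):
        """Return True if this sample looks anomalous versus the window."""
        anomalous = False
        if len(self.samples) >= 10:  # need a baseline before judging
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            if stdev > 0 and (latency_ms - mean) / stdev > self.threshold:
                anomalous = True
        self.samples.append(latency_ms)
        return anomalous
```

The hard part is not the statistics. It's the orchestration around it: deciding what the agent does next once something is flagged.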

An honest ask: if you've started experimenting with AI-driven production monitoring — agents that proactively watch for anomalies rather than just respond to alerts — I'd genuinely love to hear about it. This feels like the next frontier, and I suspect a lot of teams are exploring it quietly.

The Real Bottleneck

I'll end with a belief I've come to hold strongly over the past year.

When AI fails to solve a problem, it's almost never because the model isn't smart enough. It's because we didn't give it the context. The stack trace was missing. The deployment history was in someone's head. The logs weren't searchable. The infrastructure wasn't documented. The monitoring wasn't connected.

The bottleneck isn't intelligence. It's context availability.

Don't wait for a smarter model. Build a better-informed one.

Every hour you spend making your production environment observable, your infrastructure codified, and your monitoring accessible to AI is the highest-leverage investment you can make. Not because these things are new. Because they were always true — and now the payoff is compounding.