Essay

The Humility Gap

Why the best engineers often get the worst out of AI

By Niels Kristian Schjødt · June 2026 · ~11 min read
Cover illustration for The Humility Gap — an open hand letting a small bird take flight, beside a clenched fist holding a crushed paper plane, in muted earthy tones

The Thing I Keep Noticing

I keep running into the same quiet pattern, and it has been bothering me for months.

Some of the most capable technical people I know, senior engineers, sharp specialists, people who have shipped things I could only dream of shipping a decade ago, reach for AI, get a so-so result, and quietly conclude: it can’t really do this. I’ll just do it myself. And they’re not wrong about what they saw. The work came back mediocre.

But here is the strange thing: I never feel that any more - I feel the exact opposite. When I hand the same kind of work to AI, I expect it to come back better than if I had sat there grinding through it myself. Not always. But often enough that “I’ll do it myself” has become, for me, the slower and worse option.

Same tools. Same models. Roughly the same problems. Wildly different results. And it clearly isn’t technical talent: the people getting less out of AI are often the most capable engineers in the room. So I started asking the only interesting question here: why? What separates the result that comes back as gold from the one that comes back as gravel?

I have built a theory - not a proof. This essay is that theory.

Two Prompts, Same Goal

Watch two people ask AI for the same thing and you can usually hear the difference in the first sentence.

The grip
Prescriptive · recipe handed over

This doesn’t work. You need to do X, then Y. Implement it on the OrderService class. Add a repository, add a factory, follow the existing pattern. Don’t touch the controller. Do it exactly like this.

The open hand
Goal fixed · route left open

This isn’t doing what I want. I think the problem is somewhere around X, but I’m genuinely not sure. Can you investigate, figure out the cleanest way to get us there, and flag any concerns I should know about before we commit?

Same destination. Completely different posture. The first person arrives with the recipe already written and hands it over to be typed up. The second arrives with a clear target and an open hand about the route.

My claim, and for a long time I had nothing but a hunch, is that the second person gets meaningfully better work out of the machine. Not because their prompt is longer or cleverer. Because it leaves the model room to think.

The tighter you grip, the worse it gets. AI can feel your certainty, and it follows it straight off a cliff.

Why Instruction Backfires

Here is the mechanism, as best I understand it. When you prompt with heavy, precise instruction, the model reads that as a strong signal: the human knows exactly what they want; my job is compliance. So it complies. It puts its head down and executes your specification with admirable discipline, including the parts of your specification that were quietly wrong.

When you prompt with a clear goal and visible uncertainty about the path, the model reads something different: the destination is fixed, the route is open, judgement is welcome. Now it explores. It notices the thing you didn’t. It makes a better local decision than the one you would have dictated, because it actually looked.

For a while this was just my intuition. It is no longer only that. The research is starting to catch up, and it is more pointed than I expected.

The cleanest piece I have found is a 2025 paper with the wonderfully blunt title “You Don’t Need Prompt Engineering Anymore: The Prompting Inversion.” The authors built a rigid, rule-based prompting method, lots of explicit constraints designed to stop the model making dumb mistakes. On a mid-tier model (GPT-4o) it helped: 97% versus 93%. On a more capable model (GPT-5) the same constraints made things worse: 94% versus 96%. Their explanation is the part that made me sit up. The constraints that prevent errors in a weaker model induce what they call “hyper-literalism” in a stronger one. You tie its hands, and a capable model obeys you right past the better answer it could have found.

That is my whole essay in one experiment. The better the model, the more your over-instruction costs you.

It rhymes with other recent findings. Researchers have documented “reasoning rigidity”, where heavily-steered models lock onto a familiar template and stop genuinely engaging with the specifics in front of them. Others have shown that over-prompting, piling on detail and examples, can degrade output rather than improve it; more instruction is not more quality. And a broad survey of goal-oriented prompting points the other way: framing the objective clearly, rather than dictating every step, is what tends to lift performance.

None of this is settled science. But the direction is consistent, recent, and uncomfortable for anyone whose instinct is to micromanage the machine.

📍 The pattern: constraints that protect a weaker model handicap a stronger one. As models get better, the optimal prompt gets less prescriptive, not more. The skill is migrating from “specify the steps” to “frame the goal and get out of the way.”

Why We Hold On

So why do two people prompt so differently in the first place? Why does one grip and the other let go?

I think it comes down to a quiet question that sits underneath the whole thing: have you accepted that, on this task, it might do the job better than you?

If you haven’t accepted that, and if you’ve spent ten or twenty years becoming genuinely excellent at your craft, you have every reason not to, then of course you hold on. You are a proud specialist. You put your name on what you ship. You do not hand work to something you can’t yet vouch for and walk away from it. When you don’t trust the output, you grab the wheel. That instinct is responsibility, not ego, it is the same care that made you good in the first place.

So you do the natural, conscientious thing: you write the spec down to the detail, hand it over, and ask the machine to execute it exactly. Not because you’re a control freak, because you have earned the right to be certain about how this should go, and certainty is most of what your job has rewarded for two decades.

Here is the trap, and it is a subtle one. The problem is almost never that your specification is bad. It is usually excellent. The problem is that the best version of the answer rarely exists yet at the moment you write the spec. You discover it by doing the work, over days, sometimes weeks, through the small realisations, the dead ends, the “oh, it actually needs to work like this” that only shows up once your hands are in it.

When you hand a capable model a specification that’s locked too tightly, you take that journey away from it. It follows your first idea faithfully, all the way to the end. And your first idea, however expert, is rarely as good as the one you’d have arrived at by Friday after living inside the problem. You don’t get a worse engineer than yourself. You get a frozen snapshot of your own thinking from the moment you knew the least.

A tight specification doesn’t only constrain the model. It freezes your first draft into the final answer, and throws away everything you’d have learned by building it.

This is the practical cousin of something I wrote about in It Wasn’t Wrong: the standards seniors spent decades defending were never wrong, the world moved underneath them. The instinct to hold the work close is one of those standards. It served you well. It still does, in plenty of places. It just quietly works against you in the prompt box, of all places.

I believe I often get more out of AI partly because I’ve made my peace with a slightly bruising fact: on most coding tasks, the model can research more of the problem, faster and more completely, than I can in the same window. Once you genuinely believe that, holding the problem loosely stops feeling like a drop in standards and starts being the higher standard, because it is what actually produces the better result.

We Already Learned This. It’s Just Scrum.

Here is where I want to lift this up, because none of it is actually new.

We have spent the last twenty years learning, painfully, how to lead complex work done by capable people. We learned that the waterfall, the giant up-front specification handed down to be executed, produces brittle, joyless, mediocre results. Not because specifications are evil, but because a detailed instruction handed to a capable person makes them switch their brain off. They work with their head tucked under their arm. They stop owning the outcome and start just following the steps.

It’s something we are genuinely proud of in Denmark, rightly or not: the story we tell ourselves is that we don’t run software factories where you hand down a rigid spec and get back exactly the spec. We set a clear direction, trust good people to find the path, and let them think outside the frame on the way there. Set the what and the why with conviction; be generous and open about the how.

That is, almost word for word, how you get the best work out of a strong AI model.

Waterfall posture
“Here is the recipe. Follow it exactly.”
Senior-leadership posture
“Here is the goal. You find the best way there.”

The best practice we evolved for leading senior, experienced, specialist humans turns out to be the best practice for prompting a capable model. Direction over dictation. Intent over instruction. Trust, with verification.

When You Should Be Precise

I need to guard against being misread here, because “just be vague and trust the machine” is a terrible takeaway and the opposite of what I mean.

AI loves clarity on the goal. It thrives on a sharp, unambiguous definition of done. And where you are genuinely 100% certain about the path, the API contract, the naming convention, the non-negotiable constraint, you absolutely should specify it. Precision is not the enemy. Precision in the wrong place is.

The thing to watch is the weapon you wield the specification with. There is a world of difference between “here is the recipe, follow it to the letter” and “here is the target, be exact about that, now go research the route, surface the concerns, and tell me what I’m missing.” Same facts. Different permission.

And when something genuinely must hold, a rule that cannot be violated, the answer usually isn’t a sterner prompt at all. It’s a guardrail outside the prompt: a linter, a validator, a test that makes the wrong thing impossible to ship. That’s the whole argument of Don’t Just Tell It. Enforce It., stop trying to instruct quality into existence and start enforcing it deterministically. Free the model on the path; lock the constraints in code. The two ideas are the same coin: hold the goal hard, hold the route loose, and put your certainty where it actually belongs.

If you want the practical, do-this-not-that version of all this, it lives in Shifting Gears.

Context Is King

There is a second half to gold-versus-gravel. On its own, “just be humble in your prompt” is only half the lesson, and the less important half at that.

Leaving the model room to find a better answer is worthless if you’ve locked it in a dark room. Half the times I’ve watched a capable agent return gravel, the prompt was not the problem at all. The agent was flying blind. It couldn’t see the production logs. It had no access to the error tracker. It never saw the Slack thread from this morning where someone in support pasted three screenshots and a furious customer email that, between them, explained the entire thing.

If the agent can’t reach the circumstances, how is it supposed to uncover the root cause, weigh the real concerns, or surface the option you hadn’t thought of? You didn’t hire a brilliant specialist and then refuse to tell them what was happening. Don’t do it to the model either.

A humble prompt to a blind agent is still a blind agent. Context is king.

So the discipline has two parts, not one. Be humble about the how, and be relentless about the access. Wire up the right MCPs. Give it the production log it can traverse. Connect the error tracking. Hand it the support thread, the screenshots, the emails, the half-finished conversation from this morning. Stop sitting on information the model doesn’t have and then being disappointed that it didn’t guess.

This is the thread that runs through almost everything I write. In Still True the whole argument is that context beats raw intelligence, and that the boring foundations, observability, access, the plumbing, the MCPs, are what decide whether AI is a force multiplier or a frustration. It’s the same lesson as The Feedback Loop: proximity to the real signal changes everything. A model is only as clever as the circumstances you let it see.

The Uncomfortable Mirror

The part I can’t shake is that this is mostly not a story about the model. It’s a story about us.

The agentic tools are already built around exactly the posture I’m describing, you set a high-level goal and the agent plans, explores, and verifies the path. The industry has been moving toward delegated autonomy for a year, and the 2026 agentic-coding reports all tell the same story. The capability is sitting right there. What’s often missing is the willingness to use it, the willingness to be, for a moment, not the smartest one in the conversation.

So when the results disappoint, it’s worth asking a harder question than “is the model good enough yet?” The harder question is: am I holding this so tightly, or keeping it so starved of context, that I never gave it the chance to surprise me?

The best engineers I know are brilliant precisely because they earned the right to be certain. The humility gap is the cost of that certainty, and closing it doesn’t mean knowing less. It means trusting enough to ask a question you don’t already have the answer to, and letting something else do the thinking on the way there.

Loosen your grip. Watch what flies out.