The Agentic Engineering Iceberg: Climb by Adding Checkpoints, Not Removing Them

AI coding maturity is an iceberg: overnight autonomy sits above the invisible engineering practices that keep production safe. Unsupervised agents without gates are debt, not maturity. Leaders should invest in durable context, specialist agents, and automated quality gates before chasing the hype.

By LY · Jun 19, 2026

The thesis

AI coding maturity works like an iceberg. The visible tip is what vendors demo in keynotes - agents that run overnight, self-improving loops, teams of agents managing other agents. The mass underwater is what actually keeps production safe: rules, tests, audit schemas, row-level security (RLS) reviews, rollback plans. The thesis is straightforward: higher maturity levels are not automatically better. Each one pays off only when the levels below it are solid. Unsupervised overnight agents without eval gates are not maturity - they're debt with a hype label. The practical climb is to add checkpoints, not remove them.

The iceberg, level by level

Here's where your team likely sits today - and what it actually means when the agent gets something wrong.

Level	What it looks like	What it means for your team
1	Chat copy-paste. Developers paste code from a chatbot. No memory, no guardrails.	Where everyone starts. Every line must be reviewed by hand. Low productivity, high error rate.
2	Inline completions. Editor suggests the next few lines; tab to accept.	Speeds up typing but has no awareness of your codebase or rules. Like predictive text for code.
3	AI-native IDE with repo chat. AI sees the whole codebase and can answer questions.	Useful for lookups and fast onboarding. But no persistent rules: every session starts fresh.
4	Durable context. You set explicit coding standards, security policies, and preferred libraries. AI connects to live tools through the Model Context Protocol (MCP).	The first level that makes agent behavior repeatable and safe. This is the foundation for everything above. If you don't have this, don't go further.
5	Specialist subagents. Different agents do planning, implementation, security review, testing.	Reduces risk by separating responsibilities. No single chat thread can skip a critical check. A small council instead of one voice.
6	Automated quality gates. Hooks and background loops run tests, audits, and schema validations before merge. Fail-closed: if a gate fails or cannot run, the merge is blocked.	The machine enforces quality. You can't merge code that fails the gates. Human review starts only after the gates pass. This is the real line between safe and risky.
7	Overnight unsupervised agent teams. Multi-step tasks run across files while no one watches.	High risk without Level 6 gates. This is where vendor keynotes often live - and where production incidents happen if the underwater mass is missing.
8	Agents managing agents. Self-improving loops, teams coordinating other teams.	Aspirational and experimental. Not a mainstream production norm. Often lower levels repackaged as marketing. Watch, but don't bank on it yet.

The pattern is clear: each level's 'wow demo' becomes next year's table stakes. But maturity is measured by what happens when the agent gets it wrong. The mass underwater - tests, schema reviews, rollback plans - keeps you employed when the demo ends.
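The rule that each level pays off only when the levels below it are solid can be written down as a simple check. A minimal sketch, assuming a team records the levels it has in place as a set of integers from the table above; `highest_safe_level` and `unsupported_levels` are hypothetical helpers, and treating every level as a strict prerequisite is a simplification (the table spells out two hard ones: Level 4 before going further, and Level 6 gates before Level 7).

```python
from __future__ import annotations

# Levels 1-8 from the table; a level only counts if everything beneath it is in place.
MAX_LEVEL = 8


def highest_safe_level(levels_in_place: set[int]) -> int:
    """Return the highest level reachable without skipping a level below it."""
    safe = 0
    for level in range(1, MAX_LEVEL + 1):
        if level not in levels_in_place:
            break  # a gap in the foundation caps the climb here
        safe = level
    return safe


def unsupported_levels(levels_in_place: set[int]) -> list[int]:
    """Levels running without their foundation beneath them: debt, not maturity."""
    safe = highest_safe_level(levels_in_place)
    return sorted(level for level in levels_in_place if level > safe)
```

A team running Levels 1, 2, 3, and 7 gets a safe level of 3 and one unsupported level, 7: overnight agents sitting on nothing.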

A skip that cost more than engineering time

A mid-size fintech team skipped from Level 3 (AI-native IDE) to Level 7 (overnight agents) without putting automated gates in place. Within two weeks, a pull request that removed an authorization check from a customer-facing API was merged. The change passed manual review because the diff was large and the reviewer trusted the agent's own comment claiming 'permissions refactored.'

Customer data became visible to all internal users. The fix took three days of incident response and forced a board-level review. The post-mortem found no failing tests - because none were wired to run before merge. The team had removed checkpoints, not added them.
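An authorization check worth having is worth a test that fails when the check disappears. A minimal sketch, assuming a hypothetical `get_account` handler, `User` type, and owner-or-support access rule invented for illustration (the incident's actual code is not described): the second test exercises the denial path, so deleting the guard turns it red, but only if the suite is wired to run before merge.

```python
from __future__ import annotations

from dataclasses import dataclass


class PermissionDenied(Exception):
    """Raised when a caller may not read the requested record."""


@dataclass(frozen=True)
class User:
    user_id: str
    roles: frozenset[str]


# Hypothetical in-memory store standing in for the real customer database.
ACCOUNTS = {"acct-1": {"owner": "alice", "balance": 100}}


def get_account(caller: User, account_id: str) -> dict:
    """Hypothetical handler: only the owner or a support role may read an account."""
    account = ACCOUNTS[account_id]
    # The authorization check itself; removing it must make a test fail.
    if caller.user_id != account["owner"] and "support" not in caller.roles:
        raise PermissionDenied(account_id)
    return account


def test_owner_can_read_own_account() -> None:
    alice = User("alice", frozenset())
    assert get_account(alice, "acct-1")["owner"] == "alice"


def test_other_internal_user_is_denied() -> None:
    bob = User("bob", frozenset({"employee"}))
    try:
        get_account(bob, "acct-1")
    except PermissionDenied:
        return  # expected: the check blocked an internal user who is not the owner
    raise AssertionError("authorization check missing: bob read alice's account")
```

Test runners such as pytest collect `test_`-prefixed functions automatically; the sketch sticks to plain `assert` and the standard library so it runs anywhere.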

What to do next

Stop chasing overnight autonomy as a default. Invest in the boring mass below the waterline.

First, dogfood your own rules. If you haven't codified durable context (Level 4) - clear project rules, AGENTS.md, tool integrations - start there before rolling out any unsupervised agent. Level 4 is the foundation for everything above.

Next, introduce specialist subagents (Level 5) to separate planning, implementation, security review, and QA.
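Separation of responsibilities is easier to enforce when sign-offs are recorded as data rather than as chat messages. A minimal sketch, assuming each role's sign-off is stored as a role-to-agent mapping; the role names, the `MUST_DIFFER` pairs, and `merge_blockers` are hypothetical and not part of any particular agent framework. A merge is blocked if a role never signed off or if the implementing agent also reviewed or tested its own work.

```python
from __future__ import annotations

# Roles from the Level 5 description; every change needs all four sign-offs.
REQUIRED_ROLES = ("planning", "implementation", "security_review", "testing")

# Pairs of roles that must not be held by the same agent (separation of duties).
MUST_DIFFER = (("implementation", "security_review"), ("implementation", "testing"))


def merge_blockers(signoffs: dict[str, str]) -> list[str]:
    """Return reasons to block a merge; an empty list means every role check passed.

    `signoffs` maps a role name to the id of the agent (or human) that signed off.
    """
    problems = [f"missing sign-off: {role}" for role in REQUIRED_ROLES if role not in signoffs]
    for first, second in MUST_DIFFER:
        # .get() returns None for an absent role, so a missing sign-off never matches here.
        if first in signoffs and signoffs[first] == signoffs.get(second):
            problems.append(f"{first} and {second} signed by the same agent: {signoffs[first]}")
    return problems
```

Returning a list of reasons rather than a boolean keeps the block message readable in CI logs; an empty list is the only passing result.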

Then build automated eval gates (Level 6) into your CI: pre-commit hooks, test suites that run before merge, audit scripts that fail the build. Only when those gates are trusted should you experiment with Level 7 on narrow, reversible scopes.
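The difference between a gate and a suggestion is what happens when a check errors, hangs, or is missing. A minimal sketch of a fail-closed runner, assuming each gate is a command given as an argument list (no shell); `run_gates` and `gate_exit_code` are hypothetical names, and the commands you pass in stand for your own test suite, audit scripts, and schema validations. Any non-zero exit, timeout, or failure to start counts as a failed gate, and an empty gate list fails too.

```python
from __future__ import annotations

import subprocess
import sys


def run_gates(gates: dict[str, list[str]], timeout_s: float = 600.0) -> list[str]:
    """Run every gate and return the names of the gates that did not pass.

    Fail-closed: a gate fails on a non-zero exit code, on timeout, or when its
    command cannot be started. An empty gate list is itself a failure.
    """
    if not gates:
        return ["<no gates configured>"]
    failed: list[str] = []
    for name, command in gates.items():
        try:
            result = subprocess.run(command, capture_output=True, timeout=timeout_s)
        except (OSError, subprocess.TimeoutExpired):
            failed.append(name)  # could not start or hung: treat as failing
            continue
        if result.returncode != 0:
            failed.append(name)
    return failed


def gate_exit_code(gates: dict[str, list[str]], timeout_s: float = 600.0) -> int:
    """Map gate results to a process exit code a CI job can use to block the merge."""
    failed = run_gates(gates, timeout_s)
    if failed:
        print("merge blocked; failing gates: " + ", ".join(failed), file=sys.stderr)
        return 1
    print("all gates passed; human review can start")
    return 0
```

In a CI job you would pass your real commands, for example `sys.exit(gate_exit_code({"tests": [sys.executable, "-m", "pytest", "-q"]}))`, and, assuming your platform supports required checks, mark that job as required so nothing merges until it passes. Running every gate instead of stopping at the first failure is a deliberate choice: the report lists everything that needs fixing in one pass.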

The healthiest organizations are not the ones whose agents run longest unattended. They are the ones whose agents fail closed into tests, audits, and human review. The iceberg is not a scoreboard. Climb by adding checkpoints, not removing them.

Sources

Editorial guidance based on workplace practice patterns. Add external citations before publishing factual claims or policy guidance.
