opinion essay

The Agentic Engineering Iceberg: Climb by Adding Checkpoints, Not Removing Them

AI coding maturity is an iceberg: overnight autonomy sits above the invisible engineering practices that keep production safe. Unsupervised agents without gates are debt, not maturity. Leaders should invest in durable context, specialist agents, and automated quality gates before chasing the hype.

By LY

The thesis

Most AI coding maturity charts look like an iceberg. The visible tip is what vendors demo in keynotes - agents that run overnight, self-improving loops, teams of agents managing other agents. The mass underwater is what actually keeps your production safe: rules, tests, audit schemas, RLS reviews, rollback plans. The thesis is straightforward: higher maturity levels are not automatically better. Each one only pays off when the levels below it are solid. Unsupervised overnight agents without eval gates are not maturity - they're debt with a hype label. The practical climb is to add checkpoints, not remove them.

Perspective: Business leader (primary audience)

        The iceberg, level by level

        Here's where your team likely sits today - and what it actually means when the agent gets something wrong.

        | Level | What it looks like | What it means for your team |
        |-------|--------------------|-----------------------------|
        | 1 | Chat copy-paste. Developers paste code from a chatbot. No memory, no guardrails. | Where everyone starts. Every line must be manually reviewed. Low productivity, high error. |
        | 2 | Inline completions. Editor suggests the next few lines; tab to accept. | Speeds up typing but has no awareness of your codebase or rules. Like predictive text for code. |
        | 3 | AI-native IDE with repo chat. AI sees the whole codebase and can answer questions. | Useful for lookups and fast onboarding. But no persistent rules: every session starts fresh. |
        | 4 | Durable context. You set explicit coding standards, security policies, preferred libraries. AI connects to live tools (MCP). | First level that makes agent behavior repeatable and safe. This is the foundation for everything above. If you don't have this, don't go further. |
        | 5 | Specialist subagents. Different agents do planning, implementation, security review, testing. | Reduces risk by separating responsibilities. No single chat thread can skip a critical check. A small council instead of one voice. |
        | 6 | Automated quality gates. Hooks and background loops run tests, audits, schema validations before merge. Fail-closed. | The machine enforces quality. You can't merge broken code. Human review only starts after the gates pass. This is the real line between safe and risky. |
        | 7 | Overnight unsupervised agent teams. Multi-step tasks run across files while no one watches. | High risk without Level-6 gates. This is where vendor keynotes often live - and where production incidents happen if the underwater mass is missing. |
        | 8 | Agents managing agents. Self-improving loops, teams coordinating other teams. | Aspirational and experimental. Not a mainstream production norm. Often repackaged marketing for lower levels. Watch, but don't bank on it yet. |

        The pattern is clear: each level's 'wow demo' becomes next year's table stakes. But maturity is measured by what happens when the agent gets it wrong. The mass underwater - tests, schema reviews, rollback plans - keeps you employed when the demo ends.
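
One way to read the iceberg is as a simple rule: your effective level is the highest one with nothing missing beneath it. A minimal sketch of that rule in Python, where the function name `highest_safe_level` and the set-of-integers representation are illustrative assumptions, not part of any standard maturity model:

```python
def highest_safe_level(levels_in_place: set[int]) -> int:
    """Return the highest level N such that levels 1..N are all in place.

    Levels adopted above a gap do not count: they sit on a missing foundation.
    """
    level = 0
    while level + 1 in levels_in_place:
        level += 1
    return level
```

Under this reading, a team with Levels 1 to 3 in place that bolts on Level 7 is still operating at Level 3: `highest_safe_level({1, 2, 3, 7})` returns 3.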

        A skip that cost more than engineering time

        A mid-size fintech team jumped from Level 3 (AI-native IDE) straight to Level 7 (overnight agents) without putting automated gates in place. Within two weeks, a pull request was merged that removed an authorization check from a customer-facing API. The change passed manual review because the diff was large and the reviewer trusted the agent's own comment, which claimed 'permissions refactored.'

        Customer data became visible to all internal users. The fix took three days of incident response and forced a board-level review. The post-mortem found no failing tests - because none were wired to run before merge. The team had removed checkpoints, not added them.
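
The kind of test that was missing can be very small. The sketch below is illustrative only: the handler `get_customer_record`, the `User` type, the `Forbidden` error, and the in-memory `CUSTOMERS` store are all hypothetical names, not details from the incident. It shows a regression test that fails as soon as the ownership check is deleted, whatever the pull request description says.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when a caller lacks permission for a resource."""


@dataclass(frozen=True)
class User:
    user_id: str
    role: str  # hypothetical roles, e.g. "support_agent" or "account_owner"


# Hypothetical in-memory store standing in for the customer database.
CUSTOMERS = {"c-1": {"owner_id": "u-1", "email": "a@example.com"}}


def get_customer_record(caller: User, customer_id: str) -> dict:
    """Return a customer record only to the user who owns it."""
    record = CUSTOMERS[customer_id]
    # The authorization check that a refactor must not silently remove.
    if caller.user_id != record["owner_id"]:
        raise Forbidden(f"{caller.user_id} may not read {customer_id}")
    return record


def test_non_owner_is_denied() -> None:
    """Fails loudly if the authorization check disappears."""
    outsider = User(user_id="u-2", role="support_agent")
    try:
        get_customer_record(outsider, "c-1")
    except Forbidden:
        return
    raise AssertionError("authorization check missing: non-owner read a record")


def test_owner_is_allowed() -> None:
    """Guards against 'fixing' the check by denying everyone."""
    owner = User(user_id="u-1", role="account_owner")
    assert get_customer_record(owner, "c-1")["email"] == "a@example.com"
```

A test like this protects nobody unless it is wired to run before merge, which is the Level 6 point: the gate, not the reviewer's trust, is what catches the deletion.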

        What to do next

        Stop chasing overnight autonomy as a default. Invest in the boring mass below the waterline.

        First, dogfood your own rules. If you haven't codified durable context (Level 4) - clear project rules, AGENTS.md, tool integrations - start there before rolling out any unsupervised agent. Level 4 is the foundation for everything above.

        Next, introduce specialist subagents (Level 5) to separate planning, implementation, security review, and QA.
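
The separation of responsibilities can be sketched without any particular agent framework. In the Python below, each specialist is modeled as a plain function that returns a verdict; the stage names, the `Verdict` type, and `run_pipeline` are assumptions made for illustration. The point it shows is structural: a missing or rejecting stage blocks the change, so no single thread can skip the security review.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    stage: str
    approved: bool
    note: str = ""


# Each specialist is modeled as a function from a change description to a verdict.
Stage = Callable[[str], Verdict]

# Hypothetical stage names mirroring planning, implementation, security review, QA.
REQUIRED_STAGES = ("plan", "implement", "security_review", "qa")


def run_pipeline(change: str, stages: dict[str, Stage]) -> list[Verdict]:
    """Run every required stage in order and stop at the first rejection."""
    verdicts: list[Verdict] = []
    for name in REQUIRED_STAGES:
        stage = stages.get(name)
        if stage is None:
            # A missing specialist counts as a rejection, never as a pass.
            verdicts.append(Verdict(name, False, "stage not configured"))
            break
        verdict = stage(change)
        verdicts.append(verdict)
        if not verdict.approved:
            break
    return verdicts


def is_mergeable(verdicts: list[Verdict]) -> bool:
    """Mergeable only if every required stage ran and approved."""
    ran_all = [v.stage for v in verdicts] == list(REQUIRED_STAGES)
    return ran_all and all(v.approved for v in verdicts)
```

In a real setup each function would wrap a call to a separately configured agent; the sketch deliberately leaves that part out.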

        Then build automated eval gates (Level 6) into your CI: pre-commit hooks, test suites that run before merge, audit scripts that fail the build. Only when those gates are trusted should you experiment with Level 7 on narrow, reversible scopes.
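
To make "fail closed" concrete, here is a minimal sketch of a gate runner a CI job could call before merge. It uses only Python's standard `subprocess` module; the gate names and the `scripts/audit_schema.py` path are hypothetical placeholders for your own tools. The design choice to notice is that a gate which crashes, is missing, or times out is treated exactly like a gate that fails.

```python
import subprocess
import sys
from typing import Sequence


def gate_passes(command: Sequence[str], timeout_s: float = 600.0) -> bool:
    """Return True only when the command runs and exits with status 0."""
    try:
        result = subprocess.run(command, timeout=timeout_s, check=False)
    except (OSError, subprocess.TimeoutExpired):
        # Fail closed: a gate that cannot start or hangs counts as a failure.
        return False
    return result.returncode == 0


def run_gates(gates: dict[str, Sequence[str]]) -> list[str]:
    """Run every gate and return the names of those that failed."""
    return [name for name, command in gates.items() if not gate_passes(command)]


def main() -> int:
    """Return a non-zero exit code if any gate failed."""
    # Hypothetical gate commands; substitute your project's own tools.
    gates = {
        "unit tests": [sys.executable, "-m", "pytest", "-q"],
        "schema audit": [sys.executable, "scripts/audit_schema.py"],
    }
    failed = run_gates(gates)
    for name in failed:
        print(f"GATE FAILED: {name}", file=sys.stderr)
    return 1 if failed else 0
```

A CI step would call `sys.exit(main())` and block the merge on any non-zero exit code. Every gate runs even after one fails, so a single report lists everything that needs fixing.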

        The healthiest organizations are not the ones whose agents run longest unattended. They are the ones whose agents fail closed into tests, audits, and human review. The iceberg is not a scoreboard. Climb by adding checkpoints, not removing them.

        Sources

        Editorial guidance based on workplace practice patterns. Add external citations before publishing factual claims or policy guidance.