Two seasons ago, I had a bad ski accident. A twisted ankle and knee put me on crutches for a month. For a while, I genuinely wondered whether skiing was over for me.
Two months of physical therapy rebuilt the basics. Last season, I cautiously returned. This season, I assumed I was fully back.
Last week at Sunshine Village, the snow was irresistible. Light, dry powder floating in cold air, calling all the powder chasers higher. I took a blue run off Goat’s Eye Mountain and accidentally ended up on Stampedes.
On paper, both runs are blue. In reality, a section of Stampedes steepens from about 20 to 24 degrees. Just a 4-degree change. That became too much to handle. I found myself standing in the middle of a 24-degree slope, suddenly feeling like I had forgotten how to ski. Even moving cautiously, I fell several times. It was not just embarrassing. It was dangerous. After some kind people helped me make it down, I immediately signed up for an intermediate lesson.
The Four Degrees That Change Everything
The difference between 20 and 24 degrees is not incremental.
It is exponential.
Below 20: You guide the slope. Above 24: The slope guides you.
On steeper terrain:
- Speed compounds.
- Timing becomes precise.
- Weight distribution determines survival.
- Small mistakes scale instantly.
The next day, I met Coach Kenji. He introduced himself as the head trainer who coaches the ski instructors at the resort. He watched me descend once and asked a deceptively simple question:
“How are you carrying your weight at the turning point?”
That was it.
Years ago in beginner lessons, I learned to lean toward the uphill side to slow down. It felt intuitive. Falling uphill felt safer. That approach works under 20 degrees.
But at 24 degrees, it fails. You cannot complete a proper turn.
Kenji taught something counterintuitive.
After initiating the turn with the inside ski, I needed to briefly stand on it as it transitioned into the new outside ski. Fully commit. Then let my outer shoulder guide my body slightly downhill.
Leaning downhill felt counterintuitive.
But here is the physics: the outside ski bites into the snow and carves inward. The edge builds resistance. The centrifugal force generated by the arc stabilizes the body against the edge. That force holds you in place and allows the other ski to pivot cleanly. A small shift of weight at the apex of the turn. Commit forward. Finish the arc deliberately.
One adjustment. Control returned. Speed became manageable. Confidence came back.
This was not something I learned from my beginner lessons. Not something I picked up from YouTube either. Most online advice emphasizes using gravity to slow down and “stay safe.” But on steeper slopes, gravity must be harnessed, not avoided.
As I immerse myself in AI every day, I naturally find myself projecting my experiences and lessons onto it.
AI Makes the Same Kind of 4-Degree Mistakes
It’s safe to say that AI systems also operate on incomplete training signals. If data is biased, incomplete, or contaminated, the flaws get baked into the model.
They forget and hallucinate, especially when context windows overflow.
Even in recent coding sessions with Anthropic’s Claude Opus 4.6, I have seen obvious errors:
- Planning a meal with avocado and assigning 12g instead of half an avocado.
- Generating 3 meals for 4 servings, then later dividing calories by 6 instead of 12.
- Miscalculating multi-category product costs where totals do not reconcile.
Individually, these errors look trivial and are easily caught.
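The servings slip, for instance, is one line of arithmetic. A minimal sketch (meal names and calorie numbers are made up for illustration):

```python
# Hypothetical meal plan: 3 meals, each batch cooked for 4 servings.
meals = {"breakfast": 2000, "lunch": 2400, "dinner": 2800}  # kcal per batch
servings_per_meal = 4

total_kcal = sum(meals.values())                  # 7200 kcal overall
total_portions = len(meals) * servings_per_meal   # 3 * 4 = 12 portions

wrong_per_portion = total_kcal / 6                # the agent's slip: divided by 6
right_per_portion = total_kcal / total_portions   # 7200 / 12 = 600 kcal

print(wrong_per_portion, right_per_portion)  # 1200.0 600.0
```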
However, in an agentic pipeline, they compound.
The Avalanche Effect in Agentic Systems
Agentic AI feels like that 24-degree slope.
Fast. Powerful. Amplifying.
A typical agentic coding pipeline includes:
- PRD drafting agent
- PRD decomposition agent
- Planner agent
- Code generator
- Test generator
- Execution loop
- Refinement cycle
If your PRD contains a subtle ambiguity, the engineering spec inherits the gap. The planner encodes it. The code generator multiplies it. The test generator validates it.
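The compounding can be made concrete with a toy calculation. If each stage independently preserves intent with, say, 95% probability (an illustrative number, not a measurement), end-to-end alignment decays multiplicatively:

```python
# Toy model: probability that the original intent survives every stage
# of the agentic pipeline above. The 0.95 per-stage fidelity is assumed.
stages = ["prd_draft", "prd_decompose", "plan",
          "codegen", "testgen", "execute", "refine"]
p_stage = 0.95

p_end_to_end = p_stage ** len(stages)
print(f"{p_end_to_end:.2f}")  # ~0.70: roughly a 1-in-3 chance intent drifted
```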
The system becomes self-consistent but misaligned. The result looks flawless at first glance:
- Acceptance criteria met
- Tests passed
- Code volume impressive
But the product did not behave the way you intended.
Everything technically correct.
Everything directionally wrong.
An avalanche of productivity.
Just like on a steep slope, a slight backward lean becomes acceleration, and acceleration becomes panic.
Agents do not merely make mistakes. They amplify them.
AI Iteration Speed and Engineering Challenges
The proportion of AI-generated code is increasing dramatically. In many workflows, AI now writes the majority of new lines. Multi-agent planning, autonomous refactoring, and automated test synthesis are no longer experiments. They are operational.
Iteration cycles that once took days now take minutes. What used to require sprint planning can now be triggered by a prompt. Many engineers no longer touch code directly. They orchestrate AI to implement and fix.
Velocity is no longer the constraint. But does this velocity bring the expected prosperity? Months ago, I watched friends use AI to automatically convert conversation recordings into PRDs, then deploy agent swarms to implement, review, test, and commit all the code. I have not seen great products come out of those pioneering endeavors yet. These days, agentic tools like Codex, Claude Code, and Antigravity make the whole software life cycle more automatable. I use them end to end and have experimented with the Ralph Wiggum approach (a technique to fully automate coding) too. My product development is vastly accelerated, but new challenges are emerging.
Quality Control at Scale
AI can produce enormous volumes of code rapidly. This introduces many risks, including:
- Low-quality implementations that pass tests but fail user intent
- Architectural drift
- Redundant abstractions
- Silent performance regressions
- Stylistic inconsistencies across modules
When humans write code, inconsistency grows slowly. Humans can catch up.
When agents write code, inconsistency compounds geometrically.
We humans can barely keep up with reviewing and checking everything, so we rely on LLM reviewers and agentic QA systems. But when everything becomes a black box, how do we know what we are missing?
If we choose to trust AI, we must build strong governance mechanisms. Without them, technical debt accumulates invisibly. Coach Kenji did not ski for me. He corrected the turning point. We humans should be coaches to our AIs:
Not micromanaging every line. Not blocking automation. But knowing where to intervene and where to trust.
Humans must:
- Define architectural boundaries
- Decide where autonomy is safe
- Calibrate intent before execution
- Design evaluation layers
Specification Becomes Infrastructure
In an agentic world, specifications are not documentation. They are executable guardrails.
Last year, I met a startup building around Prompt-Driven Development. At the time, it felt cutting edge. Today, it already feels dated. A handful of prompts, even sophisticated context engineering, is no longer enough. Prompting has become just one small component in a much more industrialized agentic AI development pipeline.
Now the real leverage starts earlier. The first steps in the workflow are PRD and engineering spec crafting. They are the foundation layer, the beginning of everything. Everything downstream depends on them. If the spec is vague, agents industrialize the vagueness. If requirements are missing, the agents code around the gaps.
To manage development with quality, we need:
- Machine-interpretable specs
- Plan-First Gate
- Clear domain invariants
- Explicit interface contracts
- Structured prompt templates
- Constraint-based generation
In terms of spec format, I prefer breaking PRDs into smaller, structured user stories in JSON format, just as the Ralph Wiggum technique does. Highly structured decomposition forces step-by-step reasoning and reduces ambiguity.
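As a sketch of what one decomposed story might look like (the field names are my own convention, not a standard, and the validation is deliberately minimal):

```python
import json

# One user story, decomposed into machine-checkable fields.
story = {
    "id": "US-042",
    "as_a": "meal-plan user",
    "i_want": "per-serving calorie totals",
    "acceptance": [
        "per_serving_kcal == total_kcal / (meals * servings_per_meal)",
    ],
    "invariants": ["all calorie values are positive"],
}

# Structural validation before handing the story to any agent:
# refuse to proceed if the spec itself has gaps.
REQUIRED = {"id", "as_a", "i_want", "acceptance"}
missing = REQUIRED - story.keys()
assert not missing, f"spec has gaps: {missing}"

print(json.dumps(story, indent=2))
```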
Use the “Plan-First” Gate. Never let an agent code for 5 hours. Require the agent to output a PLAN.md first.
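The gate can be as simple as refusing to run the coding step until a plan file exists and a human has signed off. A minimal sketch, where the `Approved-by:` convention and file name are assumptions of this example:

```python
from pathlib import Path

def plan_first_gate(workdir: str) -> bool:
    """Allow code generation only if PLAN.md exists and carries human sign-off."""
    plan = Path(workdir) / "PLAN.md"
    if not plan.exists():
        print("Blocked: agent must write PLAN.md before coding.")
        return False
    if "Approved-by:" not in plan.read_text():
        print("Blocked: plan awaits human approval.")
        return False
    return True

# Usage: only proceed to the code-generation step when the gate opens.
# if plan_first_gate("./repo"):
#     run_code_generation()  # hypothetical downstream step
```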
But most importantly, human expertise must be encoded into domain invariants and system constraints. AI can assist. Humans must lead and stay heavily involved.
Decide how much autonomy is appropriate
As a coach to our AI, we should assess Risk Level and Complexity to decide the Autonomy Strategy. Using ski trail analogy:
| Risk Level | Complexity | Examples | Autonomy Strategy |
| --- | --- | --- | --- |
| Green (Safe) | Low impact / high verifiability | Unit tests, CSS styling, documentation, boilerplate | Full autonomy: let the agent run and review the diff later. |
| Blue (Caution) | Moderate impact / moderate verifiability | New feature logic, internal API endpoints, refactoring | Coached autonomy: agent proposes a plan; human approves before execution. |
| Black (Danger) | High impact / low verifiability | Auth logic, data modeling, database schema, concurrency/threading | Less/no autonomy: human writes the “apex” logic; AI assists with syntax. |
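The trail ratings above can be encoded directly as a routing rule, so the pipeline itself decides how much autonomy a task gets. A sketch, where the keyword lists are illustrative rather than exhaustive:

```python
# Map a task description to an autonomy strategy, mirroring the trail ratings.
BLACK = ("auth", "schema", "migration", "concurrency", "data model")
BLUE = ("feature", "endpoint", "refactor")

def autonomy_strategy(task: str) -> str:
    t = task.lower()
    if any(k in t for k in BLACK):
        return "human-led"           # Black: human writes the apex logic
    if any(k in t for k in BLUE):
        return "plan-then-approve"   # Blue: agent plans, human approves
    return "full-autonomy"           # Green: run and review the diff later

print(autonomy_strategy("Add auth middleware"))   # human-led
print(autonomy_strategy("Refactor the parser"))   # plan-then-approve
print(autonomy_strategy("Update CSS styling"))    # full-autonomy
```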
3 Rules for “The Turning Point”
1. The “State” Rule: automate stateless tasks, guide stateful logic.
AI is excellent at stateless tasks such as ETL and formatting. It struggles with stateful tasks such as global state management, cache invalidation, and race conditions.
2. The “Blast Radius” Check: how wide the impact on the codebase would be if it goes wrong.
If it’s a localized UI bug: Low blast radius.
If it’s an incorrectly mapped database relationship that corrupts 10,000 records: Extreme blast radius. Rule of Thumb: If the fix requires a database restore, the AI should not have autonomy.
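One way to operationalize the blast-radius check is to flag any diff that touches high-risk paths. A minimal sketch, assuming a conventional repo layout (the glob patterns are illustrative, not a standard):

```python
from fnmatch import fnmatch

# Paths where a wrong change can corrupt data or lock users out.
HIGH_BLAST_RADIUS = ["*migrations/*", "*auth/*", "*models/*.py", "*.sql"]

def requires_human_review(changed_files: list[str]) -> bool:
    """True if any changed file falls inside a high-blast-radius zone."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in HIGH_BLAST_RADIUS
    )

print(requires_human_review(["ui/button.css"]))                 # False
print(requires_human_review(["db/migrations/0042_users.sql"]))  # True
```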
3. The “Invisibility” Factor: the bigger the factor, the less automation.
Standard logic is easy to review. “Clever” or “Magic” code is not. If the task involves complex abstractions or “under-the-hood” framework magic, the AI will likely hallucinate a solution that looks right but fails in production. This is the “hidden ice” on the slope.
Intent Calibration
When agents operate autonomously, a slight misinterpretation at step one can derail the entire codebase by step ten.
Calibration requires continuous alignment:
- Assign constrained personas such as “You are a minimalist senior developer who writes concise, predictable code.”
- Add granular checkpoints at smaller milestones.
- Avoid letting pipelines run unattended for hours. Long autonomous runs often waste tokens and amplify drift.
- Frequently ask: “Does this code fulfill the original intent without unnecessary complexity? Identify deviations.”
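These checkpoints can be wired into the loop itself. A sketch, where `run_milestone` and `check_alignment` are hypothetical placeholders for your agent call and your drift evaluator:

```python
def run_pipeline(milestones, run_milestone, check_alignment, max_drift=0.2):
    """Stop the run early when measured intent drift exceeds the budget.

    run_milestone and check_alignment stand in for real agent and
    evaluator calls; the 0.2 drift budget is an assumed threshold.
    """
    for step in milestones:
        result = run_milestone(step)
        drift = check_alignment(result)  # 0.0 = on-intent, 1.0 = fully off
        if drift > max_drift:
            print(f"Halting at '{step}': drift {drift:.2f} exceeds budget.")
            return False
    return True

# Demo with stub functions standing in for real agent calls.
ok = run_pipeline(["parse spec", "write module"], lambda s: s, lambda r: 0.05)
print(ok)  # True: both milestones stayed within the drift budget
```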
The turning and checking point matters. In my experience, letting agents code unattended for over 5 hours with the Ralph Wiggum technique mostly wastes AI tokens.
Testing Must Evolve Beyond Unit Pass/Fail
Traditional software engineering uses a test pyramid: unit, functional, integration, and end-to-end.
AI agents often generate unit tests automatically. But we must explicitly instruct them to build tests higher up the pyramid. Beyond that, we need to incorporate:
- Behavioral simulation tests
- Intent-alignment evaluation
- Safety and boundary testing
Otherwise, we end up with mountains of green checkmarks resting on unstable foundations.
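Behavioral tests check what the product does, not just whether functions return. Tying back to the calorie slip earlier: a unit test can pass on the wrong divisor, while an intent-level property catches it (the function under test is a hypothetical stand-in for agent-generated code):

```python
# Hypothetical implementation the agent produced.
def per_serving_kcal(meal_kcals: list[float], servings_per_meal: int) -> float:
    return sum(meal_kcals) / (len(meal_kcals) * servings_per_meal)

# Intent-level behavioral property: doubling the servings per meal
# must halve the calories in each portion.
base = per_serving_kcal([2000, 2400, 2800], servings_per_meal=4)
doubled = per_serving_kcal([2000, 2400, 2800], servings_per_meal=8)
assert abs(base - 2 * doubled) < 1e-9, "per-serving kcal must halve when servings double"
print(base)  # 600.0
```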
Carving, Not Surviving
Steep slopes are not the enemy.
AI agents are not the enemy. They are accelerators.
But acceleration without alignment creates chaos.
With proper weight at the turning point, the slope becomes exhilarating instead of terrifying.
With proper human calibration in the loop, AI becomes a precision instrument instead of an avalanche machine.
The difference is not strength. It is the control at the apex.
And in both skiing and AI, that single adjustment changes everything. Let’s charge! 🏔️🤖