
I Grew Up With AI. Then I Found Compound Engineering.

Aditya Pasikanti · SDE-2
Mar 13, 2026 · 10 min read

TL;DR

I'm a Gen-Z developer who started coding professionally when ChatGPT was already a year old. AI wasn't something I had to learn; it was just part of the job. But I was using it wrong the whole time, and it took becoming a senior to see it. The mistake: delegating to AI without governing it. The fix: a workflow called Compound Engineering, six phases (Clarify, Plan, Build, Assess, Fix, Capture) that I built as a Claude Code plugin. Three Shopify projects, one week. All shipped. Here's how I got there.


I didn't start coding and then discover AI. I started coding with AI. By the time I wrote my first professional line of code, ChatGPT had been out for a year and I already knew what it could do. It was my default from day one.

Prompt. Get code. Learn from the output. Tweak it. Ship it.

That was the workflow. And honestly? It worked. I grew fast. I was closing tickets faster than anyone expected a junior to. AI and I figured things out together.

But I was making a mistake I didn't even know I was making.


The Junior Trap

When you're a junior, the job feels simple: close the ticket. Ship the feature. Move on. AI is perfectly designed for that mindset. You describe a task, it writes code, you commit. Fast. Efficient. Done.

What I wasn't doing: checking if AI had actually understood the requirement. Reviewing what plan it was working from (it didn't have one, it just started coding). Assessing the code structure. Checking quality. Asking whether this fit with how the rest of the codebase was built.

I was delegating without governing. And for small tasks, it didn't matter. Small tasks are where AI is nearly perfect anyway. "Add a checkbox to this settings panel." "Fix the typo in this schema field." Scope is clear, no assumptions needed, AI executes exactly what you asked.

For bigger tasks, it fell apart. Not sometimes. Consistently.

The pattern: we'd tell AI to build a wishlist feature. It would create a plan in its own head, unreviewed, often hallucinated, and start executing. When something broke, we'd say "this isn't working, fix it." Two outcomes from there. Either it fixes the bug and breaks something else, or it doesn't fix it and you're stuck in a loop. Sometimes we caught the breakage the same day. Sometimes QA caught it a week later. Either way, we spent time fixing what AI broke while fixing what we'd asked it to fix.

No self-check. No moment of "wait, is this the right approach?" No governance at all. Just execution and hope.

I didn't recognize this as a problem at the time. It's honestly just how Gen-Z coding culture works. Execution first, foundation later. Delivery first, quality second. AI enables that pattern so well that you don't feel the cost until the codebase starts looking like five developers wrote it without ever talking to each other.

Because that's basically what happened: five separate AI conversations, each starting from zero, each making different structural decisions.


The Question That Changed My Thinking

About two years in, I became a senior on my team. I started working with juniors every day. And watching them do exactly what I used to do (prompt, get code, ship, repeat), I finally saw the pattern from the outside.

They were fast. Their outputs were inconsistent. Their codebases had no coherent structure. Not because they weren't capable, but because they had no system. Neither did I.

I asked myself a simple question: why does AI nail small tasks but consistently struggle with big ones?

The answer was almost obvious once I thought about it. For small tasks, AI already knows what needs to be done. The scope is contained, there are no gaps to fill, it executes exactly what you asked. For big tasks, it fills in gaps. You say "build a wishlist" and AI silently decides the data model, the file structure, the edge case handling, the naming conventions. It picks an approach without telling you. It deviates mid-task when it hits complexity. And at no point does it stop to ask: does my plan make sense? Is my code consistent with the rest of the codebase? Does the output actually match what was asked?

There's no self-check. That's the gap.

The insight: if AI fully understands what needs to be done, without filling in anything on its own, it does dramatically better. The problem wasn't AI's capability. It was that we were skipping the steps that give AI actual clarity and structure, then blaming the output.

What if we didn't skip those steps?


Claude Code Changed Everything

Around this time, the AI tooling market was exploding. New coding tools every few weeks. I tried several. Then I came across Claude Code, and it had something the others didn't: an architecture that let me build governance directly into the tool.

Specifically: Skills (reusable coding context that loads automatically), Commands (structured workflows that Claude follows step by step), and Agents (specialized reviewers that evaluate the output). These weren't just features. They were the building blocks for exactly the system I'd been thinking about.
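A Claude Code plugin is just a directory of plain files, so these three building blocks map directly onto a folder layout. A rough sketch (the top-level directory names follow Claude Code's plugin conventions; the individual file names are illustrative, not the actual contents of Shopify Theme Standards):

```
shopify-theme-standards/
├── .claude-plugin/
│   └── plugin.json        # plugin manifest: name, description, version
├── commands/              # structured workflows, one markdown file each
│   ├── clarify.md
│   ├── plan.md
│   ├── build.md
│   ├── assess.md
│   ├── fix.md
│   └── capture.md
├── skills/                # reusable coding context, auto-loaded
│   └── liquid/
│       └── SKILL.md
└── agents/                # specialized reviewers
    ├── spec-reviewer.md
    └── code-reviewer.md
```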

That's where Compound Engineering came from.

The core idea: instead of one big prompt that AI interprets however it wants, break work into phases. Each phase produces something specific. That output becomes the input for the next phase. Nothing is improvised. Nothing starts from zero. Quality compounds with every completed task.

I built it as a Claude Code plugin called Shopify Theme Standards. Here's how the pipeline works.


The Six Phases

/clarify -- Requirements before everything else.

Before Claude writes a single line of code, it has to fully understand what's being asked. The command breaks your request into three buckets: what's already clear, what it's assuming, and what it still needs to know. It asks questions. You answer them. The output is a Task Spec: goal, requirements, out-of-scope, acceptance criteria. No code suggestions. No architecture decisions. Just requirements, locked down.

This phase exists because most failures happen here, silently. Claude makes an assumption you never validated, builds on top of it for 200 lines, and delivers something that's technically functional but wrong.
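A Task Spec is nothing exotic, just a structured document the later phases can read. A minimal sketch with invented content (the section names follow the buckets described above):

```markdown
# Task Spec: Wishlist toggle on product cards

## Goal
Logged-in customers can save products from collection pages.

## Requirements
- Heart icon on every product card, toggles saved state
- State persists per customer across page reloads

## Out of Scope
- Wishlist sharing, back-in-stock emails

## Acceptance Criteria
- Toggling updates the icon without a full page reload
- Guests see a login prompt instead of the toggle
```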

/plan -- Structure before execution.

Claude loads every skill in the plugin, explores the actual codebase, and builds a TODO-by-TODO execution plan where every step references which skills apply. If a single skill file is missing, the command stops. Not warns, stops. No planning with incomplete context.

The plan is reviewable before a single line of code is written. You can push back, redirect, or course-correct before the cost is real.
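Concretely, a plan produced this way might look something like the sketch below (the TODOs and skill names are illustrative, not the plugin's actual output format):

```markdown
# Plan: Wishlist toggle

1. [ ] Create `wishlist-toggle` snippet for product cards
       Skills: liquid, theme-architecture
2. [ ] Store saved products in a customer metafield
       Skills: liquid, schemas
3. [ ] Add client-side toggle with optimistic UI update
       Skills: javascript
4. [ ] Handle the guest (logged-out) state
       Skills: liquid, javascript
```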

/build -- Execution against the plan, with validation at every step.

Claude executes the plan TODO by TODO, adhering to the skills at every step. After every file it touches, it runs a validation checklist. If a check fails, it fixes the issue before moving forward. It never deviates from the approved plan.

This phase is where consistency comes from. Same skills, every file, every session.
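The per-file checklist itself can stay small. A hedged sketch of the kind of checks that step might run (these items are illustrative, not the plugin's exact list):

```markdown
## Validation checklist (after every touched file)
- [ ] File follows the applicable skills for its type
- [ ] Change matches the current TODO, nothing beyond it
- [ ] Naming is consistent with existing codebase patterns
- [ ] No unrelated files were modified
```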

/assess -- Two independent reviewers, in parallel.

Once build is complete, Claude dispatches two reviewers simultaneously. The first asks: did we build what we said we'd build? It goes requirement by requirement, including edge cases, and checks against the Task Spec from Clarify. The second asks: is the code well-written? It checks structure, readability, maintainability, and skill compliance. Each issue gets a severity rating. The verdict: Ready, Needs Work, or Needs Rework.

This is the phase most people skip. It's also the one that catches the most problems before they become QA issues or production bugs.
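The output of the two reviewers can be as simple as a short report. An invented example of the shape, using the severity labels and verdict names described above:

```markdown
# Assessment: Wishlist toggle

## Spec review (against the Task Spec)
- PASS  Saved state persists per customer
- HIGH  Guest login prompt missing (requirement not met)

## Code review
- LOW   Selector logic duplicated across two snippets

Verdict: Needs Work
```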

/fix -- Diagnosis before action.

When something's wrong, Claude doesn't immediately start changing code. It investigates, traces the data flow, performs root cause analysis, and presents a diagnosis. Then it stops. Waits for your approval. Only after you confirm does it apply the fix. And at the end, it reports which skill should have prevented the issue in the first place.

This forced pause matters more than it sounds. Half the time, looking at the diagnosis reveals a different problem than the one you thought you had.
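A diagnosis, in this model, is itself a document you approve before any code changes. A sketch with invented content:

```markdown
# Diagnosis: wishlist state resets on reload

Traced flow: toggle click → metafield write → card render.
Root cause: the write targets a metafield namespace the card
template never reads, so state is saved but never displayed.
Proposed fix: use one namespace in both snippet and write.
Skill gap: the schemas skill should pin the canonical
metafield namespace so this can't diverge again.

Awaiting approval before applying the fix.
```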

/capture -- Learning that compounds.

After each completed cycle, Claude reviews everything and asks: "If I'd known this before starting, would it have saved time or prevented a mistake?" If yes, it documents the learning. Next time /plan runs on this project, it reads those learnings first.

Over time, every project becomes easier to work on. The skills sharpen. The patterns accumulate. Nothing starts from scratch.
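The learnings file is the simplest artifact in the pipeline: dated notes that the next /plan run on the project reads first. An invented sketch:

```markdown
# Learnings

- 2026-03-10: The predictive-search section overrides
  product-card styles; scope any new card CSS accordingly.
- 2026-03-11: Wishlist state lives in a customer metafield;
  reuse it rather than adding a second source of truth.
```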


The Skills Layer

Behind the pipeline: eight skills that define exactly how Liquid, CSS, JavaScript, sections, schemas, theme architecture, and Figma-to-code translation should be handled for Shopify themes. Plus a Research skill that looks up anything Shopify-related online (documentation, repos, community discussions), so Claude is working from current, sourced information rather than assumptions.

These skills aren't optional context. They're enforced. Every command loads them before acting. If one is missing, the command won't proceed.

That's the point. You can have a perfect workflow and still get inconsistent output if the skills aren't applied every time, without exception. The plugin makes that automatic.
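For reference, a Claude Code skill is a markdown file with YAML frontmatter; the `name` and `description` fields are part of the documented skill format, while the body below is an invented sketch of what a theme-convention skill might contain:

```markdown
---
name: liquid
description: Conventions for writing Liquid in our Shopify themes. Load before editing any .liquid file.
---

# Liquid Conventions

- Prefer `{% render %}` over `{% include %}`; pass params explicitly.
- Keep snippet logic small; move data shaping into section schemas.
```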


Three Projects, One Week

A few weeks ago, three Shopify projects landed simultaneously. One had production bugs on a live store. One needed new sections built from Figma designs. One was my own project needing go-live enhancements.

Three codebases. Three clients. One week.

What would normally have been an overwhelming context switch (different codebases, different clients, different stages) just wasn't. I ran the same pipeline across all three. Claude loaded the same skills each time. The decisions that would normally get made inconsistently under deadline pressure were already made, baked into the workflow.

All three shipped by Friday.

That week was the proof. Not because the tasks were easy, but because the structure held even when I was stretched thin.


What I Got Wrong (And Still Get Wrong)

This workflow adds phases. If you're in a rush, Clarify and Plan feel like overhead. The first few times I ran it, I kept wanting to skip straight to Build.

The issues caught in Assess (before QA, before production) consistently represent work that would have taken longer to fix after the fact than Plan took to prevent. But I had to learn that through experience, not faith.

The other honest thing: Capture is only as useful as what you put into it. The learnings compound if you complete the cycle. If you cut the session before running Capture because the feature shipped and you moved on, that learning disappears. The discipline has to be consistent.


The Principle Behind the Plugin

The plugin is specific to Shopify theme development. The principle isn't.

We're already letting AI do the work. That decision is made. The question now is whether we have governance over how it does it. Whether we're the ones deciding the requirements, the plan, the skills. Or whether we're handing that over along with the code and hoping the output is consistent.

Compound Engineering is the answer to that question for me. Clarity before planning. Planning before building. Assessment before shipping. Learning before the next task.

Each phase produces something that feeds the next. Nothing is improvised. Nothing starts from zero. Quality compounds.

Stop prompting AI. Start engineering with it.

Tags: claude code, compound engineering, shopify, developer workflow, vibe coding