Git Worktrees + Agentic AI - The Day I Stopped Single-Threading My Brain

I didn’t start using git worktrees because I needed better Git hygiene.

I started using them because I was tired of waiting.

At the Agentic AI Symposium, someone mentioned running parallel agents using worktrees. Not in a dramatic way. Not framed as the future. Just casually, like everyone already knew.

And it bugged me.

Because I’d been running Claude Code daily inside Vinyl Crate - a mid-sized SwiftUI repo with @Observable, Core Data, async rendering, localization, real architecture - and I was still operating like it was 2016.

One task. One branch. One terminal.

AI was fast. My workflow wasn’t.

That mismatch stuck in my head longer than it should have.


The Real Bottleneck Wasn’t AI

That week my sprint board looked like this:

  • Refactor older ObservableObject models to @Observable
  • Clean up localization inconsistencies in several views
  • Review a Claude-generated refactor PR
  • Tighten .task cancellation behavior in a few async-heavy screens

Normally I would pick one, spin up Claude, review the output, merge, then move on. Four tasks, one lane. The bottleneck was never the model's generation speed - it was my workflow, serializing everything behind a single active task.


What Git Worktrees Actually Are (Under the Hood)

If you haven’t touched them before, here’s the part most docs don’t emphasize.

git worktree add creates additional working directories that share the same repository object database.

Each worktree maintains its own working directory, its own index file, and its own HEAD, which means changes remain isolated at the filesystem and staging level. What they do not duplicate is the underlying repository memory - they all point to the same .git object store, the same commit graph, and the same refs namespace.

Internally, Git tracks each worktree inside:

.git/worktrees/<name>/

You’ll find a separate HEAD, index, and metadata there. What you won’t find is duplicated history.
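
If you want to see that directly, the plumbing is inspectable from the command line (using the vc-refactor worktree created later in this post as the example):

git worktree list                                  # every checkout sharing this object store
ls .git/worktrees/vc-refactor/                     # its own HEAD, index, and gitdir pointer
git -C ../vc-refactor rev-parse --git-common-dir   # resolves back to the main .git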

This isn’t cloning.

Cloning creates a completely separate copy of the repository - its own object store, its own history snapshot, its own evolving state. That’s useful when you need hard separation, but it also means duplication, divergence, and eventual reconciliation.

Worktrees do something more surgical. They isolate the working directory, the index, and HEAD, but they anchor everything back to the same underlying object database. You get separation at the execution layer without fracturing the repository’s memory.
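
The cheapest way to feel that difference on disk: a clone carries a full .git directory, while a linked worktree carries a one-line .git file pointing home (the vc-clone path here is hypothetical, and your absolute path will differ):

git clone . ../vc-clone && du -sh ../vc-clone/.git   # a full, duplicated object store
cat ../vc-refactor/.git                              # versus a single pointer line:
# gitdir: /path/to/vinyl-crate/.git/worktrees/vc-refactor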

That distinction becomes critical the moment multiple agents start touching the same codebase. When three parallel reasoning streams are reading and writing in adjacent branches, you don’t want three drifting copies of history. You want shared ground truth with isolated execution.

Parallel systems collapse when isolation is sloppy. Worktrees give you isolation without fragmentation - and that’s exactly what parallel agents require.


The First Experiment (Three Worktrees, No Overthinking)

From main I ran:

git worktree add ../vc-refactor feature/refactor-observable
git worktree add ../vc-i18n chore/i18n-audit
git worktree add ../vc-review review/refactor-branch

Directory layout:

vinyl-crate/
vc-refactor/
vc-i18n/
vc-review/

With a Claude Code session running in each directory, all three started working at the same time - not queued, not politely waiting their turn, but actively moving the codebase forward in parallel. The refactor agent was rewriting state models, the reviewer was dissecting diffs, and the localization audit was crawling through string usage like a static analyzer with opinions.

It wasn’t sequential anymore. It was concurrent.

And that’s where the role shift hit.

I wasn’t hunched over the keyboard trying to out-type a machine. I wasn’t racing the AI or supervising every keystroke. I was watching three streams of progress unfold, stepping in when boundaries blurred, tightening scope when drift appeared, and adjusting direction like a producer balancing levels in a studio session.

I wasn’t typing. I was steering.

That’s when things got interesting.


Running Three Agents in Parallel

Inside vc-refactor, I scoped the task tightly:

Refactor ObservableObject models in Features/Collection/ to @Observable. Do not modify Core Data models. Preserve public APIs. Stop if more than 10 files change.

Inside vc-review:

Review feature/refactor-observable for SwiftUI rendering inefficiencies, state misuse, and incorrect main-thread assumptions. Provide inline diff commentary only.

Inside vc-i18n:

Scan SwiftUI views for hardcoded strings. Validate against Localizable.strings. Produce a structured report. Do not rewrite unrelated code.
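
In practice that meant one terminal per worktree, each session seeded with its prompt (prompts abridged from above; check claude --help for your CLI version):

cd ../vc-refactor && claude "Refactor ObservableObject models in Features/Collection/ to @Observable …"
cd ../vc-review   && claude "Review feature/refactor-observable for SwiftUI rendering inefficiencies …"
cd ../vc-i18n     && claude "Scan SwiftUI views for hardcoded strings …"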


The Part That Actually Matters: Decomposition

Worktrees aren’t the magic.

The magic is boundary clarity - and that’s much harder than it sounds.

Parallel agents only work if task boundaries are surgical, not aspirational. When work is loosely defined, agents don’t just move faster; they move wider. If two agents touch the same files, merge friction increases. If scope is vague, drift happens. If prompts are lazy, chaos follows - not because the model is broken, but because it’s doing exactly what you implicitly allowed.

Running parallel sessions forced me to confront that reality. I couldn’t hide behind broad tickets anymore. I had to decide, precisely, what belonged inside a lane and what did not. Where a refactor ended. Where a review began. Where an audit had no permission to wander.

That discipline changed how I slice work.

Instead of:

Refactor the feature.

It became:

Refactor state model only. Audit rendering only. Validate string resources only.

That level of decomposition isn’t junior muscle. It’s the difference between shipping features and designing systems. When you can carve work into isolated, non-overlapping lanes - where refactoring state doesn’t collide with rendering changes, and localization audits don’t accidentally mutate business logic - you’re no longer just implementing tasks. You’re shaping the execution surface.

That’s architectural thinking. Not diagrams on a whiteboard, but deliberate boundary setting at the file, module, and responsibility level.

And that’s where architecture becomes the moat.

Anyone can ask an AI to “improve this file.” That’s table stakes now. Not everyone can design parallel, non-colliding streams of work that allow multiple agents - or multiple engineers - to move simultaneously without stepping on each other. That skill compounds. That skill scales. And that’s the real leverage.


Where It Broke (Because It Will)

The refactor agent got ambitious - not malicious, not broken, just logically aggressive in the way machines tend to be when they see a broader pattern.

It didn’t just migrate ObservableObject to @Observable. It started normalizing shared state conventions across adjacent modules, cleaning up bindings it considered redundant, and tightening patterns that technically worked but weren’t stylistically consistent. From a purely structural standpoint, it wasn’t wrong. In fact, some of the changes were arguably cleaner.

The problem wasn’t correctness. It was scope.

I had asked for a surgical refactor inside a clearly bounded feature. The agent interpreted the request as architectural permission. That gap between intent and execution is where things get interesting.

And because I was reviewing another terminal at the same time - watching diffs, responding to comments, scanning localization output - I didn’t catch the drift immediately. It took several minutes before I noticed files lighting up outside the agreed boundary.

That’s the real cost of parallelism.

Parallelism multiplies output velocity. It also multiplies oversight responsibility. Every additional stream of work demands a slice of attention, and attention doesn’t scale linearly.

The bottleneck doesn’t disappear when you add agents. It relocates.

It moves from typing speed to boundary enforcement. From mechanical implementation to cognitive load management. From “can I write this quickly?” to “can I supervise three evolving reasoning chains without letting one wander?”
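
A cheap tripwire would have caught that drift sooner - something like this, run against the refactor lane (the lane path comes from the prompt above; untracked files would additionally need git status):

git -C ../vc-refactor diff --name-only main \
  | grep -v '^Features/Collection/' \
  && echo "drift: files changed outside the lane" \
  || echo "in lane"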

That tradeoff is real. And if you’re not intentional about it, parallel agents don’t feel like a band - they feel like noise.


SwiftUI-Specific Observations in Parallel Mode

Running multiple agents against a SwiftUI repo surfaces patterns quickly, and what stood out wasn’t any single fix - it was the density of improvement happening at once.

The refactor stream removed redundant @StateObject declarations, simplified objectWillChange plumbing, and cleaned up binding paths that had slowly accumulated over time. Nothing dramatic, but the kind of structural cleanup that reduces friction across an entire feature.
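
In miniature, that migration looks like this - a sketch with invented type names, not Vinyl Crate's actual models (@Observable requires iOS 17 and the Observation framework):

import Observation
import SwiftUI

// Before: every @Published change invalidates all observing views.
final class CrateListModelLegacy: ObservableObject {
    @Published var albums: [String] = []
}

// After: @Observable tracks property access, so a view re-renders
// only when a property it actually reads has changed.
@Observable
final class CrateListModel {
    var albums: [String] = []
}

struct CrateListView: View {
    // Plain @State owns the model now; @StateObject is no longer needed.
    @State private var model = CrateListModel()

    var body: some View {
        List(model.albums, id: \.self) { Text($0) }
    }
}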

At the same time, the review stream caught unnecessary view invalidation, flagged a .task modifier that ignored cancellation, and pointed out a missing @MainActor where concurrency assumptions were leaking. Those are the kinds of issues that normally surface weeks later under load.
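
The .task finding is worth seeing concretely. A sketch with invented names, showing the cooperative-cancellation check the review stream flagged as missing:

import SwiftUI

struct AlbumArtworkView: View {
    let albumID: String
    @State private var artwork: Image?

    var body: some View {
        Group {
            if let artwork { artwork } else { ProgressView() }
        }
        // .task(id:) cancels the previous Task when albumID changes or the
        // view leaves the hierarchy - but only if the work cooperates.
        .task(id: albumID) {
            guard let image = try? await loadArtwork(albumID) else { return }
            // Without this check, a cancelled load can still publish stale state.
            guard !Task.isCancelled else { return }
            artwork = image
        }
    }

    // Stand-in for real async I/O; Task.sleep throws on cancellation,
    // which is what makes this load cancellable at all.
    private func loadArtwork(_ id: String) async throws -> Image {
        try await Task.sleep(nanoseconds: 300_000_000)
        return Image(systemName: "opticaldisc")
    }
}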

Meanwhile, the localization audit uncovered hardcoded strings scattered across views, inconsistent key casing between languages, and subtle formatting mismatches that would have quietly degraded polish over time.
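
That class of bug reduces to a few lines (key names hypothetical):

import SwiftUI

struct AddToCrateLabel: View {
    var body: some View {
        // A string literal is a LocalizedStringKey, resolved through
        // Localizable.strings / the string catalog.
        Text("crate.action.add")
    }
}

// Strings built at runtime bypass that lookup entirely unless
// routed through String(localized:).
func addToCrateTitle() -> String {
    String(localized: "crate.action.add")
}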

Individually, each of those corrections is minor. In parallel, they compress feedback loops dramatically.

By the time I opened a formal PR, most of the obvious issues had already been surfaced, discussed, and addressed.

Review wasn’t a phase anymore. It was continuous.


Manual Worktrees vs Programmatic Orchestration

At some point it became obvious: I was manually simulating something the Claude Agent SDK formalizes.

In the SDK, you define subagents with scoped responsibilities and restricted tools.

(Note: the actual Agent SDK is Python/TypeScript. The snippet below is Swift-styled pseudocode for familiarity with a SwiftUI audience - don’t try to import it literally.)

// The orchestrator owns decomposition; each subagent gets the
// narrowest toolbelt that can do its job.
let refactorAgent = AgentDefinition(
    name: "RefactorAgent",
    description: "Performs scoped SwiftUI refactors.",
    tools: [.fileReader, .fileWriter, .diffInspector]  // the only agent allowed to write
)

let reviewAgent = AgentDefinition(
    name: "ReviewAgent",
    description: "Performs architectural and performance review.",
    tools: [.fileReader, .diffInspector]  // reads and diffs, never writes
)

let localizationAgent = AgentDefinition(
    name: "LocalizationAgent",
    description: "Audits localization coverage and consistency.",
    tools: [.fileReader]  // read-only: an audit cannot mutate code
)

let orchestrator = Agent(subagents: [
    refactorAgent,
    reviewAgent,
    localizationAgent
])

Important constraint: subagents cannot spawn additional subagents. The structure is flat. Predictable.

Each subagent also runs in its own isolated context window, which prevents cross-contamination of reasoning and keeps task boundaries clean - the same isolation principle you’re manually enforcing with separate worktrees.

Which mirrors worktrees perfectly.

You are the orchestrator. Each worktree is a worker. No recursive chaos.

If you prefer a zero-code approach, Claude Code supports .claude/agents/ markdown definitions.

---
name: swiftui-performance-reviewer
description: Reviews SwiftUI diffs for state misuse, view invalidation, and threading issues.
tools: Read, Grep, Glob
---

Review SwiftUI diffs for:
- State misuse
- Excessive view invalidation
- Main thread violations

Only comment on changed files.

Commit it, and your repository gains a reusable specialist.

Worktrees teach the mental model. The SDK automates it.


The Identity Shift (White Hard Hat Moment)

Here’s the part that might sting a little.

If your value is anchored primarily in how fast you can write Swift, this workflow presses directly on that identity. When three agents are moving at once, raw typing speed isn’t the differentiator anymore; the differentiator is how cleanly you define boundaries, how precisely you decompose a feature, and how disciplined you are about scope.

Typing speed stops being the lever. Decomposition becomes the lever. Boundary clarity becomes the lever. Prompt precision becomes the lever.

You don’t stop coding, but you stop measuring yourself by keystrokes. The hammer doesn’t disappear - it just isn’t the only tool on your belt. More often, you’re wearing the white hard hat, walking the site, checking load-bearing walls before anyone swings.

And when something cracks - and it will - you grab the hammer without hesitation. That’s the balance. Kill ‘Em All energy is powerful, but it’s chaos without structure. Black Album control is deliberate, restrained, and scalable. The engineers who can move between those modes - intensity and restraint, speed and supervision - are the ones who thrive in this model.

That balance isn’t optional anymore. It’s the job.


Where This Is Heading

Right now we’re juggling worktrees manually.

Soon, orchestrators will spawn subagents automatically. Tool permissions will constrain behavior. Context windows will remain isolated by design. Results will roll back into a single branch cleanly.

The pattern doesn’t change:

Isolation. Delegation. Review. Merge.

Git worktrees aren’t the destination. They’re rehearsal.


Try This This Week

Pick three tasks from your sprint - real ones, not synthetic experiments - and shrink them until they feel almost uncomfortably narrow. Not “refactor the feature.” Not “clean up the module.” Those are ego-sized tasks. Instead, think in surgical cuts: refactor state model only, audit rendering paths only, validate localization keys only. Small. Isolated. Non-colliding.

Create three worktrees and give each one a clearly bounded mission. Launch three agents, but don’t let them roam. Be ruthless about scope. If an agent touches a file outside its lane, pull it back. If a prompt is vague, tighten it. Treat it less like delegation and more like conducting - every instrument has a part, and no one improvises over someone else’s solo.
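
Mechanically, the whole loop is a handful of commands per lane (branch and directory names are placeholders):

# set up the lanes
git worktree add -b feature/lane-a ../proj-lane-a
git worktree add -b feature/lane-b ../proj-lane-b
git worktree add -b feature/lane-c ../proj-lane-c

# when a lane lands, fold it into main and tear the lane down
git merge feature/lane-a
git worktree remove ../proj-lane-a
git worktree prune   # clears stale .git/worktrees metadata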

Then watch what happens to your role.

You’ll notice that you’re not racing to keep up with output. You’re supervising it. You’re reviewing diffs as they land, adjusting direction mid-flight, thinking two steps ahead instead of reacting line by line. The first time you realize you’re reviewing parallel streams of progress before lunch - not because you worked faster, but because you decomposed better - something shifts internally.

The AI didn’t level up in that moment. You did.

And that shift - from typing alongside the machine to orchestrating it like a tight, rehearsed band - is the real evolution. That’s the difference between using AI as a louder hammer and running the stage with Black Album control.


More Real-World iOS Survival Stories

If this post resonated, this is the kind of thread I keep pulling on.

I write about real SwiftUI codebases, architectural tradeoffs, orchestration patterns, and the messy edge cases that show up after the demo works. Not theory. Not toy apps. Production thinking applied to side projects and real-world systems.

Vinyl Crate is where this particular experiment happened - not in a sandbox, but in a repo with history, state complexity, localization debt, and architectural decisions that evolved over time. That’s where ideas either hold up or fall apart.

If you’re interested in building that kind of muscle - decomposing work cleanly, running parallel agents without chaos, and knowing when to grab the hammer versus when to conduct the band - you can explore more here:

medium.com/@wesleymatlock or wesleymatlock.com

We’re not just coding faster. We’re redesigning how work moves.

Black Album control. Kill ‘Em All intensity.