"Andrew (Ansley, unconfirmed) - Vibe coding and harness engineering with AI agents"

Andrew's Day 1 case for harness engineering, the disciplined layer above vibe coding that makes Claude Code agents repeatable through domain-first naming, lean context windows, and hooks.

On Day 1, a developer from Thorbit.ai who introduced himself only as Andrew (the session filename says "Andrew Ansley," but the recording does not confirm the surname) walked the room through harness engineering: the disciplined layer above raw "vibe coding" that makes AI coding agents (specifically Claude Code) repeatable and scalable. His core thesis is that everything you do with an agent is a prompt, so the goal is to build an operating system around the AI rather than to write better one-off prompts. He demoed a Q&A and poll web app that he says he vibe-coded in two to three hours, then spent the rest of the talk on the constructs that turn one-off output into a reproducible system.

Main takeaways

Harness engineering beats vibe coding for anything you want to reproduce. Andrew frames three tiers (vibe coding, vibe engineering, harness engineering). Raw vibe coding can produce good output once, but you cannot reliably reproduce it. The fix is building a system around the AI, not writing better isolated prompts.
Name folders after the work (the domain), not the code type. Domain-first naming (for example a folder called "estimates" for a construction company) lets agents grep and navigate intuitively, which Andrew says cuts token use by roughly 50% and makes the AI work about twice as fast.
Keep context windows under about 100k tokens and use sub-agents to fetch context. Andrew claims agent performance drops off significantly past 100k tokens and hallucination climbs after 200k, which is why he calls very large advertised windows (he cites a 2M-token Grok figure) a "money grab." Thin sub-agents that gather context and report back keep the main window lean.
Hooks are the anti-hallucination and automation layer. Any executable can fire on Claude Code events (before a tool runs, on session end, and so on). A single "stop" hook (the "Ralph Wiggum" pattern) creates a self-correcting agent loop. Hooks give observability, enforceability, dependability, and therefore scalability.
Build the system, not the prompt. Because no memory persists between agent runs (the cold-start problem), the brief is everything. The system's job is to build that brief automatically.

Key points

Speaker background and claims

Goes by "Andrew"; no surname given in the transcript (session filename says "Andrew Ansley," unconfirmed).
Quit his agency about two years ago; now works roughly 50 hours a week in the terminal with Claude Code.
Trains developers and converts agencies and businesses; references working with companies up to about $100 million in revenue.
Was previously an SEO who "read patents all the time" to learn optimization.
Built AI writers before; says the key to great AI content is deconstructing the subject and working at an "atomic level" (a paragraph at a time, not whole-document).

Tools, products, URLs

Thorbit.ai is his product, described as an attempt to make AI that "does all of marketing" (not complete yet).
slides.thorbit.ai is the live Q&A and poll app he demoed; event code UXEP.
seost-2026.thorbit.ai hosts the conference slides for this talk.
The demo app was vibe-coded in about two to three hours; he says he built a SaaS with a database via "three prompts" and deployed it.
Free terminal-onboarding course (two hours of videos, "no marketing, no funnel"), shared by request.
Promised a free "meta creator" repo (creates experts, prompts, commands) "tomorrow"; could not access his Mac Mini in-session, so it was not delivered live.

Three tiers of building

Vibe coding: "all you have is desires"; you prompt and "grip it and rip it" with no plan. It can be done well but is slow; the person he cited as doing it well worked one sentence at a time.
Vibe engineering: layered on top of vibe coding; you learn a little code (file and function naming). About six categories matter: classes, types, schema, structures, interfaces, and enumerations.
Harness engineering: learning hooks, commands, skills, and agents; makes outputs repeatable and scalable.

Domain-first naming

Name folders after the work (the domain), not the code type (example: an "estimates" folder for a construction company).
Only the top-level folder needs code-style naming; deeper subfolders can use intuitive, keyword-rich names, as you would in Google Drive.
"grep" is keyword searching in the terminal; domain naming means a grep for "estimates" surfaces the whole relevant universe instead of a fraction.
Claimed benefits: roughly 50% fewer tokens and AI working about twice as fast.
Infrastructure folders keep their code-style names (lib and src stay; inside src, you name by domain or keyword).
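To make the grep claim concrete, here is a minimal Python sketch that runs a case-insensitive keyword search over two hypothetical file trees for the same construction app, one organized by code type and one by domain. The paths and the `grep_paths` helper are invented for illustration; a real agent would run `grep` or a similar search tool over the actual repository.

```python
import re

# Hypothetical type-first layout: estimate logic is scattered across code-type folders.
TYPE_FIRST = [
    "src/models/estimate.py",
    "src/views/quote_pdf.py",
    "src/schemas/job_cost.py",
    "src/services/pricing.py",
]

# Hypothetical domain-first layout: the same logic lives under one "estimates" folder.
DOMAIN_FIRST = [
    "src/estimates/estimate_model.py",
    "src/estimates/quote_pdf.py",
    "src/estimates/job_cost_schema.py",
    "src/estimates/pricing.py",
]


def grep_paths(paths: list[str], keyword: str) -> list[str]:
    """Return the paths that contain the keyword (case-insensitive)."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [p for p in paths if pattern.search(p)]


if __name__ == "__main__":
    for name, tree in (("type-first", TYPE_FIRST), ("domain-first", DOMAIN_FIRST)):
        hits = grep_paths(tree, "estimate")
        print(f"{name}: {len(hits)} of {len(tree)} files found")
```

Run as a script, it reports 1 of 4 files for the type-first tree and 4 of 4 for the domain-first tree. The sketch only searches file paths; real grep also searches file contents, where consistent domain vocabulary helps in the same way.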

Context window management

Claim: agent performance drops off significantly after about 100k tokens, drifts between 100k and 200k, and hallucinates heavily after 200k.
Cites a 2M-token context window for Grok 4.2 multi-agent and calls huge windows a "money grab" and "disingenuous" given the dropoff.
Run /context in a fresh Claude Code terminal to see how many tokens are used before you start. Most setups begin around 25k to 50k; Andrew says the lowest Claude baseline is about 16k tokens.
Strategy: keep the main context under about 100k by deploying thin sub-agents that gather context and report back.
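The thresholds above can be written as a small budget check. This Python sketch encodes Andrew's rule of thumb (under about 100k tokens is healthy, 100k to 200k risks drift, past 200k risks heavy hallucination); the zone names, the exact cutoffs, and the `headroom` helper are illustrative assumptions, not limits published by Anthropic. The token counts themselves would come from something like /context.

```python
from enum import Enum

# Cutoffs follow the speaker's rule of thumb; they are not official limits.
DRIFT_START = 100_000
HALLUCINATION_START = 200_000


class Zone(Enum):
    HEALTHY = "healthy"
    DRIFT = "drift"
    HALLUCINATION = "hallucination"


def classify(tokens_used: int) -> Zone:
    """Map a context-window token count to one of the three zones."""
    if tokens_used < DRIFT_START:
        return Zone.HEALTHY
    if tokens_used < HALLUCINATION_START:
        return Zone.DRIFT
    return Zone.HALLUCINATION


def headroom(baseline_tokens: int, limit: int = DRIFT_START) -> int:
    """Tokens left for actual work before leaving the healthy zone."""
    return max(limit - baseline_tokens, 0)


if __name__ == "__main__":
    for baseline in (16_000, 25_000, 50_000):
        print(baseline, classify(baseline).value, headroom(baseline))
```

With a 50k starting baseline, the sketch leaves 50k tokens of headroom before the drift zone, which is the argument for pushing context gathering into sub-agents that return short summaries.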

The cold-start problem

No conversation history passes between sessions or between agents; every agent arrives "with nothing."
The brief you write is the only thing the agent knows; a vague brief is a system-design problem, not a writing problem.
The system's job is to build the brief automatically: take your words, ask about the gaps, then build the brief.
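One way to read "the system builds the brief" is as a gap check that runs before any agent is launched. The Python sketch below assumes a hypothetical brief with four required fields (`goal`, `inputs`, `constraints`, `done_when`); it merges what the user said, lists the gaps to ask about, and only renders a brief once nothing is missing. The field names and the `Brief` class are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; a real harness would define its own.
REQUIRED_FIELDS = ("goal", "inputs", "constraints", "done_when")


@dataclass
class Brief:
    answers: dict[str, str] = field(default_factory=dict)

    def add(self, **answers: str) -> None:
        """Merge new answers from the user into the brief."""
        self.answers.update(answers)

    def gaps(self) -> list[str]:
        """Fields still missing or blank; these become follow-up questions."""
        return [f for f in REQUIRED_FIELDS if not self.answers.get(f, "").strip()]

    def render(self) -> str:
        """Render the brief: the only text a cold-started agent will see."""
        missing = self.gaps()
        if missing:
            raise ValueError(f"brief incomplete, ask about: {', '.join(missing)}")
        return "\n".join(f"{f.upper()}: {self.answers[f]}" for f in REQUIRED_FIELDS)


if __name__ == "__main__":
    brief = Brief()
    brief.add(goal="Add PDF export to estimates")
    print("Ask the user about:", brief.gaps())
    brief.add(inputs="src/estimates/", constraints="no new dependencies",
              done_when="golden master test passes")
    print(brief.render())
```

The first print lists `inputs`, `constraints`, and `done_when` as the gaps to ask about; `render` succeeds only after they are filled.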

"Everything is a prompt"

Plans are prompts, specs are prompts, your entire message history is a single prompt.
A "cluttered harness" (many agents, skills, MCPs, plus system and task messages all loaded) burns the context window.

The harness: hooks, commands, skills, agents

Skills: a folder plus a markdown file; a fraction is loaded into every conversation, so the AI is always aware of it. Too many can confuse the AI.
Commands: same folder-plus-markdown format as a skill, but NOT loaded into every conversation, which lets you keep hundreds without degrading quality. Andrew notes that Claude's docs treat commands and skills as effectively the same and that commands now require a subfolder.
Agents: same format; the only real difference is an agent runs in a separate context window. Use case: deploy about 15 sub-agents at once to gather context and report back.
Recommends making agents and skills template-driven (folders of templates and references) using "spin text" style bracketed placeholders so one thin agent can serve many uses.
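The template-driven idea can be sketched in a few lines of Python: one thin agent prompt with bracketed placeholders, filled differently for each use. The `[name]` placeholder syntax, the template text, and the `fill` helper are assumptions for illustration; Andrew did not show his template format. Failing loudly on an unfilled placeholder keeps a half-built prompt from reaching an agent.

```python
import re

# Hypothetical thin-agent template using bracketed placeholders.
RESEARCH_AGENT = (
    "You are a [role]. Search [scope] for anything related to [topic]. "
    "Report back in at most [limit] bullet points. Do not edit files."
)

PLACEHOLDER = re.compile(r"\[([a-z_]+)\]")


def fill(template: str, **values: str) -> str:
    """Replace every [placeholder]; raise if any placeholder has no value."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    print(fill(RESEARCH_AGENT, role="codebase researcher",
               scope="src/estimates", topic="PDF export", limit="10"))
```

The same template could back many sub-agents (one per folder or topic), which is what keeps each agent thin.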

Hooks (the standout topic)

A hook is any executable that fires on a Claude Code event (before a tool is used, before Claude responds, when a sub-agent is deployed, on session end, and so on). Andrew says Claude has roughly 15 hookable areas.
Hooks prevent hallucination and enforce behavior; Andrew says "no one talks about this on YouTube."
"Ralph Wiggum" is a stop hook that tells the agent "you're not done," sends it back to the original spec, and repeats, keeping agents working in a continuous loop.
Hook mantra: observability, enforceability, dependability, scalability.
Example: a hook that blocks an agent action until a required file is written.
Example: a hook that blocks all code comments and deletes them (comments go stale; "the code itself can be the documentation" with good structure).
Example: a "guide skill" updated by a hook whenever a new skill, command, or agent is deployed (an all-knowing map of the codebase).
Example: an "expert" with a "self-improve" command that runs a diff check (past versus current) and updates itself.
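A minimal Python sketch of the "Ralph Wiggum" stop hook, assuming the contract described in Claude Code's hooks documentation: a Stop hook receives the event as JSON on stdin, and printing `{"decision": "block", "reason": ...}` keeps the agent from stopping and feeds the reason back to it. The SPEC.md file, the DONE marker, the attempt counter, and the cap of 20 loops are hypothetical conventions added for illustration, not part of Andrew's setup; check the current hooks docs before relying on the exact payload or output schema.

```python
import json
import sys
from pathlib import Path
from typing import Optional

# Hypothetical conventions for this sketch: the spec lives in SPEC.md, the agent
# creates DONE once every item in the spec is satisfied, and a counter file caps
# the loop so a stuck agent cannot run forever.
SPEC_FILE = "SPEC.md"
DONE_MARKER = Path("DONE")
COUNTER = Path(".ralph_attempts")
MAX_ATTEMPTS = 20


def decide(done: bool, attempts: int, max_attempts: int = MAX_ATTEMPTS) -> Optional[dict]:
    """Return a 'block' decision to keep the agent working, or None to let it stop."""
    if done or attempts >= max_attempts:
        return None
    return {
        "decision": "block",
        "reason": f"You're not done. Re-read {SPEC_FILE}, finish what is missing, "
                  f"then create {DONE_MARKER} when every item is satisfied.",
    }


def main() -> None:
    json.load(sys.stdin)  # Stop-event payload; not used further in this sketch
    attempts = int(COUNTER.read_text()) if COUNTER.exists() else 0
    decision = decide(DONE_MARKER.exists(), attempts)
    if decision is None:
        COUNTER.unlink(missing_ok=True)  # reset the counter for the next task
        return
    COUNTER.write_text(str(attempts + 1))
    print(json.dumps(decision))  # JSON on stdout is read by Claude Code


if __name__ == "__main__":
    main()
```

Registering it would mean adding a `Stop` entry under `hooks` in the Claude Code settings file that runs this script. Without some cap like `MAX_ATTEMPTS`, an agent that can never satisfy the spec would loop forever.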

Other named patterns

"Golden master test": tell Claude to create a golden master test so it tests the whole thing for real instead of writing trivial mocked tests.
Trust-level output gating: name every output file with its trust level (validated, approved, needs-review) via a small JSON or YAML checkpoint file with limited allowed states (he compares it to ClickUp and Asana status gates).
Recommended minimal harness: about 10 skills, about 10 agents, plus many commands. His roster: a builder, a reviewer, an architect, a tester, experts, and a "meta" (the creator of the systems itself).
He uses 42 templates ("42 units of code") but says you can start with about 6.
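A golden master (also called a snapshot or approval test) runs the real code on fixed inputs and compares the full output against a stored, human-approved copy, rather than checking a mocked fragment. A minimal Python sketch, assuming a hypothetical `render_estimate` function as the code under test and a plain-text golden file:

```python
import tempfile
from pathlib import Path


def render_estimate(items: list[tuple[str, int, float]]) -> str:
    """Hypothetical code under test: format an estimate as plain text."""
    lines = [f"{name}: {qty} x {price:.2f} = {qty * price:.2f}" for name, qty, price in items]
    total = sum(qty * price for _, qty, price in items)
    lines.append(f"TOTAL: {total:.2f}")
    return "\n".join(lines)


def check_golden_master(actual: str, golden_path: Path) -> bool:
    """Compare real output with the approved snapshot, recording it on the first run."""
    if not golden_path.exists():
        golden_path.write_text(actual, encoding="utf-8")
        return True  # first run: a human should review the recorded snapshot
    return golden_path.read_text(encoding="utf-8") == actual


if __name__ == "__main__":
    sample = [("drywall sheet", 40, 12.5), ("labor hour", 16, 55.0)]
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp) / "estimate.golden.txt"
        print(check_golden_master(render_estimate(sample), golden))  # records snapshot
        print(check_golden_master(render_estimate(sample), golden))  # compares to it
```

Any later change to the output, intended or not, makes the comparison fail until someone reviews and re-approves the golden file.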

Start-next-week procedure (his closing)

Find the prompt you retype the most and write it as a skill (a folder plus a markdown file).
Add one hook that fires when a phase completes; have it write a one-line log.
Name every output file with its trust level (validated, approved, needs-review) by writing to a small JSON or YAML checkpoint file that allows only a handful of states.
Build the system, not just the prompt.
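Steps 2 and 3 can be sketched together in Python. The file names, the JSON layout, and the log-line format are assumptions for illustration; Andrew described the idea (a checkpoint with a handful of allowed states and a one-line log per phase), not a specific format. JSON is used here because YAML would need a third-party library. In practice, `log_phase_complete` would be called from a hook script wired to whichever event marks the end of a phase.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Only these states are allowed, like a status gate in ClickUp or Asana.
TRUST_LEVELS = ("validated", "approved", "needs-review")


def set_trust_level(checkpoint: Path, output_file: str, level: str) -> dict:
    """Record an output file's trust level in a small JSON checkpoint file."""
    if level not in TRUST_LEVELS:
        raise ValueError(f"{level!r} is not one of {TRUST_LEVELS}")
    state = json.loads(checkpoint.read_text(encoding="utf-8")) if checkpoint.exists() else {}
    state[output_file] = level
    checkpoint.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    return state


def log_phase_complete(log_file: Path, phase: str) -> str:
    """Append one line when a phase finishes (what a phase-complete hook would call)."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp} phase={phase} done"
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "checkpoint.json"
        set_trust_level(checkpoint, "estimates/report.md", "needs-review")
        print(set_trust_level(checkpoint, "estimates/report.md", "approved"))
        print(log_phase_complete(Path(tmp) / "phases.log", "review"))
```

Rejecting any state outside the allowed set is the point of the gate: an agent cannot invent a new status such as "done" that nobody reviews.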

Audience Q&A points

Why Claude reports "reading skill about..." for a skill the user never wrote: Claude's ~16k baseline includes pre-loaded skills (for example, a front-end design skill), and the desktop app spins up a sandboxed terminal running Claude Code, which loads skills.
Installed skills and agents can be browsed in a file explorer under ~/.claude.
Common mistake: keeping both per-project and global skills or agents creates hidden memories and logs that confuse the AI.
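To check for the global-versus-project overlap Andrew warns about, a short Python sketch can compare the two locations. It assumes the default layout of skill folders under `skills/` and agent markdown files under `agents/`, in both `~/.claude` (global) and `.claude` inside the project; adjust the paths if your setup differs.

```python
from pathlib import Path


def entry_names(root: Path, kind: str) -> set[str]:
    """Names of skill folders or agent files under root/kind (empty if absent)."""
    folder = root / kind
    if not folder.is_dir():
        return set()
    # Skills are folders; agents are markdown files, so strip the .md suffix.
    return {
        p.stem if p.is_file() else p.name
        for p in folder.iterdir()
        if not p.name.startswith(".")
    }


def find_overlaps(global_root: Path, project_root: Path) -> dict[str, set[str]]:
    """Skills and agents defined both globally and in the project."""
    return {
        kind: entry_names(global_root, kind) & entry_names(project_root, kind)
        for kind in ("skills", "agents")
    }


if __name__ == "__main__":
    overlaps = find_overlaps(Path.home() / ".claude", Path.cwd() / ".claude")
    for kind, names in overlaps.items():
        print(f"{kind} defined in both places: {sorted(names) or 'none'}")
```

An empty result for both kinds means no skill or agent is defined in both places.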

Slides

The deck below is from a SEPARATE presentation, "Vibe Coding for Non-Coders" by Aaron Gruenke, NOT by Andrew. Aaron Gruenke and Andrew are different presenters with different tools and offers; no deck file was provided for Andrew's actual talk. The slides are included here for reference only and should not be attributed to Andrew.

Slides (11) - Aaron Gruenke deck, a different presenter (not Andrew)

Source

Session transcript: SEO ST Day 1 (Andrew of Thorbit.ai, harness engineering). No deck file was provided for Andrew's talk. The embedded slides above are from a different presenter's deck file, aaron-gruenke-seo-st-vibecoding-rev5 (Aaron Gruenke, "Vibe Coding for Non-Coders"), included for reference only. Source notes live in the project knowledge folder, which is not published on the live site.