Knowing what to hand an agent is most of the skill. There's a clear pattern to the work it does well, and an equally clear pattern to the work you should keep. Get the sorting right and the agent feels like a genuinely useful teammate. Get it wrong and you'll spend more time fixing its output than you saved. The same sorting holds for any coding agent, so if you also work with OpenAI's Codex, our Codex course covers the same ground.

The pattern: clear inputs, a checkable result

An agent shines when a task has well-defined inputs and a way to tell, objectively, that it's done. That's why so much of its sweet spot is work with tests or an obvious correct shape. When success is checkable, the agent can iterate against it until it's right, which is exactly the loop it's built for. When success is a matter of taste or judgement, it's on much weaker ground. Hold that pattern in your head and the rest of this list follows from it.
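As a rough illustration of what "checkable" can look like, here is a minimal Python sketch in which the acceptance criteria are written as a test before any work is handed over. The function name slugify and the three criteria are hypothetical, chosen only for the example; the implementation shown is one possible result an agent might arrive at, not a reference.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical function the agent is asked to write."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over at the start or end.
    return slug.strip("-")


def test_slugify_meets_acceptance_criteria():
    # Each assertion is one acceptance criterion, stated up front.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  Already-slugged  ") == "already-slugged"
    assert slugify("") == ""
```

With criteria in this form, "done" stops being an opinion: the test passes or it doesn't, and the agent has something concrete to iterate against.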

The work it handles well

Features within a clear design. A bounded feature where you've made the design call and can describe the behaviour and the acceptance criteria. You decide the shape; it builds inside it.
Refactors. Renaming across the codebase, extracting a function, splitting a fat module, modernising an old pattern, with the test suite as the safety net that proves nothing broke.
Tests. Adding coverage to existing code, filling in edge cases, writing unit tests for a pure function. Clear target, checkable result. A great place to start.
Debugging with reproduction steps. Give it a bug you can reproduce and it can find the cause, write a failing test that captures it, then fix until that test goes green. The failing test is the target, which is exactly what it needs.
Codebase questions. "Where is auth handled?" "What calls this function?" "What would break if I change this shape?" It reads the code and answers, which is a fast way to get your bearings or onboard someone new.
Multi-file changes. A consistent change applied across many files: a library upgrade, a syntax migration, moving off a deprecated call, threading a new parameter through. Agents are tireless and consistent, which is the whole job here.
One-off scripts and internal tools. A quick data fix, a migration script, a small admin page for the team, or the glue that joins two systems with no off-the-shelf connector (the sort of custom integration we build by hand). Low blast radius, clear purpose, real time saved.

The thread through all of it: bounded scope, a pattern to follow or a test to satisfy, and a low cost if a draft isn't perfect. That's the zone where an agent quietly clears your backlog.
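To make the debugging item above concrete, here is a small Python sketch of a reproduction captured as a test. Everything in it is assumed for illustration: parse_price is a hypothetical helper, and the bug (a price with a thousands separator raising an error) is invented. The first test is the one written from the reproduction steps; it would have failed before the fix and passes after it.

```python
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Hypothetical helper: turn a string such as "$1,299.50" into a Decimal."""
    # The invented bug: an earlier version skipped the .replace(",", "") step,
    # so Decimal() rejected any input containing a thousands separator.
    cleaned = text.strip().lstrip("$").replace(",", "")
    return Decimal(cleaned)


def test_parse_price_handles_thousands_separator():
    # Captures the reported bug. This is the target the fix has to turn green.
    assert parse_price("$1,299.50") == Decimal("1299.50")


def test_parse_price_plain_value_still_works():
    # Guards against the fix breaking the case that already worked.
    assert parse_price("42") == Decimal("42")
```

The second test matters as much as the first: it is the safety net that shows the fix didn't break what was already working.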

What to keep firmly with people

The flip side matters just as much. Some work is yours, and handing it to an agent is how you get confident, well-written mistakes. Keep these:

Architecture. How the system is shaped, the boundaries, the big trade-offs. An agent can draft inside a design; it shouldn't be the one choosing the design.
Security-critical paths. Authentication, authorisation, payments, anything handling sensitive data. High blast radius and subtle failure modes. A person owns these and reviews them closely.
Ambiguous product calls. When the "right" answer depends on what the business wants, or there's a real trade-off with no clear winner, that's a decision, not a coding task. Make the call yourself, then, if it helps, hand the agent the now-clear job.

A simple test before you delegate: is this well-defined, low-risk and checkable, or does it need judgement and carry a high blast radius? The first kind is agent work. The second is yours. Most tasks sort themselves cleanly once you ask.
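If it helps to see that question written down, here is a minimal Python sketch of it as a function. The Task fields and the who_owns name are hypothetical, and one rule is an assumption added for the sketch: a task that is neither clearly agent work nor clearly judgement-heavy stays with a person until it has been clarified.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a piece of work, answered by the person delegating."""
    name: str
    well_defined: bool
    checkable: bool
    high_blast_radius: bool
    needs_judgement: bool


def who_owns(task: Task) -> str:
    # Judgement calls and high-risk work stay with people, whatever else is true.
    if task.needs_judgement or task.high_blast_radius:
        return "person"
    # Well-defined, low-risk and checkable: agent work.
    if task.well_defined and task.checkable:
        return "agent"
    # Assumption for this sketch: anything in between stays with a person
    # until it has been made well-defined and checkable.
    return "person"


print(who_owns(Task("add unit tests for a pure function", True, True, False, False)))
print(who_owns(Task("choose the payments architecture", False, False, True, True)))
```

The point isn't to run this on your backlog. It's that the question has only a handful of inputs, and you can usually answer all of them in a few seconds.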

How to sort the work: hand the agent jobs with clear inputs and a checkable result, such as features inside a design you've set, refactors, tests, debugging with reproduction steps, codebase questions, multi-file changes and one-off scripts. Keep architecture, security-critical paths and ambiguous product calls firmly with people. When in doubt, ask whether it's well-defined and low-risk or judgement-heavy and high-blast-radius. Next up: keeping humans in the loop.
