Real computer work, not just chat.
Your workers run a dev server, deploy over SSH, parse a spreadsheet, and search a codebase, using persistent sessions that keep state across turns. Every powerful action is approval-gated and audited.
Start a process (a dev server, a build, an interactive REPL) and it stays alive across turns. The worker iterates on your code while the server reloads, watches the output, and stops the process when done. Output is ring-capped, so logs never blow up the context.
start_process "npm run dev" → edit a route → read the reload → stop it.
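For readers curious what "ring-capped" output can look like in practice, here is a minimal sketch. It assumes a simple line-based cap; the class name `RingLog` and the `max_lines` parameter are illustrative and not the product's actual internals.

```python
from collections import deque


class RingLog:
    """Keep only the most recent `max_lines` lines of process output."""

    def __init__(self, max_lines: int = 200) -> None:
        # A deque with maxlen discards the oldest entry when a new one arrives.
        self._lines = deque(maxlen=max_lines)
        self.dropped = 0  # count of lines that have fallen off the front

    def append(self, line: str) -> None:
        if len(self._lines) == self._lines.maxlen:
            self.dropped += 1
        self._lines.append(line.rstrip("\n"))

    def tail(self) -> str:
        # Tell the reader that earlier output existed but was trimmed.
        header = f"[{self.dropped} earlier lines dropped]\n" if self.dropped else ""
        return header + "\n".join(self._lines)
```

A chatty dev server can write thousands of lines, yet `tail()` never returns more than `max_lines` of them plus a one-line note about what was trimmed.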
A Python or Node session stays open across many tool calls. Load your data once, then ask question after question against the same loaded state, with no re-setup between asks.
python3 -i → import pandas; df = … → then many queries on df.
Deploy and operate servers over SSH using saved host profiles. Your private keys stay out of the model's context, referenced by path only. Hardened for unattended use: never hangs on a prompt, accepts new host keys safely, times out cleanly.
ssh(profile="prod", command="systemctl status") · scp a build up and extract it.
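One plausible way to get that unattended behaviour from the stock OpenSSH client is sketched below. The `HostProfile` shape and the function names are assumptions made for illustration; the `-o` options themselves (`BatchMode`, `StrictHostKeyChecking=accept-new`, `ConnectTimeout`, `IdentitiesOnly`) are standard OpenSSH settings. The key is passed by path only and its contents are never read.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class HostProfile:
    # Hypothetical profile shape; the field names are assumptions for this sketch.
    host: str
    user: str
    key_path: str  # path to the private key; the file is never opened here
    port: int = 22


def build_ssh_argv(profile: HostProfile, command: str, connect_timeout: int = 10) -> list[str]:
    return [
        "ssh",
        "-o", "BatchMode=yes",                     # fail instead of prompting for input
        "-o", "StrictHostKeyChecking=accept-new",  # trust unseen hosts, reject changed keys
        "-o", f"ConnectTimeout={connect_timeout}", # give up on unreachable hosts
        "-o", "IdentitiesOnly=yes",                # offer only the key named below
        "-i", profile.key_path,
        "-p", str(profile.port),
        f"{profile.user}@{profile.host}",
        command,
    ]


def run_remote(profile: HostProfile, command: str, overall_timeout: float = 60.0):
    # subprocess.run raises TimeoutExpired if the whole call exceeds overall_timeout.
    return subprocess.run(
        build_ssh_argv(profile, command),
        capture_output=True,
        text=True,
        timeout=overall_timeout,
    )
```

With `BatchMode=yes`, a missing key or a password prompt becomes an immediate non-zero exit rather than a stalled session, which is what makes the call safe to run without a human watching.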
Parse spreadsheets (.xlsx/.ods), Word docs, and PDFs directly: sheet names, rows, and text come back ready to reason over. Pair it with document creation for a full read-and-write loop.
read_document "Q3-budget.xlsx" → summarise the key numbers.
Full-workspace regex search using ripgrep's walker (honours .gitignore), recursive directory listing, move/rename/mkdir, and file metadata, all workspace-scoped and safe from path-escape tricks.
fs_search content for a pattern → every file + line → propose the change.
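Workspace scoping of this kind usually comes down to resolving every requested path and refusing anything that lands outside the root. The sketch below shows that general technique; `resolve_in_workspace` is a hypothetical helper, not the product's own code.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Return the absolute path for user_path, or raise if it leaves the workspace."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check below
    # sees where the path really points. An absolute user_path replaces root here.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} escapes the workspace")
    return candidate
```

Requests such as `../secrets.txt` or `/etc/passwd` are rejected before any file operation runs, while ordinary relative paths like `src/app.py` pass through unchanged.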
Built to finish, not just to start.
A long, multi-step run survives the things that usually break agents: a malformed tool call is repaired and retried; old turns and huge outputs are summarised to keep context lean; the agent keeps its own checklist across the chain; and long tasks checkpoint so a restart resumes from the last step, not the beginning. When work is delegated to another agent, its progress streams back into your timeline live.
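The checkpoint-and-resume idea can be illustrated with a few lines of code. This is a sketch under stated assumptions: steps are plain callables, progress is a single counter in a JSON file, and `run_with_checkpoints` is a hypothetical name. The real mechanism is not described here and may differ.

```python
import json
from pathlib import Path
from typing import Callable, Sequence


def run_with_checkpoints(steps: Sequence[Callable[[], None]], checkpoint: Path) -> int:
    """Run steps in order, saving progress after each; return how many ran this time."""
    # Pick up after the last completed step if a checkpoint file already exists.
    done = json.loads(checkpoint.read_text())["done"] if checkpoint.exists() else 0
    ran = 0
    for index in range(done, len(steps)):
        steps[index]()  # if this raises, the checkpoint still names the last good step
        checkpoint.write_text(json.dumps({"done": index + 1}))
        ran += 1
    return ran
```

If step three of five fails, the next call with the same checkpoint file starts at step three rather than step one.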
Put a worker to work.
Give one a goal, such as "spin up the dev server, fix the failing test, and deploy", and watch it run end to end.