Babysitting AI

Simon Willison, co-creator of Django, described using AI coding agents in April 2026 as "mentally exhausting, even as they make my work faster." He reported being able to fire up four agents in parallel and being completely wiped out by 11 a.m.

Steve Yegge, veteran of Amazon and Google, argued that engineers should treat deep, high-output AI-assisted sessions as a limited resource. Even the most productive developers can sustain only about three hours per day before burnout sets in.

Both are describing the same phenomenon: a new kind of exhaustion in knowledge work that doesn't come from overwork in the traditional sense. It comes from waiting. Specifically, from waiting on AI agents while being too invested in the outcome to walk away, and too aware of the cost to fire off another run in parallel.

This is the map of that problem: the psychology of AI-native waiting, the economics of cost awareness, the hidden cost of serialized human attention, the design failures that make it worse, and the emerging patterns that might make it better.


The 10–60 Second Trap

Most AI products can handle two latency states well: fast (under 2 seconds) where you don't think about waiting at all, and genuinely slow (over 60 seconds) where you treat it like a background job and check back later.

The painful zone is everything in between. Roughly 10 to 60 seconds. Long enough to notice, short enough that you don't feel safe walking away.

If you've used ChatGPT's Agent Mode, Claude's long-running code reviews, or multi-step Cursor flows, you've sat in this zone: watching progress for 25 to 45 seconds, not sure if you should wait, context-switch, or abort. Research on agent UX calls this gap "the abyss," the most neglected surface in AI product design. Tian Pan

[Figure: latency zones. < 2s: Invisible. 10–60s: The Abyss (long enough to notice, too short to walk away). 60s+: Background Job.]


Streaming ≠ Designed Waiting

To be fair, the surface has improved. ChatGPT and Claude stream tokens and sometimes label steps ("Searching…", "Analyzing…"). LangChain and similar stacks now ship "stream events" specifically so developers can build rich progress UIs. Some tools render intermediate plans, tool calls, or "thinking" traces. DEV

But if you zoom out across real products, a consistent gap remains: Skill4Agent

Tool / Mode | What it does well in 10–60s | What's still missing
ChatGPT (chat) | Streams tokens quickly, minimal friction | No sense of how long a complex, tool-heavy answer will take; spinner if tools stall
Claude (chat/code) | Structured reasoning bullets, expandable when needed | Reasoning hidden by default; you still don't know if this is a 5s or 45s wait
ChatGPT Agent Mode | Shows a live browser and actions as it clicks around | Tasks routinely take 5–30 minutes; no explicit guidance on "stay vs. leave," weak background mode
Agentic IDEs (Cursor, etc.) | Show step-by-step operations in a sidebar, sometimes file diffs | Long runs collapse back into generic "running…" states; flaky or stuck states are hard to distinguish from slow ones

Streaming solves "is it frozen?" It does not solve "how long will this hold my day hostage?" or "what should I do while this runs?"

A practitioner post from developer advocate Christian Bromann in January 2026 stated flatly: "Hot take: most 'agent progress' UIs are still guessing. A spinner isn't progress. It's a placeholder for uncertainty." DEV


The Vigilance Tax

Your Brain on "Idle but Vigilant"

The HCI literature on waiting becomes relevant when you remember you're a human. People tolerate waits much better when duration is bounded ("about 30 seconds") versus unknown. UX Design

Passive animations like spinners and pulsing dots often increase perceived waiting time and annoyance versus either a blank screen or a meaningful activity. UX Psychology

In AI, that gets multiplied by two facts:

The wait is expensive. You know this run is burning tokens, CPU, or internal budget. You're not just waiting; you're watching money tick away.

The wait gates your next decision. You can't safely move on because the output might force you to change your plan. You're not waiting for a taxi; you're waiting for the answer to "what should I do next?"


Why Passive Waiting Depletes You

That's a different category of waiting: a mix of vigilance (monitoring a system for a signal) and decision prep (knowing you'll have to judge the result the moment it arrives).

Research on sustained attention shows that humans cannot maintain ready attention over extended periods without degradation. Performance on vigilance tasks declines steeply in the first 30 minutes of monitoring, even for highly motivated operators in high-stakes contexts. The "idle but anxious" waiting state that AI agent users experience is precisely a vigilance task, and it's cognitively expensive even when nothing is "wrong." PMC

The vigilance literature, originating from WWII radar operator studies, consistently shows that sustained attention failures are "primarily due to sustained cognitive load, not task monotony." The mental effort of being ready to respond depletes capacity, regardless of whether anything is happening. Perfect sustained attention is biologically impossible: neural oscillations, frontoparietal network fluctuations, and neurochemical adaptation all work against it. PMC

The moment the agent finishes, you start a second, completely different cognitive task: Is this good enough? Do I re-run with a different prompt? Does this change the plan for everything downstream?

Work on recovering flow after AI interruptions points out that this re-entry—reloading the entire problem into working memory to judge an output—is where many minutes of mental energy disappear. Addicted to AI

Even if the agent is "helping," your cognitive trace for the task looks like: focus → break → vigilance → evaluation → re-focus. Instead of: focus → continuous work → done. That's why sessions with "helpful" agents leave people more exhausted by 11 a.m. than coding alone. AI Invest

[Figure: attention trace. With agents: focus → break → vigilance → evaluation → re-focus. Without: focus → continuous work → done.]

Research on context switching finds that recovering full focus after a task interruption takes up to 23 minutes, and that typical knowledge workers already switch tasks every 10 minutes. Adding AI agent monitoring to that environment doesn't reduce context switching. It adds a new, unpredictable trigger. Todoist

The specific phenomenon being documented among developers is "oscillation fatigue": the mental exhaustion caused by the repeated back-and-forth of validating AI suggestions against one's own judgment and workflow. Aionda


Mobile Makes It Worse

On mobile, this becomes qualitatively worse. The phone is a device carried in hand and checked compulsively, with average session lengths under two minutes. An agent task that takes 5 minutes on mobile spans several average sessions strung together: an eternity in mobile UX terms.

Progress indicators that work on desktop (a sidebar panel showing agent steps) occupy too much screen real estate on mobile. Background notification systems that would allow genuine context-switching during long runs are architecturally limited by mobile OS constraints on background processing. AI Coding Flow


When Agents Run for Minutes, Not Seconds

Idle but Tethered

Short waits are annoying. Long-running agents are something else: you're not really working, not really resting, just… around.

When an agent runs for 5–30 minutes (browsing, calling APIs, generating drafts), most products still give you essentially one experience: a log stream or a "running…" label you're not meant to stare at but can't quite ignore. You should go do something else. But you don't know:

So you hover. You flip to other tabs, check your phone, come back "just to see how it's doing." You're in that on-call mental state where you're not in crisis, but you also don't feel safe dropping your guard.

Harrison Chase on the future of agent UX: "If an agent runs for two hours, you don't want to watch it for two hours." LinkedIn, Reddit, VentureBeat


The Babysitting Trap

The irony is that long-running agents fail in the middle a lot—context walls, flaky tools, weird drift. That "idle" time often turns into rushed cleanup anyway. You didn't get deep work. You didn't get rest. You still got paged to fix something. Mind Studio

That's what makes this such a nasty form of babysitting: the system blocks off a chunk of your day as "its" time, and then still needs you to parent it when it gets stuck. There's no artifact that cleanly explains those thirty minutes on your calendar, but your body absolutely feels them.


When Your Agent Runs for 2–10 Minutes and You Have Nothing to Do

The most demoralizing waits aren't the 1-second hiccups or the 2-hour batch jobs. They're the 2–10 minute runs where the agent disappears into "thinking" and leaves you with absolutely nothing useful to do.

Right now that experience is almost always the same: you give the agent a serious task, the UI flips to "running…", maybe a few log lines crawl by, and then silence. It's too long to sit and watch without going a little bit insane, but too short (and too unpredictable) to justify fully switching into something else. You could leave, but you don't know if it'll need you in 30 seconds, or if killing it later will waste the time and tokens you've already sunk. Jakob Nielsen

Classic service and HCI research is clear on two things that matter here: unoccupied time feels longer than occupied time, and uncertain, open-ended waits feel worst of all, especially when you don't know what you're waiting for or how you'll be pulled back in. PubMed, ACM

Most 2–10 minute agents manage to hit both: you're unoccupied and uncertain. That's why they feel like pure babysitting.


The Workflow Mismatch

Agents Think in Turns, You Work in Branches

Most "agentic" UX today assumes a sequential mental model: give the agent a task, wait, respond to its output, repeat. But most real design and product work does not look like that.

You hold branches in your head: "If the research comes back negative, we'll pivot to concept B." "If this refactor is too painful, we'll ship a smaller workaround first." You sketch parallel directions, then prune.

The structural mismatch is this: large language model agents operate sequentially, one turn at a time, with each step depending on the last. Human knowledge work—especially design and research—naturally flows in parallel, iterative, and speculative branches. DEV


Why Parallel Agents Fail

Tools technically support parallel agents. Cursor and similar IDEs let you run multiple tasks in a sidebar or across workspaces. ChatGPT, Claude, and others make it easy to open new chats or tabs. But there's a human bandwidth ceiling.

Addy Osmani points out: LinkedIn

"After about three concurrent agent sessions you stop being a supervisor and start being a bottleneck because you can't context-switch fast enough to actually review the output… Moving between agents means reloading a mental model: where this task started, what approach the agent settled on, what decisions you made thirty minutes ago that now constrain what's acceptable. With four parallel threads, you're paying that tax constantly, and the recoveries never fully complete before the next switch."

Multi-agent research backs this up. Fork-merge research on parallel agent architectures documents the fundamental tension: while internal systems saw 90.2% performance improvement by using 3–5 parallel subagents, "conflicting implicit assumptions" from parallel workers produce results that don't reconcile without a human mediator, and "the field has strong forking primitives and weak merging primitives." The merging problem—how to integrate outputs from parallel agents that made divergent assumptions—is the user's cognitive problem, not just the system's. Zylos

Given that parallel agents multiply both the review-and-merge work and the token cost, serializing high-stakes work around one or two agents at a time is rational, not Luddite. Galileo


The Oversight Paradox: Too Much Monitoring, Not Enough Judgment

There are two different, easily-confused problems in agent UX:

Anxious Monitoring: You watch the agent constantly because you don't trust its failure modes, logging, or guardrails. You're worried it might do something expensive, harmful, or embarrassing if you look away.

Blanket Approval Gates: The workflow forces you to click "approve" at every tiny step, regardless of risk. A study of AI-assisted coding found blanket step-level approval slowed experienced developers by roughly 19% and led to rubber-stamping, not better judgment. YouTube

Both feel like "babysitting," but the design remedies are different. The emerging pattern from infrastructure and UX guides is minimum viable oversight: define which actions truly need human checkpoints (irreversible writes, external communications, compliance-sensitive outputs), give users strong logging and rollback instead of requiring eyes-on every minor step, and design check-ins as rich reviews (with context and diffs), not as endless binary "approve/deny" modals. Redis

The goal is that humans review less often but more meaningfully, instead of clicking through thousands of low-stakes prompts. Research cited in technical community discussions found that strict AI oversight actually slowed experienced developers by 19%, with acceptance rates for AI suggestions dropping below 44%. The suggestions were not bad; the overhead of evaluating and approving every step was disrupting the work rather than improving it. YouTube
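A minimal sketch of what "minimum viable oversight" could look like in code, assuming a hypothetical agent whose actions carry a `kind` label. The three gated kinds mirror the examples in the text (irreversible writes, external communications, compliance-sensitive outputs); every name here is illustrative, not any real framework's API.

```python
from dataclasses import dataclass

# Hypothetical action kinds that warrant a human checkpoint (assumption:
# the agent labels each action with one of these strings).
CHECKPOINT_KINDS = {"irreversible_write", "external_communication", "compliance_output"}


@dataclass
class Action:
    kind: str
    description: str


def needs_human_checkpoint(action: Action) -> bool:
    """Gate only consequential actions; everything else is logged, not approved."""
    return action.kind in CHECKPOINT_KINDS


def run_plan(actions, approve, log):
    """Run actions in order, pausing for approval only at checkpoints.

    `approve` is a callback returning True or False. `log` collects an audit
    trail so low-risk steps can be reviewed or rolled back afterwards.
    """
    for action in actions:
        if needs_human_checkpoint(action) and not approve(action):
            log.append(("blocked", action.description))
            break
        log.append(("done", action.description))
    return log


plan = [
    Action("read_file", "read pricing notes"),
    Action("edit_draft", "update draft summary"),
    Action("external_communication", "email summary to client"),
]
asked = []  # records which actions actually interrupted the human


def deny_and_record(action):
    asked.append(action.description)
    return False


audit_log = run_plan(plan, approve=deny_and_record, log=[])
```

In this run the human is interrupted once, for the outbound email, while the two low-risk steps land in the audit log without a modal.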


One Counter-Argument Worth Addressing

"Won't faster models solve this?"

Partially addressed below, but the short version: model inference is already fast. The bottleneck is tool calls, retrieval, real-world APIs, and multi-step dependencies. Even if GPT-7 runs at 1000 tokens/second, an agent that needs to browse three websites, call five APIs, and wait for external rate limits will still take minutes.


Why This Won't Age Out

Speed Won't Save You

A 2026 latency benchmark shows frontier models answering typical prompts in 1.8 to 4.5 seconds; the slow part in agents is tool calls, retrieval, and multi-step orchestration. AIMultiple

Retrieval is becoming the bottleneck. Multi-stage vector search and cross-index aggregation routinely take 370 to 910ms per query, and complex agents call retrieval many times per run. Shaped

Real-world tasks involve the real world. ChatGPT Agent Mode workflows that fill forms, browse multiple sites, and run headless browsers clock in at 5 to 30 minutes regardless of model speed. YouTube

Even if model latency goes to near-zero, you still have multi-step dependencies ("don't draft email until you've validated these 5 records"), you still have external APIs and network, and you still have perception. Very fast answers can feel shallow or untrustworthy; people sometimes prefer a short pause that feels "thoughtful." Neuroscience News

IDC research notes that "end-to-end latencies above 2–3 seconds per agentic cycle often trigger degraded decision quality or timeouts," but that latency "compounds fastest during tool calls and memory access in multi-step agent workflows"—the parts that won't be solved by faster inference. Reddit

Speed helps. It doesn't erase the design problem.

[Figure: Where agent time actually goes. Segments, left to right: model inference, tool calls, retrieval, real-world tasks (browser, APIs, forms).]

Faster inference shrinks only the leftmost slice. Tool calls, retrieval, and real-world dependencies don't compress with model speed.
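A back-of-the-envelope sketch of that claim. The workload numbers are assumptions chosen for illustration (only the 0.6 s retrieval figure is picked to sit inside the 370 to 910 ms range quoted earlier), and the function name is hypothetical.

```python
def agent_run_seconds(inference_s, tool_calls, tool_s, retrievals, retrieval_s):
    """Total wall-clock time for a strictly sequential agent run."""
    return inference_s + tool_calls * tool_s + retrievals * retrieval_s


# Assumed workload: 30 s of model inference, 8 tool calls at ~6 s each,
# and 20 retrievals at ~0.6 s each.
baseline = agent_run_seconds(inference_s=30.0, tool_calls=8, tool_s=6.0,
                             retrievals=20, retrieval_s=0.6)

# Same workload with inference made 10x faster; nothing else changes.
faster = agent_run_seconds(inference_s=3.0, tool_calls=8, tool_s=6.0,
                           retrievals=20, retrieval_s=0.6)

print(f"baseline: {baseline:.0f}s, 10x faster model: {faster:.0f}s")
print(f"overall speedup: {baseline / faster:.2f}x")
```

Under these assumed numbers, a 10x faster model takes the run from about 90 seconds to about 63: a 1.43x improvement, and still squarely a wait you have to design for.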


What Good Looks Like

If you were designing an AI tool today and took all of this seriously, a non-hand-wavy spec might include:

Set Expectations Before the Wait Starts

Tell users what kind of wait they're in for before the agent starts. Not necessarily a precise ETA, but a time-shape: "This will take ~20–40 seconds. Steps: retrieve, analyze, draft" or "This could take a few minutes; we'll notify you when it's ready." This converts an open-ended wait into a bounded, narrated interval. Tian Pan


Match the UI to the Wait

Zone 1–2 (< 10s): Tiny indicators, minimal ceremony. Don't interrupt the user.

Zone 3 (10–60s): Visible plan, streaming progress, ability to pre-queue follow-ups.

Zone 4 (60s+): Background by default. Ambient status, strong re-entry points, push notifications.

Nielsen Norman Group

Beyond expectation setting, Zone 3 waits benefit from streamed reasoning, interim tool-call summaries, and plan previews—any UI pattern that ties user attention to visible agent progress. The goal is not to entertain users during the wait; it's to give them enough signal to make a judgment about whether the agent is on track. Fuselab Creative
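The zones above, plus the "time-shape" message from the previous section, can be sketched as two small functions. This is an illustration under assumptions: the mode names, the choice to treat exactly 60 seconds as background, and the use of the upper time estimate to pick the zone are all mine, not something the cited guidance specifies.

```python
def wait_zone(expected_seconds: float) -> str:
    """Map an expected wait to a UI treatment (mode names are hypothetical)."""
    if expected_seconds < 10:
        return "inline"       # tiny indicator, minimal ceremony
    if expected_seconds < 60:
        return "narrated"     # visible plan, streaming progress, queued follow-ups
    return "background"       # jobs list, ambient status, notify on completion


def expectation_message(low_s: int, high_s: int, steps: list) -> str:
    """Describe the shape of the wait rather than promising a precise ETA."""
    # Assumption: plan for the pessimistic end of the estimate.
    if wait_zone(high_s) == "background":
        return "This could take a few minutes; we'll notify you when it's ready."
    return f"This will take ~{low_s}-{high_s} seconds. Steps: {', '.join(steps)}"


print(expectation_message(20, 40, ["retrieve", "analyze", "draft"]))
print(expectation_message(180, 360, ["research", "draft", "format"]))
```

The useful part is less the thresholds than the fact that the decision happens before the run starts, so the user hears which kind of wait this is up front.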


Turn Long Waits into Background Jobs

If you accept that agents will sometimes take minutes or hours, then "nothing to do but wait" is a design choice, not a law of physics. There are at least three ways out:

Treat long runs as jobs, not stretched chat replies. Once a task will run longer than roughly 60 seconds, it should flip UI modes: turn into a named job with an ID, start time, and status ("running", "waiting on API X", "paused for your review"), and move out of the main chat canvas into a jobs list or activity feed, so the primary space isn't held hostage. The point is to tell the user, explicitly: "You don't need to sit here. This is parked somewhere safe. We'll bring you back when it matters." Tian Pan

Design re-entry, not log streams. Instead of a scrollback of actions, design for the moment the agent needs you again. When the job finishes or hits a checkpoint, surface a summary card: what it did, what it produced, and what decision it needs from you right now. Make that card the entry point from notifications (email, in-app, mobile push), so re-entry is a one-click jump into context, not a spelunk through logs. This flips the mental model from "keep an eye on it" to "we'll page you when there's something worth your attention." Zero Noise, VentureBeat

Build in timeboxes and safe aborts. Right now, a lot of the anxiety comes from not knowing whether you've kicked off a 30-second job or a 3-hour one. Long-running agent posts routinely advise "don't let them run 24/7", which is a human patch for the lack of product timeboxes. A better default: ask for a time or step budget up front ("run this up to 20 minutes or 50 steps, then pause"), and when that budget's hit, stop and ask: "Continue, adjust, or stop here?" Make "Stop and keep partial results" a first-class option so users aren't afraid to kill a run that feels off. That gives users permission to not babysit: the system itself promises not to run unchecked forever. Reddit, Addy Osmani
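Here is a rough sketch of the job-plus-budget idea, assuming each agent step can be modeled as a plain callable. The `Job` fields follow the text (ID, name, start time, status); the status strings and function names are hypothetical, and a real agent loop would be considerably messier.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    name: str
    started_at: float
    status: str = "running"   # also: "paused_for_review", "stopped", "done"
    partial_results: list = field(default_factory=list)


def run_with_budget(job, steps, max_steps, max_seconds, clock=time.monotonic):
    """Run step callables until finished or until the step/time budget is hit.

    When the budget runs out, the job pauses with its partial results intact,
    so the UI can ask "Continue, adjust, or stop here?".
    """
    for index, step in enumerate(steps):
        out_of_steps = index >= max_steps
        out_of_time = clock() - job.started_at >= max_seconds
        if out_of_steps or out_of_time:
            job.status = "paused_for_review"
            return job
        job.partial_results.append(step())
    job.status = "done"
    return job


def stop_and_keep(job):
    """First-class abort: end the run but keep whatever was produced."""
    job.status = "stopped"
    return job.partial_results


job = Job(job_id="job-1", name="Draft competitor analysis",
          started_at=time.monotonic())
steps = [lambda i=i: f"step {i} output" for i in range(5)]

# Budget of 3 steps or 20 minutes, whichever comes first.
run_with_budget(job, steps, max_steps=3, max_seconds=20 * 60)
print(job.status, len(job.partial_results))
```

With five steps queued and a three-step budget, the job ends up paused for review with three partial results, which is exactly the state the "Continue, adjust, or stop here?" prompt needs.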

[Mockup: Now] running... > Fetching context > Analyzing documents (3/47) > Calling external API... > Still running... > Still running... Nothing you can do here.

[Mockup: Better] Draft competitor analysis · Running · Started 3 min ago · Step 6 of ~15 · "Fetching pricing pages for 4 companies…" · You don't need to stay here.


Give Users Something to Do While They Wait

Rather than leaving users in cognitive suspension, the UI could offer:

bprigent

These patterns reframe waiting from "dead time" to "preparation time"—the same conceptual shift that redesigned hospital pre-admissions and airport boarding from idle waiting into productive pre-processing.


Design for the 2–10 Minute Window

Multi-step, tool-heavy agents are going to live in the 2–10 minute band, so it deserves patterns of its own. There's a richer space of things the UI could offer your brain while the model grinds.


Branch Time Instead of Dead Time. The simplest shift is to treat those minutes as branch time, not dead time. If the agent is off executing Plan A, the UI can invite you to sketch Plan B and C in parallel. Concretely: a "next branch" panel that asks "If this looks promising, what would you want to do next?" and lets you queue 1–2 follow-up tasks. When the result lands, those are attached as ready-to-run options. An assumption/constraint editor during early steps ("retrieving…", "parsing…") lets you refine guardrails the agent will use later: "only use post-2022 data," "exclude internal docs," "keep examples under 100 words." The agent reads these before it drafts, so your idle time directly shapes the outcome. UX Tigers

You're not just killing time; you're setting up the next moves while the current one plays out.

[Mockup: Next Branch Queue. Prompt: "If this looks promising, what next?" Queued: 1. Run same brief for EU market; 2. Generate exec summary version; + Add follow-up task…]

[Mockup: Constraints · Active ("Agent reads these before it drafts"). Chips: only post-2022 data; exclude internal docs; keep examples <100 words; + Add constraint]

Show Partial Results as "Appetizers." Instead of hiding everything until the end, longish runs can progressively reveal useful slices: early outline or bullet previews ("Here's the high-level structure I'm building"), first code diffs or a subset of modified files, initial records found ("15 of ~120 docs identified so far"). Clearly labeled as "rough" or "early pass," these let you bail early if the direction is obviously wrong, drop comments ("this section is irrelevant, skip it"), and start thinking about how you'll use the result before the full payload arrives. Users in AI-generation studies report being more accepting of waits when the system feels like it's building toward something, not just hiding a progress bar. ACM

[Mockup: Competitor analysis · 4 min in · labeled "ROUGH · early pass"]

Here's the high-level structure I'm building:

1. Pricing model comparison (done)
2. Feature matrix (in progress)
3. Go-to-market positioning (queued)
4. Key differentiators (queued)

15 of ~120 docs identified so far

Micro-Choices That Redirect Without Re-Prompting. You don't always have the energy to write another prompt mid-run. You might have the energy to tap a toggle. Offer lightweight micro-decisions that actually steer the run: style (short vs. detailed, formal vs. casual, cautious vs. bold recommendations), scope ("go deeper here," "skip generic background," "spend more time on examples"), and focus chips pulled from the initial brief ("metrics", "UX detail", "eng risk") that let you re-weight emphasis with one click. These are small, reversible inputs that keep you engaged just enough to feel like a collaborator, and reduce the chance you'll sit through 8 minutes of something obviously misaligned. Jakob Nielsen

[Mockup: "Steer this run" panel]

Style: Short | Detailed | Formal | Casual
Scope: go deeper here | skip generic background | more time on examples
Focus: metrics | UX detail | eng risk

Guided Reflection: Use the Time to Think, Not Just to Watch. Think-time UX work argues that some of the best "waiting activities" are about the problem, not the UI. During a 2–10 minute run, the product can offer reflective prompts in the periphery: "If the agent could only get one thing right, what should it be?" "What's the failure mode you're most worried about?" "What would 'good enough to ship today' look like?" These don't alter the run directly, but they prime your judgment for when the result arrives. Instead of reloading the entire problem from cold, you've been quietly sharpening your criteria. UX Tigers

[Mockup: "While the agent works" panel showing the three reflective prompts above, with the footer: "These don't alter the run. They prime your judgment for when the result arrives."]


"Since You Were Waiting…" Anticipatory Context. The system can also do its own branch work while the agent runs—pre-fetching docs or dashboards probably relevant to the answer, or pre-computing simple comparisons ("how this recommendation differs from last time"). By the time the main result appears, the supporting context is already staged. Your first 30 seconds after the wait are spent making sense of the result, not hunting for links. LinkedIn

[Mockup: "Pre-fetched while you waited" panel. Q1 2026 market share report: ready. Last competitor analysis (Nov 2025): ready. How this differs from the Nov 2025 run: computed. Footer: "Your first 30 seconds after the result will be spent making sense of it, not hunting for context."]


Clean Exits: Giving You Permission to Leave. Finally, you can design the right to walk away. After roughly 10–15 seconds, the UI can quietly say: "This will likely take a few minutes. You don't have to stay here. We'll notify you when it needs you." A one-click "Park this in my queue" action moves the run into a jobs/inbox view and returns you to your previous context. On return, instead of dumping you into a log, show a compact "Since you were away…" recap plus the key decision it's waiting on. This leans into what Nielsen and others are calling the cognitive latency stack: people will switch tasks after the first few seconds; the UX job is to make re-entry smooth and guilt-free. Nielsen Norman Group, Jakob Nielsen

[Mockup: While running · ~3 min] "This will take a few minutes. You don't have to stay here; we'll notify you when it needs you."

[Mockup: Since you were away · 6 min ago] "Analyzed 47 competitor pages. Draft ready. One decision needed before it can finish." Decision: "Include pricing data from Wayback Machine snapshots?"

The point of all of this isn't to fill every second with UI tricks. It's to stop treating your attention as free filler between agent steps. A 2–10 minute run is an opportunity: either the product lets that time collapse into anxious babysitting, or it turns it into branch work, better specs, and better judgment—without asking you to be the progress bar.


Checkpoint What Matters, Trust the Rest

Approval gates only around truly high-stakes actions. Great logs, diffs, and rollback so humans can trust the system without staring at every step. Research shows that adding online guardrails can increase end-to-end latency, so the design principle becomes: checkpoint at consequential decision points, not at every step. Redis

On mobile, where the device is conceptually "busy" during agent execution and there's nowhere else to go, you most need clear "this is now in the background, safe to ignore" states, strong completion / needs-input notifications, and very small, legible status summaries in the notification shade—not just in-app spinners. OpenForge

If you are going to support parallel agents, you can't just say "open more tabs." You need UX that acknowledges the merge and attention problem. Three concrete patterns would actually help:

Explicit fork-merge flows: when a user wants to explore multiple branches, treat it as a designed flow with visible assumptions, a dedicated merge screen with conflicts surfaced, and explicit options for resolution. Research across multi-agent systems concludes the real failure mode is at merge time, not fork time. Flowhunt

[Mockup: merge screen, "Merge: Competitive analysis", 2 conflicts]

Branch A: Enterprise focus
Branch B: SMB + mid-market
Conflict 1 · Target audience: Fortune 500, >$10M IT budget (A) vs. Growing companies, <500 employees (B)
Conflict 2 · Pricing model: Annual contracts, $50k+ (A) vs. Monthly SaaS, freemium entry (B)
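
One way to picture what a merge screen needs underneath is a plain comparison of each branch's declared assumptions. This sketch assumes agents can be made to write their assumptions down as key-value pairs, which is a big assumption given that the research describes them as implicit. The conflicting values echo the merge mockup; the `data_window` key and all names are hypothetical.

```python
def find_conflicts(branch_a: dict, branch_b: dict) -> dict:
    """Return assumptions both branches declared but with different values."""
    return {
        key: (branch_a[key], branch_b[key])
        for key in branch_a.keys() & branch_b.keys()
        if branch_a[key] != branch_b[key]
    }


# Declared assumptions per branch (hypothetical structure).
branch_a = {
    "target_audience": "Fortune 500, >$10M IT budget",
    "pricing_model": "Annual contracts, $50k+",
    "data_window": "post-2022",
}
branch_b = {
    "target_audience": "Growing companies, <500 employees",
    "pricing_model": "Monthly SaaS, freemium entry",
    "data_window": "post-2022",
}

conflicts = find_conflicts(branch_a, branch_b)
for key in sorted(conflicts):
    print(key, "->", conflicts[key])
```

The two branches agree on the data window, so only the audience and pricing assumptions surface for a human decision. The hard part in practice is getting agents to state those assumptions at all.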

Product Scorecard: Who's Doing This Well?

Here's how major AI tools handle the waiting problem today, scored across the key dimensions discussed:

Product | Expectation Setting | Background Jobs | Partial Results | Re-entry Design | Parallel Agents | Grade
ChatGPT (chat) | ⚠️ Spinner + streaming, no time estimate | ❌ Must keep tab open | ✅ Streams tokens | ⚠️ Returns to conversation, no summary card | ⚠️ Multiple tabs, no status board | C+
ChatGPT Agent Mode | ⚠️ Live browser, no time bounds | ⚠️ Weak: runs in tab, easy to lose | ✅ Shows actions live | ❌ Dumps into log history | ❌ Single-agent only | C
Claude (chat) | ⚠️ Reasoning bullets help, no duration signal | ❌ No background jobs | ✅ Streams tokens, shows reasoning | ⚠️ Returns to conversation | ⚠️ Multiple chats, no unified view | B−
Claude (Projects) | ⚠️ Shows file ops, no time frame | ❌ Must stay on page | ✅ Progressive file diffs | ⚠️ Shows final result, minimal re-entry | ⚠️ Single-thread focus | B
Cursor (Composer) | ⚠️ Shows steps, no time estimate | ⚠️ Runs in background within IDE but blocks composer | ✅ File-by-file progress | ⚠️ Returns to composer with summary | ✅ Multiple instances, shows status | B+
GitHub Copilot Workspace | ✅ Shows task plan before execution | ✅ True background: can navigate away | ✅ Progressive file updates | ✅ Summary card on completion | ❌ Single-task model | B+
Replit Agent | ⚠️ Shows plan, no time bounds | ⚠️ Keeps window focus | ✅ Shows build/deploy steps | ⚠️ Minimal re-entry | ❌ Single-agent | B−

Key observations: No one is great at expectation-setting — even tools that show step-by-step progress rarely say "this will take ~2 minutes" upfront. Background jobs are rare; most tools assume you're watching. Partial results are common but shallow — streaming tokens helps short tasks, but long multi-step workflows still collapse into "running..." states. Re-entry is universally weak: you get logs, not a designed "here's what happened, here's what needs you now" card. Parallel agent support is mostly accidental — tools let you open multiple tabs but provide no status board or merge assistance.

What would an A+ product look like? Before execution: "This will take 3–6 minutes. Steps: research, draft, format. You can leave — we'll notify you." During: option to queue follow-ups, adjust constraints, see rough outlines as they form. Background: moves to jobs list automatically after 60 seconds, sends push notification on completion. Re-entry: summary card with decisions needed, not raw logs. Parallel: unified status board with cost visibility and merge conflict detection. No current product scores above B+. The design patterns exist; no one has shipped them all together yet.


The Real Problem

Underneath all of this is one simple principle: The agent's time is cheap. Your attention is not.

Most current tools invert that. They optimize for the model and treat your attention as the infinite, free resource in the loop. The UX opportunity is to make the transition from "AI copilot" to "AI agent" feel less like being on call for a system that might need you at any moment, and more like working with a collaborator that respects your attention, communicates proactively, and lets you disengage when the work doesn't need you. YouTube

What this research collectively describes is a new structural condition in knowledge work: the human worker is now inside the agent's latency envelope. Not just waiting for a tool to load, but embedded in a workflow that has its own pace, its own re-entry points, its own cost structure, and its own demands on attention that don't align with how human cognition actually works.

The products being built today mostly don't design for this. They design for the agent's capabilities, not for the human's cognitive budget. The result is a class of experiences that is genuinely more powerful than what existed before—and genuinely more exhausting to sustain. Oscillation fatigue and AI burnout are not user failures. They are design failures. ExplainX

The tools and patterns exist. What's missing is the will to treat the human's cognitive experience as a first-class design constraint—not an afterthought to the model's capabilities.

If you're building AI tools: stop optimizing for the model. Start designing for the human stuck waiting on it.