From Chat Windows to Interaction Systems

The conversational AI interface is no longer just a chat window.


Products shipping today use a rich set of UI component patterns: suggestion chips, inline edits, reasoning displays, confidence indicators, background agents, and even AI-generated interfaces. Conversation has become an interaction system.


At the same time, conversational UX has matured into a multi-billion dollar market. But the core lesson from both research and production deployments is not "make it sound human." The products that win are goal-driven, context-aware, and transparently bounded. They reduce cognitive load. They recover gracefully from failure. They make autonomy a design decision, not a technical accident.


This guide synthesizes research findings and production deployment lessons into practical guidance: what to ship, why it works, and where it breaks.


Before diving into patterns, we need to anchor the behavioral foundations that make conversational interfaces succeed, or fail.


Foundational Design Principles


Grice's Cooperative Principle as a Design Framework

The most durable theoretical foundation for conversational UX comes from Paul Grice's Cooperative Principle (1975), which Google recommends as a basis for conversation design. The principle holds that effective communication relies on implicit cooperation between participants, governed by four maxims:

| Maxim | Principle | Conversational UX Application |
|---|---|---|
| Quantity | Give as much information as needed, no more, no less | Avoid over-explanation; stream long responses; ask one relevant question at a time |
| Quality | Be truthful; don't say what you lack evidence for | Surface confidence; provide citations; say "I don't know" when uncertain |
| Relation | Be relevant to the current topic | Maintain context; avoid resetting mid-conversation |
| Manner | Be clear, brief, and orderly | Structure outputs for scannability; avoid jargon and verbosity |

Many conversational UX failures are not really technical, but rather violations of these maxims. Overly verbose outputs violate Quantity. Hallucinated certainty violates Quality. Losing context mid-thread violates Relation. Dense, unstructured paragraphs violate Manner.

A 2025 participatory design study found that violations of the Relation maxim, breaking topic continuity, were especially damaging to trust. Users forgive latency and even minor errors. They rarely forgive losing context.


These principles quietly underpin most effective AI interface decisions. Suggestion chips reduce cognitive load (Quantity). Citations and confidence indicators reinforce Quality. Memory systems preserve Relation. Structured outputs support Manner.

Building a cooperative structure is critical for a good conversational user experience.


When Conversational UI Works, and When It Doesn't

Conversational UI does not replace GUI. Each excels in different contexts.

Graphical interfaces support exploration: browsing products, comparing dashboards, editing layouts, scanning large datasets. Conversational interfaces support completion: when users have a clear goal and want to reach it quickly.


Conversational UI works best when users have a clear goal, can express it in natural language, and want to reach it quickly.

It breaks down when tasks require visual comparison, precision editing across many variables, or scanning complex structured information.

The strongest products don't choose one modality. They layer conversation inside graphical systems strategically, which leads to the structural frames that house AI features.


The Structural Layer: Three Interface Frames

Before diving into specific components, it helps to understand the three structural frames that products use to house AI capabilities. Microsoft's Copilot UX guidance codifies these as Immersive, Assistive, and Embedded.

The strongest products don't pick one; they layer all three.

Cursor is a clear example: Tab completions are embedded, the chat sidebar is assistive, and background agents run in a dedicated, immersive view.



| Frame | What it looks like | Best for | Examples |
|---|---|---|---|
| Immersive | AI owns the full canvas: dedicated dashboards or workspaces | Deep analysis, creative generation, research | Perplexity, ChatGPT Canvas, Gemini Dynamic View |
| Assistive | Sidecar panel alongside an existing app | Ongoing support without context-switching | GitHub Copilot Chat, Intercom Fin, Notion AI Chat |
| Embedded | AI woven directly into existing UI elements | Frequent, low-friction actions users barely notice as AI | Grammarly inline suggestions, Notion Autofill, Cursor Tab, Linear Triage |

GitHub Copilot similarly blends Ask mode (assistive), Edit mode (embedded), and Agent mode (immersive). The important design insight: interface frame is a product decision, separate from model capability.

A highly capable model can still operate in an embedded pattern (L1-style assistance), while a narrower system can feel immersive if given control of the full canvas. Autonomy is expressed through interface structure.

With these frames established, we can now move from architecture to components, the specific patterns that make conversational systems usable in practice.


Component Patterns



Rather than listing patterns arbitrarily, it's more useful to group them by function.


A. Wayfinders: Getting Users Started

The blank input field is one of the hardest UX moments in conversational AI. Users don't know what to type, how detailed to be, or what the system can actually do.


Suggestion chips are the most widely adopted solution. Small, tappable prompts appear before the user types anything, reducing the cognitive load of starting from scratch while simultaneously teaching the system's capabilities.

ChatGPT's home screen shows topic-based suggestions ("Help me write," "Analyze data," "Brainstorm ideas"). Perplexity surfaces trending questions. Intercom Fin presents task-specific chips like "Track my order" or "Return an item," tailored to common support requests.


The pattern works because it satisfies the Quantity maxim: enough guidance to start, without overwhelming the user with documentation.


An example mid-conversation bot message: "I've processed your return request. Your refund will be processed within 5-7 business days. Is there anything else I can help you with?"

Quick replies also function mid-conversation as contextual next actions. After resolving a support request, options like "Was this helpful?" or "I need more help" keep the flow moving without requiring users to compose the next prompt.


Guided Conversation and Slot Filling

Where suggestion chips reduce startup friction, guided conversation structures task completion.

The guided conversation pattern leads users step-by-step toward a defined outcome by asking one relevant question at a time. Domino's ordering flow exemplifies this: users customize a pizza through structured prompts rather than freeform typing.

Each chip selection populates a structured slot in the background. The conversation feels natural, but the system is building a complete order form behind the scenes.


The closely related slot-filling pattern collects required information conversationally instead of presenting a long form. Bank of America's Erica gathers structured inputs in the background while maintaining a chat-like interface.


Users experience conversation. The system is populating structured fields.


These patterns work particularly well when users know their goal but not the exact navigation path to reach it, reinforcing the earlier distinction between completion and exploration.
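The slot-filling mechanic above can be sketched in a few lines. This is a minimal illustration with hypothetical slot names, not any product's actual implementation: the system asks one relevant question at a time (the Quantity maxim) while quietly assembling a structured order.

```python
# Minimal slot-filling sketch (hypothetical slots for a pizza order).
# The conversation surface stays natural; the system fills a form.

REQUIRED_SLOTS = {
    "size": "What size would you like?",
    "crust": "Which crust?",
    "toppings": "Any toppings?",
}

def next_prompt(order):
    """Return the question for the first unfilled slot, or None when complete."""
    for slot, question in REQUIRED_SLOTS.items():
        if slot not in order:
            return question
    return None

order = {}
assert next_prompt(order) == "What size would you like?"
order["size"] = "large"
assert next_prompt(order) == "Which crust?"
order["crust"] = "thin"
order["toppings"] = ["mushroom"]
assert next_prompt(order) is None  # all slots filled: the order form is complete
```

The design choice worth noting: the user never sees the form, only one question at a time, yet the system ends up with fully structured data.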


Templates and Prompt Builders

For more complex tasks, templates go further than chips. They provide structured, fillable formats that reduce the cognitive burden of "prompt engineering."

A common template is an interactive tone grid: rather than describing tone in prose, users click a point along two axes (Verbose to Concise, Professional to Casual), and the selection is compiled into the prompt.

Notion AI's slash commands (/summarize, /translate, /rewrite) are templates disguised as actions. Each represents a pre-structured prompt with clear output expectations. Users select intent rather than composing instructions from scratch.

This is intent mapping expressed in interface form: design around goals, not sample dialogue.


B. Prompt Actions and Inline Workflows

Once users are oriented, the next layer concerns how they act. The most momentum today is around inline actions: applying AI operations directly to selected content instead of describing changes in a separate chat box.


Grammarly pioneered this pattern. Underlines appear beneath text with suggested corrections, and users accept or reject with a click. When introducing "accept multiple suggestions," Grammarly found that mechanical corrections (grammar, spelling) were accepted at high rates, while tone and style suggestions required more scrutiny. The redesign bundled only high-confidence suggestions and introduced preview panels with per-suggestion revert and full undo. Activation improved significantly after introducing explicit preview and reversibility.

For example, in the draft "The product launch was a very good success. The team was happy about their work, and this is a important achievement. Moving forward, we should leverage these learnings," Grammarly surfaces separate grammar ("a important"), style ("very good success"), and tone suggestions, each individually reviewable.

The core principle is consistent across products: preview before commit.


ChatGPT Canvas extends this to writing and coding. Users highlight text, invoke contextual options ("suggest edits," "adjust length," "change reading level"), and review changes in a diff view before accepting. Figma Make scopes AI actions to selected areas of the design canvas. GitHub Copilot's inline commands (/explain, /fix) focus the model on selected code.

Inline actions preserve authorship. They reduce prompt friction. And they reinforce Quality and Manner by making changes visible and reversible.


Auto-Fill and Embedded Intelligence

Notion's database auto-fill is the cleanest embedded AI pattern shipping today.

When a new entry is created in a Notion database, AI properties automatically populate: summaries, key info, translations, or custom fields driven by user-defined prompts. The user doesn't open a chat window. The intelligence is embedded into the data structure itself.


| Name | Summary | Category | Priority |
|---|---|---|---|
| Redesign onboarding flow | Simplify first-run UX to reduce drop-off | UX | High |
| Fix API rate limiting | Add throttle logic to prevent 429 errors | Eng | High |

Regenerate and Variations

The regenerate button is now standard, but a more advanced pattern is variations: presenting multiple outputs simultaneously.

Midjourney's four-image grid per prompt is a canonical example. In text, Claude allows branching conversations so users can explore alternative responses without losing the original thread.

Variations reduce the cognitive burden of iterative prompting and shift the task from "describe the perfect output" to "select the best candidate."

For example, a prompt like "Write a subject line for a product launch email" might return several candidate subject lines to choose among.

Feedback and Loading: While the AI Works

AI systems take time to respond. How that time is handled dramatically affects perceived quality.


Streaming Text

Token-by-token rendering has become standard for LLM responses. Text appears progressively rather than after a spinner.

This does more than mimic typing. It gives users something to read immediately and dramatically reduces perceived wait time compared to a spinner followed by a wall of text. The Vercel AI SDK provides production-ready components for streaming markdown, tool execution displays, and reasoning blocks.
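The core mechanic can be sketched with a hypothetical word-level generator (real products stream model tokens over SSE or WebSockets, and a UI layer appends each chunk as it arrives):

```python
# Sketch of progressive token streaming (illustrative, word-level).
import time

def stream_tokens(text, delay=0.0):
    """Yield the response piece by piece so the UI can render progressively."""
    for token in text.split():
        time.sleep(delay)  # simulates per-token model latency
        yield token + " "

rendered = ""
for chunk in stream_tokens("Streaming gives users something to read immediately."):
    rendered += chunk  # a real UI would append this chunk to the DOM

assert rendered.strip() == "Streaming gives users something to read immediately."
```

The point of the pattern is that the first token arrives in milliseconds, so perceived latency is decoupled from total generation time.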


Skeleton and Shimmer States

For non-streaming content (cards, dashboards, structured outputs), skeleton screens outperform spinners at reducing perceived wait time. The skeleton matches the shape of the eventual content with grey placeholder blocks that pulse or shimmer. Facebook, LinkedIn, and Uber all use this pattern extensively. In AI products, this is increasingly used for structured outputs like comparison tables or data cards that can't be streamed token by token.


Reasoning Displays

As models incorporate chain-of-thought reasoning, products vary in how much of that reasoning they reveal.

| Product | Default visibility | Structure | Key UX mechanic |
|---|---|---|---|
| ChatGPT | Short labels visible, collapses when done | Minimal | Flashing text labels signal progress |
| Claude | Hidden by default, expandable | Bullets, separately scrollable | Animated icon + time counter |
| Grok | Scrolling snippets during, collapses after | Detailed but unstructured | Time counter + clear expand guidance |
| DeepSeek | Always visible, continuous generation | Highly detailed, no structure | Progressive scrolling |
| Gemini | Visible, user-controlled scrolling | Bullets and numbers | User controls pace |

The key insight: more transparency does not automatically equal better UX.

Claude's approach, minimal by default and expandable on demand, respects the user's primary goal (getting the answer) while making reasoning available for users who want to verify. One UX report calls this the "elevator mirror effect": well-designed progress indicators reduce perceived wait time regardless of whether users actually read them.

With these feedback patterns established, the next critical layer concerns control, ensuring that as systems become more capable, humans remain meaningfully in charge.


Governors: Keeping Humans in Control

As AI systems move from suggestion to execution, control mechanisms become the defining UX layer. The difference between a helpful assistant and a risky one is rarely capability; it's governance.


Action Plans

Before executing complex tasks, the AI presents a plan of intended steps and waits for approval.

GitHub Copilot's Agent mode shows which files it will modify and which terminal commands it intends to run before execution. Riskier commands receive additional confirmation. Cursor's "Plan mode" similarly outlines steps before acting.

This pattern becomes essential as AI moves from suggestion to execution: it transforms opaque execution into inspectable intent. The principle: the higher the stakes, the more explicit the approval gate.
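The approval gate can be sketched as a simple plan classifier. The risk heuristics below are illustrative assumptions, not Copilot's or Cursor's actual rules:

```python
# Sketch of an approval gate: riskier steps require explicit confirmation
# before execution. The "risky" prefixes are illustrative only.

RISKY_PREFIXES = ("rm ", "drop ", "git push --force")

def plan_review(steps):
    """Split a proposed plan into auto-approvable and confirm-required steps."""
    needs_confirmation = [s for s in steps if s.startswith(RISKY_PREFIXES)]
    auto = [s for s in steps if s not in needs_confirmation]
    return {"auto": auto, "confirm": needs_confirmation}

plan = ["edit src/app.py", "run tests", "rm -rf build/"]
review = plan_review(plan)
assert review["confirm"] == ["rm -rf build/"]
assert "run tests" in review["auto"]
```

In a real agent, "confirm" steps would pause execution and surface a preview to the user, while "auto" steps proceed under the current autonomy level.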


Verification and Undo

Undo is a trust requirement.

Grammarly's early "accept all suggestions" design underperformed because users feared irreversible changes. The revised version introduced a preview panel, granular revert controls, and full undo. Activation improved once users could see and reverse every modification.

Linear's AI triage follows the same pattern. Suggested assignees and labels are clearly marked, one-click reversible, and optional.

If the AI changes something, the user must be able to revert it.
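The revert requirement can be sketched with a snapshot-based undo stack. The class and method names are illustrative, not any product's API:

```python
# Sketch of reversible AI edits: every applied suggestion snapshots the
# prior state so the user can revert any change.

class Document:
    def __init__(self, text):
        self.text = text
        self._history = []

    def apply_suggestion(self, old, new):
        self._history.append(self.text)      # snapshot before the change
        self.text = self.text.replace(old, new, 1)

    def undo(self):
        if self._history:
            self.text = self._history.pop()

doc = Document("This is a important achievement.")
doc.apply_suggestion("a important", "an important")
assert doc.text == "This is an important achievement."
doc.undo()
assert doc.text == "This is a important achievement."
```

The key property is that applying a suggestion is never destructive; the preview-then-commit and one-click-revert behaviors both fall out of keeping snapshots.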


Citations and Source References

For factual outputs, citations have become a baseline expectation.

Perplexity structures responses around clickable, numbered sources. AWS's Cloudscape design system includes a dedicated citation popover component for generative chat, showing source documents and excerpts inline.

Citations transform AI from a black-box oracle into a research assistant whose work can be verified.

This directly reinforces the Quality maxim.


Memory and Personalization Architecture

Persistent memory increases usefulness, but it also raises trust risk.

ChatGPT's memory system recalls preferences and contextual details across sessions. Critically, users can view, delete, or disable stored memories, and can switch to a "Temporary Chat" mode outside memory.


| Memory tier | Scope | Example | Precedence |
|---|---|---|---|
| Global | Long-term defaults | "Usually prefers aisle seats" | Lowest |
| Session | Current interaction | "Window seat this time for the red-eye" | Higher |
| Current message | Real-time input | "Actually, make it aisle after all" | Highest |

The governing principle: memory should feel like a tool the user controls, not surveillance they endure.
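The precedence rule from the tier table can be sketched directly; the lookup order is the whole pattern (tier names mirror the table, and the dictionaries are illustrative):

```python
# Sketch of memory-tier precedence: current-message context overrides
# session memory, which overrides global memory.

def resolve(key, global_mem, session_mem, message_mem):
    """Return the value from the highest-precedence tier that defines it."""
    for tier in (message_mem, session_mem, global_mem):  # highest first
        if key in tier:
            return tier[key]
    return None

global_mem = {"seat": "aisle"}     # "usually prefers aisle seats"
session_mem = {"seat": "window"}   # "window seat this time"
assert resolve("seat", global_mem, session_mem, {}) == "window"
assert resolve("seat", global_mem, session_mem, {"seat": "aisle"}) == "aisle"
assert resolve("meal", global_mem, session_mem, {}) is None
```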

At the implementation level, common memory strategies trade context fidelity against token cost:

| Memory type | Best use case | Trade-off |
|---|---|---|
| Conversation buffer | Short interactions needing full context | High token usage |
| Summary memory | Longer conversations needing general context | May miss fine details |
| Buffer window | Retaining recent exchanges | Quick but limited scope |
| Summary buffer | Multi-session interactions | Balances detail and performance |

With governance patterns in place, we can examine the broader trust layer that sits across all components.

Trust, Transparency, and Error Recovery

Trust calibration, helping users know when to rely on AI and when to verify, remains one of the most under-designed areas in conversational UX. Research consistently shows that trust is "sticky." Early impressions anchor perception, even as system performance changes.


Trust in conversational systems rests on a handful of pillars, and interface design determines whether those pillars feel solid or fragile.


Confidence Indicators

The pattern is straightforward: show how certain the AI is about its output.

This can be a percentage, a color code (green/yellow/red), or natural language hedging ("This may suggest..." vs. "The answer is..."). The challenge is calibration: if confidence scores don't match actual accuracy, they do more harm than good.

Grammarly addresses this implicitly by categorizing suggestions: mechanical corrections (spelling, grammar) are high-confidence and often bundled; tone and style suggestions are presented individually for user evaluation.

In practice, calibration often matters more than raw accuracy. Users anchor on early experiences and are slow to update trust, even as system performance changes.
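One lightweight way to express calibrated confidence is to map scores to hedged phrasing. The thresholds below are illustrative assumptions and would need validation against measured accuracy before shipping:

```python
# Sketch: map a (calibrated) confidence score to hedged language.
# Thresholds are illustrative, not industry standards.

def hedge(answer, confidence):
    if confidence >= 0.9:
        return answer
    if confidence >= 0.6:
        return f"This likely means: {answer}"
    return f"I'm not sure, but one possibility is: {answer}"

assert hedge("The invoice is overdue.", 0.95) == "The invoice is overdue."
assert hedge("The invoice is overdue.", 0.7).startswith("This likely")
assert hedge("The invoice is overdue.", 0.3).startswith("I'm not sure")
```

This mirrors the Grammarly approach described above: high-confidence outputs are presented plainly, lower-confidence ones are framed for user evaluation.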


Error Recovery

No conversational system handles every input correctly. The difference between abandonment and recovery lies in how failure is handled.


Effective fallback responses acknowledge the failure, preserve forward momentum, and make the path to a human easy to find.
Research shows that empathy-oriented recovery (acknowledging user frustration before offering a solution) increases perceived warmth and post-error satisfaction. Notably, humor in problem-solving contexts often backfires.

Industry data suggests that a large percentage of failed conversations can be recovered through thoughtful fallback design. The key is forward momentum.
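A minimal fallback sketch, assuming a per-conversation failure counter and an escalation threshold (both are illustrative choices, not a standard):

```python
# Sketch of fallback with escalation: acknowledge, offer a next step,
# and hand off to a human after repeated failures instead of looping.

MAX_FAILURES = 2  # illustrative threshold

def fallback(failure_count):
    if failure_count >= MAX_FAILURES:
        return "Sorry for the trouble. Let me connect you to a person."
    return ("Sorry, I didn't catch that. You can try rephrasing, "
            "or pick one of the options below.")

assert "rephrasing" in fallback(1)
assert "connect you to a person" in fallback(2)  # escalation, not a dead end
```

The counter is the important part: it converts "circular conversation" into a detectable state with a designed exit.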


Disclosure and AI Signaling

Clear disclosure that AI is involved serves both ethical and functional purposes.

Google's sparkle icon research suggests users recognize it as a signal for AI features. However, the icon alone does not communicate the type of AI or its reliability. When every feature sparkles, the symbol loses meaning.

The practical rule: use AI indicators to signal involvement, but pair them with explicit action labels ("Summarize with AI," "AI-suggested edit") rather than relying on iconography alone.

Trust is reinforced not by novelty signals, but by predictable behavior. With trust mechanisms defined, we can zoom out to the system-level question of autonomy.


The Autonomy Spectrum

Up to this point, we've examined structure, components, trust mechanisms, and memory. Now we zoom out. Conversational systems don't exist in a binary state of "manual" or "fully autonomous." They operate along a gradient.


A useful framework defines five levels of autonomy:


| Level | User role | Description | Example |
|---|---|---|---|
| L1 | Operator | User drives every action; AI suggests | Grammarly inline corrections |
| L2 | Collaborator | Frequent back-and-forth; AI proposes, user refines | ChatGPT Canvas, Cursor Cmd+K |
| L3 | Consultant | AI takes initiative; user provides feedback when prompted | GitHub Copilot Agent mode |
| L4 | Approver | AI executes autonomously; user reviews and approves | Cursor Background Agents |
| L5 | Observer | AI acts independently; user monitors outcomes | Fully agentic commerce flows |

The critical insight: Autonomy is a design decision, separate from capability.


A highly capable model can be constrained to L1. A modest model can feel immersive if given execution authority. More autonomy does not automatically mean better UX; the right level depends on the stakes of the task, how reversible its effects are, and how much trust the system has earned.


Designing autonomy is about calibrating control, not maximizing automation.


Progressive Autonomy

Effective systems do not expose maximum autonomy immediately; Canva's approach illustrates progressive disclosure applied to AI.


This prevents cognitive overload and preserves trust. Progressive autonomy mirrors how users build confidence. The system earns the right to act more independently.


From Reactive to Proactive

Most early conversational systems were reactive: user asks, AI responds.

Anticipatory personalization uses AI to predict customer needs and act before they are expressed. Unlike recommendation engines that suggest items based on past behavior, anticipatory systems predict what a customer will need next and initiate action (an offer, a notification, a routing decision) without waiting for the customer to ask.


The three core pillars map directly to measurable outcomes:

| Pillar | Mechanism | Outcome |
|---|---|---|
| Autonomous decision-making | AI decides within defined risk thresholds | Lower handle time, fewer escalations |
| Hyper-contextual personalization | Combines behavior, sentiment, history in real time | Higher relevance, engagement, CSAT |
| Anticipatory service design | Predicts and prevents friction before it appears | Reduced churn, fewer contacts |

Agentic AI makes this operational at scale by monitoring signal streams continuously, inferring likely needs through predictive models, and initiating actions within organizational guardrails.




Practical Implementation

Lazarev.agency's framework for anticipatory design involves three steps:

  1. Map friction points: Identify where users hesitate, drop off, or second-guess
  2. Explore pre-action opportunities: At each friction point, ask "Can the product step in here proactively?"
  3. Design with control: Opt-in for major proactive features; provide indicators for AI-driven suggestions; allow feedback (thumbs up/down)

Success metrics include time saved, conversion lift, error reduction, and user satisfaction ratings. Qualitative feedback is equally important, asking users whether features felt helpful or intrusive reveals whether the system is anticipating correctly or being presumptuous.
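Step 3 ("design with control") can be sketched as a guard that only allows proactive intervention when the user has opted in and a known friction signal fires. The signal names are hypothetical:

```python
# Sketch of a "design with control" guard for proactive features:
# act only on known friction signals AND only with explicit opt-in.

FRICTION_SIGNALS = {"repeated_search", "cart_abandoned", "form_restarted"}

def should_intervene(user, signal):
    """Proactive action requires opt-in plus a mapped friction point."""
    return user.get("proactive_opt_in", False) and signal in FRICTION_SIGNALS

user = {"proactive_opt_in": True}
assert should_intervene(user, "cart_abandoned") is True
assert should_intervene(user, "page_view") is False          # not a friction point
assert should_intervene({"proactive_opt_in": False}, "cart_abandoned") is False
```

The guard makes "helpful vs. presumptuous" an explicit, auditable decision rather than an emergent model behavior.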


Multimodal Conversational Interfaces

Modern conversational AI increasingly blends voice, vision, and typed inputs. The design challenge is creating fluid transitions between modalities: a user may start by typing, continue with speech, and finish with visual confirmation.


Measurement Framework


Core Metrics

Conversational UI success must be measured beyond engagement, focusing on business impact, efficiency, and user outcomes:

| Metric | What it measures | Insight |
|---|---|---|
| Containment rate | Conversations fully handled by bot | Cost efficiency (but only if resolution quality is strong) |
| Completion rate | Users finishing a started task | Direct business outcome |
| Drop-off rate | Where users abandon | Identifies friction points |
| Conversation success rate | Intent actually resolved | Strongest indicator of real value |
| CSAT | Post-interaction satisfaction | High containment + low CSAT = red flag |
| Repeat inquiry rate | Same issue raised again | Measures true resolution |
| Cost per interaction | Bot vs. human cost | ROI calculation |

A critical insight: high containment rate with low CSAT indicates automation is resolving tickets but not satisfying users, a common failure mode.
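That red flag can be encoded as a simple health check. The thresholds below are illustrative, not industry standards:

```python
# Sketch of the "high containment + low CSAT" red flag.
# Thresholds (80% containment, CSAT 3.5 on a 1-5 scale) are illustrative.

def containment_health(contained, total, csat):
    rate = contained / total
    if rate > 0.8 and csat < 3.5:
        return "red flag: bot is closing conversations, not resolving them"
    if rate > 0.8:
        return "healthy automation"
    return "low containment"

assert containment_health(90, 100, 3.0).startswith("red flag")
assert containment_health(90, 100, 4.4) == "healthy automation"
assert containment_health(40, 100, 4.4) == "low containment"
```

Pairing the two metrics in one check is the point; either number alone can look good while the product fails users.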


Conversation Length

A CHI 2024 study using GPT-4-powered Slack chatbots tested conversation turns of 0, 3, 5, and 7. Quality did not differ drastically across conditions, but participants had mixed reactions. The finding suggests that adaptive conversation length, calibrated to user need and question type, is more effective than a fixed strategy.


Agentic Patterns Shipping Now

The shift from copilots (AI suggests, human acts) to agents (AI acts, human oversees) is already happening in production systems.


Background Agents

Cursor's background agents spin up sandboxed environments to complete tasks while the user continues working. Multiple agents can operate in parallel on different problems. The UX challenge is not execution; it is visibility.


Cursor addresses this by keeping agent activity visible and reviewable. The governing model remains L4: autonomous execution with human approval and review.


Agentic Commerce

AI personalization boosts e-commerce revenue by up to 40%, with product recommendation conversion rates increasing 15-20%. Meanwhile, 91% of consumers are more likely to shop with brands providing personalized recommendations. Nona Coffee automated 80% of support tickets using conversational AI, achieving a 12X ROI.

Shopify's Universal Commerce Protocol enables AI agents to discover products, build carts, render checkout, and place orders inside AI interfaces. Checkout occurs within conversational environments, no redirect required.

This compresses discovery, evaluation, and transaction into a single interaction surface.


Conversational Shopping

Amazon's Rufus goes beyond Q&A into conversational shopping actions.

Here, memory and anticipation intersect with commerce. The system recalls purchase history and contextual details to reduce multi-step workflows to a single conversational turn.


Resolution-Based Support

Intercom's Fin operates through multi-step "Procedures" defined in natural language. It handles complex support scenarios and escalates to humans when necessary, transferring full context.

The measurable gains (response speed, containment, cost reduction) are significant. But as history shows, containment alone is not the success metric.


Notable Platform Teardowns

Analysis of 33 chatbot UIs across industries reveals consistent patterns in successful designs:

| Product | Category | Key design insight |
|---|---|---|
| ChatGPT | Productivity | Monochrome minimalism; suggested prompts reduce blank-page anxiety |
| Claude | Productivity | Humility ("can make mistakes") as trust mechanism |
| Notion AI | Productivity | Embedded in workflow; eliminates context-switching |
| Bank of America Erica | Finance | Calm predictability; contextual mini-charts within chat |
| Cleo | Finance | Humor transforms spending guilt into motivation |
| Replika | Companion | Memory recall ("You mentioned feeling better") drives attachment |
| Drift | B2B sales | Dynamic branching; progress indicators; in-chat scheduling |
| Pi (Inflection) | Companion | Brevity as empathy; warm gradients; typing animation mimics hesitation |
| Slackbot | Productivity | Invisible assistance; highest usability = forgettability |

The most successful chatbot UIs share a common trait: they do not try to sound human. Instead, they anticipate rather than wait, guide rather than interrupt, and align tone with context so the conversation flows naturally.


Patent Landscape

Recent patent filings reveal the technological direction of conversational UX:

| Patent | Assignee | Innovation |
|---|---|---|
| US11687802B2 | Walmart | Proactive user intent prediction in personal agents using contextual data and predictive models |
| US12248518B2 | PayPal | Free-form, automatically generated conversational GUIs that adapt dialogue flows dynamically based on user intent |
| US11243991B2 | IBM | Contextual help recommendations based on conversational context and interaction patterns, generating suggestions when confidence falls below a threshold |
| US11580968B1 | Amazon | Contextual NLU for multi-turn dialog using attention layers and memory encoders for intent classification |
| US12229511B2 | IBM | Auto-generated question suggestions using intent-entity prediction models trained on conversation history |
| US20260017305A1 | - | Context-preserving pinning with AI-driven retrieval in conversational interfaces |

These patents converge on three themes: proactive intent prediction (acting before the user asks), contextual memory across turns (maintaining coherent dialogue), and dynamic interface generation (adapting the UI to the conversation state).


The Klarna Cautionary Tale

No guide to conversational UX is complete without the warning. Klarna went all-in on AI customer support, replacing human agents and reporting massive cost savings. Then in mid-2025, the company reversed course and began rehiring humans.

The CEO's explanation: "AI provides speed, human talent offers empathy. Together: prompt when necessary, compassionate when required".

The deeper UX problem was that circular conversations, where the AI couldn't resolve the issue and the user eventually gave up, were being counted as "resolved". High containment rates masked low actual resolution quality.

The lesson for conversational UX design: measure outcomes, not containment. Build escalation paths that are easy to find, not buried. And treat "I don't know, let me connect you to a person" as a feature, not a failure.


Conclusion

Good conversational UX is about clarity and control.

Products win when they guide users, stay relevant, admit uncertainty, and make changes reversible. As AI becomes more autonomous, governance matters more, show plans before acting, allow approval, and make escalation easy.

Conversation works best for completing goals, not replacing every interface. The strongest products blend chat with graphical systems instead of forcing one or the other.

The future is more proactive and agentic. But autonomy only works when users remain in charge, and when outcomes, not automation rates, define success.