UI got better. UX got worse.

The Polish Problem

In August 2025, Adam Wathan posted an apology that got a million views. "I'd like to formally apologize for making every button in Tailwind UI bg-indigo-500 five years ago, leading to every AI-generated UI on earth also being indigo."


He'd chosen indigo as a default demo color, a neutral placeholder that worked in examples. Tailwind became the dominant CSS framework, indigo became the default across thousands of tutorials and templates, those templates were scraped into LLM training data, and the models learned a statistical truth: purple buttons are what web buttons look like.

The result is a now-recognizable pattern users call "AI slop": purple gradients, Inter font, rounded cards with subtle shadows, three-column feature grid with icons, hero section with gradient text. Open three AI-generated landing pages at random and at least two will share this visual DNA. Users are developing pattern recognition for it, the same way they learned to recognize a 2015 WordPress theme.
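The pattern is concrete enough to write down. A minimal sketch of the stereotypical layout, assembled from the defaults described above; every Tailwind class name and line of copy here is invented for illustration, not taken from any specific generator's output:

```typescript
// The stereotypical "AI slop" hero as a markup string: indigo/purple
// gradient, Inter font, gradient headline text, three-column feature grid.
// All class names and copy are invented illustrations of the pattern.
const slopHero: string = `
<section class="bg-gradient-to-r from-indigo-500 to-purple-600 font-[Inter]">
  <h1 class="bg-clip-text text-transparent bg-gradient-to-r from-white to-indigo-200">
    Supercharge Your Workflow
  </h1>
  <button class="bg-indigo-500 rounded-lg shadow-sm">Get Started</button>
  <div class="grid grid-cols-3 gap-6">
    <!-- three feature cards with icons, rounded corners, subtle shadows -->
  </div>
</section>`;

// Recognizing the aesthetic takes a single string match, which is the point.
const looksLikeSlop: boolean =
  slopHero.includes("indigo") && slopHero.includes("gradient");
```

The recognizability is the tell: a designed page would not be identifiable from two substring checks.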



This is structural evidence of how AI-generated design works: statistical pattern replication, not intentional decisions. The training data was biased toward a default nobody chose, and now the internet is saturated with its output. AI doesn't have taste. It has probability distributions.


The indigo button is easy to laugh at. What it points to is harder to dismiss: a period where visual polish became decoupled from UX quality at scale — and teams, organizations, and entire product cultures haven't caught up to what that actually costs.


What AI can actually do now

By mid-2025, AI design tools had moved from experimental to production-grade for certain tasks. Vercel's v0, Figma Make, Cursor, Lovable, and similar tools can generate complete interfaces with interactive, code-ready screens from prompts. Google's Gemini introduced "Generative UI," which produces real interfaces rendered on demand. In Google's own research, human-designed interfaces outperformed AI-generated ones only 56% of the time (Standard Beagle).


That margin caught the industry off guard. For teams under shipping pressure, a tool that produces visually credible results more than half the time against expert human designers sounds like a viable path to skipping the design process entirely.


It is not. The 56% figure measures visual and surface-level credibility, not usability, accessibility, or retention. And structured evaluations tell a different story: AI-generated interfaces matched human expert-designed work only 44% of the time when assessed on actual UX quality (DesignWhine). Impressive for something generated in seconds, but the 56% miss rate is where users live.


UXMatters' April 2026 analysis makes the framing precise: "AI Tools Are Replacing Judgment, Not Labor." Friction, the uncomfortable wrestling with "what should this do?", is where design judgment forms. AI removes that friction. When it does, designers shift from authors to curators, reacting to outputs instead of deciding from first principles. Over time, that erodes the judgment that distinguishes good UX from design that is merely acceptable (UXMatters).


The accessibility collapse

The clearest evidence that AI-generated interfaces are failing in ways screenshots don't reveal is accessibility data.


Microsoft's evaluations of AI design tools found that under typical prompting conditions, the way most teams actually use these tools, the best-performing model passed WCAG accessibility standards only about one-third of the time. Most models failed far more often. The failures are not edge cases: inaccessible navigation, broken keyboard support, incorrect ARIA roles, missing form labels. These are barriers that users with disabilities encounter immediately, and that they silently abandon products over (Standard Beagle).
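Two of those failure classes can be shown in a few lines. A hedged sketch, with both markup strings and the naive checker invented for illustration (a real audit would use a tool like axe, not a regex):

```typescript
// Generated-style markup: a clickable <div> (not keyboard-focusable, no
// button role) and an input whose only "label" is placeholder text.
const generated = `
  <div class="btn" onclick="submit()">Submit</div>
  <input type="email" placeholder="Email">`;

// Corrected: a native <button> (focusable, correct role for free) and a
// real <label> associated with the input by id.
const corrected = `
  <button type="submit">Submit</button>
  <label for="email">Email</label>
  <input id="email" type="email" autocomplete="email">`;

// Naive static check, illustration only: flag <input> tags with no id,
// since a <label for="..."> would have nothing to point at.
function unlabeledInputs(html: string): number {
  const inputs = html.match(/<input\b[^>]*>/g) ?? [];
  return inputs.filter((tag) => !/\bid=/.test(tag)).length;
}
```

The corrected version is not more code to generate; it is a different set of defaults, which is exactly what the training data did not reward.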


The macro picture is starker. WebAIM's 2026 analysis of the top one million home pages found that 95.9% contained detectable WCAG 2 failures, an increase from 94.8% in 2025, reversing a six-year trend of gradual improvement. Average detected errors per page rose 10.1% to 56.1. The most common failures: low-contrast text (83.9% of pages), missing alt text (53.1%), missing form input labels (51%) (WebAIM).
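Low-contrast text, the most common failure on that list, is also the most mechanically checkable: WCAG 2 defines contrast as a ratio of relative luminances, with 4.5:1 as the AA minimum for normal body text. A sketch of the spec's formula; the example colors are invented, not WebAIM's data:

```typescript
// Relative luminance of a #rrggbb color, per the WCAG 2 definition:
// linearize each sRGB channel, then weight by the standard coefficients.
function luminance(hex: string): number {
  const [r, g, b] = [0, 2, 4]
    .map((i) => parseInt(hex.slice(i + 1, i + 3), 16) / 255)
    .map((c) => (c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging 1:1 to 21:1.
function contrastRatio(fg: string, bg: string): number {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// Light gray on white, a common "polished" default: about 2.8:1,
// well below the 4.5:1 AA threshold for normal text.
const grayOnWhite = contrastRatio("#999999", "#ffffff");
```

The check is twenty lines of arithmetic, which is what makes an 83.9% failure rate so damning: this is the easiest class of failure to catch automatically.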

Accessibility had been getting marginally better every year. Then AI-generated interfaces went mainstream — and the trend inverted.

The reversal of the six-year improvement trend is the signal worth pausing on. The screens look better. The experience is harder to use.


Who is actually designing now

AI tools have made it possible for a PM to generate a layout, a developer to ship it, and a product to reach users without a designer involved at any stage. This is not hypothetical. Figma's 2025 AI report found that developers were nearly twice as likely to use AI for core design work as trained designers were. The people with formal training in usability and accessibility were the most skeptical. The people without that training were the most enthusiastic adopters (Standard Beagle).


UserTesting's 2026 State of Design panel explicitly flagged "false confidence" as the number-one team-level risk of AI-assisted design. The mechanism is specific: AI tools present outputs with visual authority; they look like finished products. Non-designers, who cannot distinguish a polished layout from a well-considered interaction design, treat that visual credibility as evidence of UX quality. It is not (UserTesting).


Vibe-coding tools, systems that let users generate and deploy interfaces from conversational prompts, accelerate this further: by one practitioner estimate, only about 20% of generation effort goes to design quality, with the remaining 80% spent on functional capability. The result is interfaces that work in demos and break under real-world use (Reddit).

The Slop Trap Mechanism

Standard Beagle, a UX consultancy tracking AI-generated interface quality, has named the pattern "The Slop Trap":

  1. AI makes interface creation fast and cheap
  2. Teams under pressure ship what looks "good enough"
  3. UX review gets skipped
  4. Users encounter friction — especially users with accessibility needs, older devices, or imperfect conditions
  5. They leave
  6. The product underperforms
  7. The failure gets blamed on the market instead of the experience

The trap is effective because AI-generated UX looks professional. It borrows the visual language of mature products. The cracks don't show up until real users arrive — and by then, the organizational assumption is that the design is fine (Standard Beagle).


A further aggravation: most AI design tools are built by teams with little UX representation, judged primarily on speed and capability, not on whether their outputs support real human behavior. The tools themselves are not designed to surface usability failures.

Why the Trained Designer Is the Most Skeptical Person in the Room

The Figma data point deserves its own analysis: developers were nearly twice as likely to use AI for core design work compared to trained designers. This is not a coincidence. Designers who have spent years learning to distinguish visual quality from UX quality are the ones most aware of how much the tools miss.


The design community's resistance to uncritical AI adoption is not technophobia. It is professional judgment recognizing a category error — confusing the output layer (pixels) for the outcome layer (user behavior).


Pretty UI is not good UX

A useful way to frame the gap: UI is a layer, and UX is a system. AI is excellent at generating the visual layer. It consistently skips the layers that determine whether a product works.


The layers AI generates reliably: visual hierarchy and layout patterns, component styling and color systems, typography and spacing, icon libraries and illustration styles.


The layers AI consistently fails to address: user flows and navigation logic across screens (AI generates screens in isolation), edge cases, empty states, error recovery, loading behavior, mobile responsiveness unless explicitly prompted.
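That missing layer is easy to make concrete. A minimal sketch (all names invented) of the state model a considered interface carries and a generated happy-path screen typically omits:

```typescript
// An explicit UI state model: a generated mockup usually shows only the
// "ready" case; the other three are where real-world use happens.
type ViewState<T> =
  | { kind: "loading" }
  | { kind: "empty" }
  | { kind: "error"; message: string }
  | { kind: "ready"; items: T[] };

// Exhaustive rendering: the switch covers every state, so adding a fifth
// state later becomes a compile error here rather than a blank screen.
function render(state: ViewState<string>): string {
  switch (state.kind) {
    case "loading": return "Loading...";
    case "empty":   return "Nothing here yet. Create your first item.";
    case "error":   return `Something went wrong: ${state.message}. Retry?`;
    case "ready":   return state.items.join(", ");
  }
}
```

The point is not the pattern's novelty; it is that deciding what the empty and error states should say is a design decision no screenshot-level evaluation will ever score.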

The Average User Problem

AI-generated interfaces are designed for nobody specific, which means they work for the statistical average and fail sharply at the edges. A UX researcher on Reddit's UserExperienceDesign community articulated the issue precisely:

"AI-generated interfaces assume everyone uses products the same way. They target an imagined 'average user' rather than reflecting real user diversity. They can produce a spotless interface that works well for a power user who already knows what to do, yet it becomes bewildering for a newcomer." (Reddit)


AI replicates patterns from training data without understanding why those patterns exist. A navigation pattern that works for a B2B SaaS dashboard gets applied to a consumer wellness app because the visual pattern is similar. The context (user mental models, task frequency, error tolerance) is invisible to the generative system.


This is the reason Microsoft found AI passing accessibility standards only a third of the time under typical conditions. Accessibility is not a visual decision — it is a behavioral and structural decision. AI cannot make it from visual patterns alone.

The Homogeneity of Optimizing for the Average

Jeff Gothelf, a veteran product leader, observed that the current wave of AI feature launches looks structurally identical to every previous wave of technology adoption: companies "implementing AI" as a competitive badge — rushed, not planned with the customer in mind, designed to put an "AI badge on marketing copy." The result: "AI buttons everywhere with no rationale or meaningful workflow for their usage. They don't make the user more successful. They make the user experience worse." (Jeff Gothelf)


Products become, in the words of UXMatters, "smoother and less human at the same time" — optimized for averages, frictionless in ways that remove the texture that makes experiences feel considered.


The data that exposes the gap

The most commercially credible evidence that AI-polished products underperform is the 2026 RevenueCat State of Subscription Apps report — built on data from over 115,000 applications, $16 billion in revenue, and upward of a billion transactions (TechCrunch).

Metric                          AI apps      Non-AI apps
Annual retention (12 months)    21.1%        30.7%
Monthly retention               6.1%         9.5%
Refund rate (median)            4.2%         3.5%
Annual churn rate               30% faster than non-AI apps
Revenue per payer               41% higher than non-AI apps

The 41% revenue-per-payer figure is the tell. AI apps convert better; the hype drives initial purchases. Then users discover the experience doesn't deliver sustained value, and they leave, 30% faster than they leave non-AI apps (RevenueCat).


TechCrunch's reporting on the study identified the core dynamic: "AI hype can drive initial sales. It's not yet creating the lasting value needed for long-term retention." (TechCrunch)

The Enterprise Picture Is Worse

RevenueCat's data covers consumer apps. The enterprise picture is starker. MIT's 2025 study, based on 150 executive interviews, a survey of 350 staff, and an examination of 300 public AI deployments, found that 95% of enterprise AI pilots delivered no measurable P&L impact. Only 5% of integrated systems created significant value (Fortune).


The core issue, per MIT's researchers, was not model quality. It was "flawed enterprise integration": generic tools shipped into workflows without the user research and UX work needed to make AI features actually change behavior (Fortune).


Amplitude, building AI products at scale for 4,500 enterprise customers, described the gap between AI demos and enterprise readiness as the defining product challenge of 2025: customers "want software they can use collaboratively and reliably," and the teams getting adoption right are "the ones building robust evals, meeting users where they already work" — not the ones chasing the fastest feature launches (Amplitude).

The Expert Designer AI Paradox

One of the most counterintuitive findings of 2025 was a peer-reviewed study by Hou et al. (2025) in Information Systems Research, testing designers at varying experience levels with and without generative AI tools. The result inverted expectations: AI assistance improved the output of less experienced designers but degraded the performance of the most experienced ones.

The reason: senior designers have established working rhythms, and AI outputs clashed with those rhythms rather than complementing them. The tool helped designers who hadn't yet developed a process; it slowed designers who had.


A parallel finding comes from METR's randomized controlled trial (February–June 2025; 16 experienced developers, 246 real tasks): developers using AI tools took 19% longer to complete tasks. More revealing: they believed AI had sped them up by 20% while the data showed the opposite (METR). The perception gap, feeling faster while being slower, is exactly the mechanism behind false confidence in AI design tools.

The Launch-to-Churn Pipeline

A 2026 analysis of 50 Product Hunt launches found that the AI and design-tool categories had the highest average upvotes, but design tools had the best Top 5 conversion rate. The implication: the market rewards designed products, not just AI products (Uprowshub).


Meanwhile, more than 90% of daily top-five products on Product Hunt in 2026 include "AI" or "agents" in their taglines. The signal is degrading. Users have learned that the AI badge is not a quality indicator; it is a category marker. Products that can't hold attention beyond the initial experience are not retaining the users their launches attract.


The quiet failure dynamic Standard Beagle named is consistent across these datasets: "Users just don't come back."


What "designing" actually means now

The irreplaceable elements of product design are not abstract.


Framing the problem before solving it. AI can generate solutions from a brief, but it cannot decide whether the problem is worth solving, whether the feature should exist, or whether the brief is wrong. That judgment happens before any screen is drawn, and it is entirely absent from AI-generated workflows.


Designing for the deviation, not the average. AI targets statistical patterns from training data. Real products are used by domain experts, anxious first-time users, people under cognitive load, people on unreliable connections. The experience for these users, the ones who test the edges, is where trust is won or lost.


Knowing what to subtract. AI generates toward completeness; it tends to add rather than remove. The judgment that a feature is adding noise rather than value, and the willingness to pull it, requires understanding the whole product's coherence, not just the screen being generated.


Interpreting ambiguity under real conditions. AI tools present outputs as if uncertainty is already resolved: heatmaps without context, synthesized personas without lived experience, sentiment analyses without nuance. A designer who stops asking "does this make sense?" and starts asking "how do we ship faster?" is not being augmented. They are being substituted by their own workflow.

The Expanded Role of Design

Autodesk's 2025 AI Jobs Report, analyzing over 3 million job listings, found that design has overtaken technical expertise as the most in-demand skill in AI-related job postings, ahead of programming, cloud infrastructure, and data science combined. The demand for design is not falling because AI can generate interfaces. It is rising because AI-generated interfaces require someone with judgment to make them work.


Boldare's 2026 analysis of product design under AI pressure captured the structural shift: "When interface generation becomes fast and cheap, the scarce resource is no longer the ability to produce screens. It's the judgment to ask the right questions before producing anything." (Boldare)


NNG's 2026 State of UX report frames the consequence of the current trajectory: teams spending AI's productivity gains on shipping more features faster are accumulating a user trust deficit that will surface in churn data eventually. The design discipline has always been the function in tech most insistent on asking "but should we?" rather than "can we?" That instinct is not less valuable in an AI-augmented workflow. It is the core competency the tooling cannot replicate.

The gap between a mid-level and senior designer is rarely tools. It's accumulated judgment from working through hard problems without a shortcut. AI makes the shortcut available earlier than it should be.


What's happening in hiring

When a production layer becomes cheap and abundant, the work upstream from it becomes scarcer. Deciding what to build before generating screens. Knowing which feature adds noise rather than value. Designing for the user who deviates from the average: the domain expert, the first-time user, the person under time pressure on a slow connection. These are the conditions where products are actually used, and the conditions AI-generated interfaces handle worst.


NNG's 2026 State of UX: "Lazy AI features and AI slop are now ubiquitous, and the shine is fading fast." (DesignWhine)

Users are calibrated differently now. They've been through enough polished-but-shallow experiences to apply skepticism early. The 41% revenue-per-payer advantage AI products currently hold will compress as the aesthetic becomes associated with churn rather than quality.


The teams building durable products are using AI's speed to get to earlier questions, not to skip them. What problem is this solving and for whom? What does this look like at the edges of the happy path? What should be removed?


These questions predate AI tools. They're harder to automate than layout generation. Based on the hiring data, the retention data, and the emerging premium on human judgment in design — they're becoming the differentiator.


The purple button was always the wrong optimization target. It's just more visible now that everyone has one.