How Spotify Killed Vanity Metrics (And Built a $25B Business)

The discovery metric that predicted retention better than 50B monthly plays

Hey Warblers,

In 2015, Spotify had a problem that looked like success.

75 million users. 50 billion plays per month. Every chart trending up and to the right.

But when Apple Music launched that June, Spotify faced a surprising reality: heavy usage didn't predict loyalty. Users playing thousands of songs monthly were just as likely to switch as casual listeners.

The investigation revealed an uncomfortable truth: play count alone was a weak predictor of retention.

This discovery led to Discover Weekly and a complete philosophical overhaul of how Spotify—and now every smart product team—thinks about metrics.

The Paradigm Shift: From Consumption to Discovery

Spotify's revelation was profound yet simple: measuring consumption isn't the same as measuring value.

Think about it. A user playing the same 100 songs on repeat generates massive "engagement" numbers. They count toward your MAU and inflate your session length and play count. By every traditional metric, they're a power user.

But they're not getting any new value from your product. They're using Spotify like a CD player with a really good shuffle function. When a competitor offers the same songs for cheaper (or free), why would they stay?

Spotify's own research on listening diversity (published by their R&D team) found that users with more varied listening patterns had significantly higher retention rates when controlling for overall activity. It wasn't how much you listened—it was how you listened.

By "discovery," Spotify meant users saving songs from artists they'd never played before, exploring new genres, and breaking out of listening loops. This wasn't just about new content—it was about expanding musical horizons.

The Three Paradigms of Product Metrics

While Spotify has publicly discussed their shift toward discovery metrics and listening diversity, the three-paradigm framework below synthesizes common product practices inspired by their approach:

Paradigm 1: Activity Metrics (What Most Dashboards Measure) These measure what users do, not why they do it:

  • Plays, clicks, pages viewed

  • Time spent, sessions, DAU/MAU

  • Features used, buttons pressed

The fundamental issue: Activity can increase while value decreases. A confused user clicking everywhere generates great "engagement" metrics while having a terrible experience.

Paradigm 2: Success Metrics (What Users Actually Want) These measure whether users achieve their goals:

  • Problems solved, jobs completed

  • Aha moments reached, value realized

  • Success states achieved

The insight: Users don't want to use your product. They want to achieve something through your product.

Paradigm 3: Relationship Metrics (What Creates Long-term Value) These measure the depth of user connection:

  • Trust built, habits formed

  • Identity reinforcement, community connection

  • Switching costs (emotional, not just functional)

The reality: Retention isn't about usage. It's about becoming part of someone's life.

The Framework: From Vanity to Value

Instead of vanity metrics, leading product teams now use a three-layer approach, inspired by companies like Spotify that have made this shift:

Layer 1: Input Metrics (Seeds You Plant Today)

These are leading indicators—actions you take that might create value. The key insight: inputs must be directly controllable by your team within one sprint.

Think of inputs like planting seeds. You can control:

  • How many seeds you plant

  • Where you plant them

  • How you water them

You cannot control:

  • Whether they grow

  • How fast they grow

  • What fruit they bear

For discovery-focused products, this might mean measuring how many opportunities for discovery you create—though the specific metrics vary by company and context.
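As one illustration, here's a minimal sketch of such an input metric, assuming a hypothetical recommendation feed where an "opportunity" is an impression of an artist the user has never played:

```python
# Minimal sketch of an input metric: discovery opportunities created.
# The definition of "opportunity" is hypothetical; yours will differ.
def discovery_opportunities(impressions, play_history):
    """Count recommended artists each user has never played.

    impressions:  user_id -> list of recommended artist_ids
    play_history: user_id -> set of artist_ids ever played
    """
    return {
        user: sum(1 for artist in artists
                  if artist not in play_history.get(user, set()))
        for user, artists in impressions.items()
    }

impressions = {"u1": ["a", "b", "c"], "u2": ["a"]}
play_history = {"u1": {"a"}, "u2": {"a"}}
print(discovery_opportunities(impressions, play_history))  # {'u1': 2, 'u2': 0}
```

Note this counts only what your team controls: opportunities surfaced, not whether users take them. Uptake belongs in the output layer below.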

Layer 2: Output Metrics (Fruits You Harvest Later)

These lag behind inputs by 2-4 weeks. They measure whether your seeds grew into something valuable. The critical requirement: clear causal logic from input to output.

The philosophical shift: Stop measuring everything that happens. Start measuring only what matters for your business model.

Spotify found strong correlations: users who regularly discovered new artists showed higher retention rates and were more likely to upgrade to premium, though many factors influence these outcomes.
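If you want to run the same check on your own data, here's a minimal sketch of the correlation step; every number below is invented for illustration:

```python
# Minimal sketch: does a discovery metric correlate with retention?
# All figures are made up. statistics.correlation needs Python 3.10+.
import statistics

# Hypothetical per-user data: new artists saved in month 1, and
# whether the user was still active in month 3 (1 = retained).
new_artists_saved = [0, 1, 2, 4, 5, 7, 8, 10]
retained          = [0, 0, 1, 0, 1, 1, 1, 1]

r = statistics.correlation(new_artists_saved, retained)  # Pearson's r
print(f"discovery vs. retention: r = {r:.2f}")
# A positive r is a hypothesis worth testing, not proof that discovery
# causes retention (see the causation note later in this issue).
```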

Layer 3: Health Metrics (Soil Quality Over Time)

These prevent you from strip-mining your ecosystem for short-term gains. They answer: Are we creating sustainable value or extracting until exhaustion?

The wisdom: Any metric optimized in isolation will eventually destroy the system it measures.

A product could force discovery by constantly interrupting users with new content. Health metrics prevent this by measuring things like user satisfaction and ecosystem diversity.
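One common way to quantify ecosystem diversity is Shannon entropy over a user's listening distribution. A minimal sketch, with invented genre counts:

```python
# Minimal sketch of a health metric: listening diversity as Shannon
# entropy over a user's genre distribution. Genre counts are made up.
import math

def listening_entropy(genre_counts):
    """Entropy in bits: 0.0 means one genre; higher means more diverse."""
    total = sum(genre_counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in genre_counts.values() if n > 0)

loop_listener    = {"pop": 200}
diverse_listener = {"pop": 60, "jazz": 50, "ambient": 45, "hip-hop": 45}
print(f"{listening_entropy(loop_listener):.2f}")     # 0.00
print(f"{listening_entropy(diverse_listener):.2f}")  # 1.99
```

A falling average entropy across your user base is an early warning that your optimizations have started narrowing, not widening, listening habits.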

The Deep Theory: Why Vanity Metrics Persist

Understanding why we cling to vanity metrics helps us let them go:

The Comfort of Activity Activity metrics almost always go up. More users mean more clicks. More clicks mean more page views. It's psychologically satisfying to watch numbers increase, even if they mean nothing.

The Simplicity of Counting It's easier to count actions than measure value. Page views are simple. "Did this user achieve their goal?" requires deep understanding of user psychology.

The Politics of Performance Vanity metrics make everyone look good. Your team is "driving engagement." Your manager shows hockey stick charts. The board sees growth. No one asks hard questions until users start leaving.

The Discovery Moment: Your Product's Core Value

Every product has its version of "discovering new music." This is the moment users realize why your product exists in their life.

The framework for finding yours:

Step 1: The Success Interview Don't ask users what features they like. Ask: "Tell me about a time our product made you feel successful." Listen for emotional words, not functional descriptions.

Step 2: The Switching Story Ask users who switched from competitors: "What made you choose us?" Ignore feature comparisons. Focus on what job they hired you to do that others couldn't.

Step 3: The Recommendation Reason Ask power users: "When you recommend us to friends, what story do you tell?" This reveals the core value in their words, not yours.

The pattern you're looking for: Users don't value your product. They value what your product helps them become.

A Note on Causation vs Correlation

It's important to note that much of what we know about metrics and retention comes from correlation studies. While companies like Spotify have found strong associations between discovery behaviors and retention, proving direct causation is complex. What matters for your product is testing these hypotheses and finding what correlations hold true for your specific users.

Your 14-Day Sprint

This playbook synthesizes best practices from product teams who've successfully moved beyond vanity metrics:

Days 1-3: Audit

  • List all current metrics Export your entire dashboard into a spreadsheet with columns for: metric name, current value, update frequency, owner (if any), and business justification. Most teams discover they're tracking 40-70 metrics but can only explain why 10-15 actually matter. Include everything from your analytics tools, weekly reports, OKRs, and that random spreadsheet someone updates manually.

  • Run zombie test For each metric, ask three questions: (1) If we optimized only for this metric, would customers succeed? (2) If this metric doubled, would our business be healthier? (3) Can this metric improve while user value decreases? Score one point for each "No" on questions 1 and 2, and one point for a "Yes" on question 3. Any metric scoring 2 or 3 is likely a vanity metric (see the scoring sketch after this list). Note: Some activity metrics matter in context—the key is understanding which ones truly predict value.

  • Find your "discovery" equivalent Interview 10 power users and ask: "Tell me about the last time our product made you feel successful." Look for patterns. For a music service, it might be discovering new artists. For a productivity tool, it might be "finally getting my workflow automated." Write it as: "Users discover value when they [specific action] that leads to [specific outcome]."
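Here's the zombie test as a minimal scoring sketch. The example metrics and answers are hypothetical; plug in your own audit spreadsheet:

```python
# Minimal sketch of the zombie test. One point per warning sign;
# a score of 2 or 3 suggests a vanity metric.
def zombie_score(customers_succeed, business_healthier, can_diverge):
    return (
        (not customers_succeed)     # optimizing it alone wouldn't help customers
        + (not business_healthier)  # doubling it wouldn't improve the business
        + can_diverge               # it can rise while user value falls
    )

metrics = {
    # metric: (customers succeed?, business healthier?, can diverge from value?)
    "page_views":       (False, False, True),
    "weekly_retention": (True,  True,  False),
}
for name, answers in metrics.items():
    score = zombie_score(*answers)
    verdict = "likely vanity" if score >= 2 else "worth keeping"
    print(f"{name}: score {score} -> {verdict}")
```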

Days 4-7: Design

  • Define 3 input metrics Choose metrics your team can influence this sprint through shipping code, running experiments, or changing designs. Each must be: (1) Changeable in <2 weeks, (2) Clearly owned by one person, (3) Leading indicator of your discovery moment. Example: If your discovery moment is "users collaborating on documents," your input metrics might be: share button prominence, time to first share, and successful permission grants. Write each as "We can increase [metric] from [current] to [target] by [specific action]."

  • Map to output metrics Draw the hypothesized causal chain from each input to business value. Look for correlations in historical data. For example: "Share button visibility (input) → Documents shared (behavior) → Multi-user documents (activation) → Team retention (output) → Revenue (outcome)." Note which links are proven vs hypothesized. Be honest about uncertainty.

  • Add health guardrails Define 2-3 metrics that prevent gaming your own system. These answer: Are we creating real value or just juicing numbers? Common categories: (1) Quality scores - is the core action meaningful? (2) Sustainability metrics - are users burning out? (3) Diversity indices - are we creating filter bubbles? Set yellow flags (investigate) and red flags (stop everything) for each.
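A minimal sketch of what those guardrails might look like once encoded; every metric name and threshold below is a hypothetical placeholder for your own baselines:

```python
# Minimal sketch of health guardrails with yellow/red flags.
# Metric names and thresholds are hypothetical placeholders.
GUARDRAILS = {
    # metric: (yellow flag below, red flag below)
    "avg_session_satisfaction": (4.0, 3.5),  # post-session rating, 1-5
    "listening_entropy":        (1.5, 1.0),  # diversity in bits (see above)
    "weekly_active_days":       (3.0, 1.5),  # sustainability, days per week
}

def check_guardrails(current_values):
    """Return (metric, flag) pairs for every breached guardrail."""
    alerts = []
    for metric, (yellow, red) in GUARDRAILS.items():
        value = current_values[metric]
        if value < red:
            alerts.append((metric, "RED: stop everything"))
        elif value < yellow:
            alerts.append((metric, "YELLOW: investigate"))
    return alerts

print(check_guardrails({
    "avg_session_satisfaction": 3.8,   # yellow
    "listening_entropy": 0.9,          # red
    "weekly_active_days": 4.2,         # healthy
}))
```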

Days 8-11: Implement

  • Set up dashboards Create three views: Daily (inputs only, for standups), Weekly (inputs → outputs connection), Monthly (full stack including health). Use your existing tools but enforce hierarchy - don't let output metrics creep into daily view. Each metric needs: current value, target, trend line, and owner's face/name. Make it impossible to hide. Pro tip: Screenshot your old dashboard before switching - you'll want to compare in 30 days.

  • Assign owners Each metric gets ONE person's name, not a team. This person presents the metric in reviews, explains changes, and proposes improvements. Give them authority to make changes without approval if it only affects their metric. Create a simple contract: "I own [metric]. I can change [list of things] without asking. I must escalate if [conditions]." Post this publicly. Ownership without authority is just responsibility for failure.

  • Create review rhythm Daily standup (5 min): Did any input metric move >10%? Why? Weekly review (30 min): Are inputs correlating with outputs as expected? What experiments to start/stop? Monthly deep dive (2 hours): Full correlation analysis, health check, and strategy adjustment. Book these as recurring meetings now with mandatory attendance from metric owners. Focus on learning, not judgment.
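For the daily standup question, a minimal sketch of the >10% movement check; the metric names and values are hypothetical:

```python
# Minimal sketch of the daily standup check: did any input metric
# move more than 10% day-over-day? All values are hypothetical.
def flag_big_movers(yesterday, today, threshold=0.10):
    movers = {}
    for metric, prev in yesterday.items():
        change = (today[metric] - prev) / prev if prev else float("inf")
        if abs(change) > threshold:
            movers[metric] = change
    return movers

yesterday = {"share_button_clicks": 1200, "time_to_first_share_s": 95}
today     = {"share_button_clicks": 1380, "time_to_first_share_s": 93}
print(flag_big_movers(yesterday, today))  # {'share_button_clicks': 0.15}
```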

Days 12-14: Iterate

  • Run first review Use your weekly template even though it's only been 3-4 days. You're practicing the rhythm, not evaluating success. Expect to discover: missing data, unclear ownership, and metrics that seemed important but no one can explain. Document every "wait, why do we track this?" moment. Common realization: 50% of your metrics have no owner and no one noticed they haven't updated in months.

  • Adjust based on learnings You'll typically need to: (1) Simplify metric definitions that were too complex, (2) Reassign metrics when the theoretical owner says "I don't actually control this," (3) Kill 1-2 metrics that seemed important but have no clear path to value. Make changes immediately - don't wait for perfection. The goal is momentum, not a perfect system on day 14.

  • Document what works Write a one-page "How We Measure Success" document. Include: Our discovery moment definition, The 3-5-3 metrics we track (input-output-health), Who owns what, When we review them, and What we learned from killing vanity metrics. Share with your entire team and ask: "What's missing?" Their confusion points become your v2 improvements. This document becomes your onboarding bible for new team members.

The Philosophical Close

Here's why this matters beyond metrics:

Products that measure consumption create features. Products that measure discovery create value.

The first leads to feature factories pumping out updates no one requested. The second leads to products so essential that users panic when they're down for five minutes.

Spotify could have competed with Apple Music on features—better queue management, more playlist options, higher quality audio. Instead, they competed on helping users fall in love with new artists.

They measured what mattered. And built a $25 billion business on discovery, not consumption.

The framework is proven. The theory is sound. The playbook is in your hands.

The only question is: Will you have the courage to measure what matters?

Your move.

~ Warbler