AI/LLM Observability

What is the job to be done?

"Help me understand how my AI features perform, what they cost, and how users interact with them."

This is the fastest growing segment of our customer base. AI-native companies are adopting PostHog at a high rate, but often only for LLM Observability or only for Product Analytics. The cross-sell opportunity is significant because AI products have unique observability needs that span multiple PostHog products.

The buyer persona is distinct: AI engineers care about model-level metrics (latency, cost, token usage, accuracy) first, user-level analytics second. Leading with the AI story opens the door to everything else.

What PostHog products are relevant?

Adoption path and expansion path

Entry point

Usually LLM Observability or Product Analytics. Two common patterns:

  1. Model-first: An AI engineer wants to understand model performance: latency, cost, token usage. They start with LLM Observability for tracing and cost attribution (see the instrumentation sketch after this list), then realize they need to understand how users interact with the output (Product Analytics), whether the output is actually good (AI Evals), and how to test improvements (Experiments).
  2. Product-first: AI product team is building a product with AI features and starts with Product Analytics to track user behavior. They realize they need model-level metrics alongside user metrics, which pulls in LLM Observability. From there, they want to evaluate quality (AI Evals) and test prompt/model changes (Experiments).
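To make the model-first pattern concrete, here is a minimal sketch of what early, hand-rolled instrumentation often looks like before a team adopts the LLM Observability SDK integrations: one custom event per generation so latency, token usage, and cost sit next to the same user's product analytics events. The event name `llm_generation` mirrors the example used later on this page; the property names, the `Completion` type, and `callModel` are illustrative assumptions, not PostHog's built-in LLM Observability schema.

```ts
import { PostHog } from 'posthog-node'

// Hypothetical shape of whatever your LLM client returns; adjust to your provider.
type Completion = {
  model: string
  text: string
  inputTokens: number
  outputTokens: number
  costUsd: number
}

// Stand-in for your actual model call (OpenAI, Anthropic, a local model, etc.).
declare function callModel(prompt: string): Promise<Completion>

const posthog = new PostHog('<project_api_key>', { host: 'https://us.i.posthog.com' })

export async function generateAnswer(distinctId: string, prompt: string): Promise<string> {
  const startedAt = Date.now()
  const completion = await callModel(prompt)

  // One event per generation: model-level metrics land alongside the same
  // user's product analytics events, keyed by the same distinctId.
  posthog.capture({
    distinctId,
    event: 'llm_generation',
    properties: {
      model: completion.model,
      latency_ms: Date.now() - startedAt,
      input_tokens: completion.inputTokens,
      output_tokens: completion.outputTokens,
      cost_usd: completion.costUsd,
    },
  })

  return completion.text
}

// Flush queued events before the process exits (e.g. in serverless handlers):
// await posthog.shutdown()
```

Once the team adopts LLM Observability proper, its SDK integrations can capture traces and cost attribution for them; the point of the sketch is that the same `distinctId` ties model metrics to user behavior, which is what opens the rest of the expansion path.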

Primary expansion path

LLM Observability → + AI Evals → + Product Analytics (user behavior) → + Experiments (prompt/model testing) → + Error Tracking → + Session Replay

The logic of each step:

  1. LLM Observability: trace every model call and attribute latency, cost, and token usage.
  2. AI Evals: know whether the output is actually good and catch quality regressions when prompts or models change.
  3. Product Analytics: see how users interact with the output and whether it drives adoption, retention, and conversion.
  4. Experiments: test prompt and model changes against real user metrics, not just offline scores.
  5. Error Tracking: catch model and application failures with context beyond the stack trace.
  6. Session Replay: watch the sessions where quality scores were low or errors occurred.

Alternate expansion paths

Starting from Product Analytics: An AI product team already using PostHog for product analytics. They add LLM Observability to get model-level metrics alongside their user behavior data. From there, AI Evals and Experiments are natural adds.

Starting from Error Tracking: Team catching model failures with Error Tracking. They realize traditional error tracking misses quality regressions (model responds but with worse output). AI Evals fills this gap, pulling in LLM Observability for the full model-level context.

Business impact of solving the problem

AI-native companies are the fastest growing customer segment. Getting in early with LLM Observability means PostHog becomes the default platform as these companies scale. AI-native startups that adopt PostHog at seed stage often grow into significant accounts.

The cross-sell opportunity is uniquely strong. AI products sit at the intersection of multiple PostHog use cases: model observability (AI/LLM Obs), user behavior analytics (Product Intelligence), release management for prompt/model changes (Release Engineering), and error tracking for model failures (Observability). One AI customer can reasonably adopt products from 4+ use cases.

No one else has this combination. Langfuse and Helicone do LLM tracing. Amplitude does product analytics. Sentry does error tracking. No one connects model performance → output quality → user behavior → business outcomes in one platform. That's PostHog's pitch.

AI Evals is the bridge product. For any account building AI features, AI Evals connects AI/LLM Observability to Product Intelligence (are users struggling based on output quality?) and Release Engineering (did a prompt change cause a quality regression?). It's a natural entry point into multiple use cases from a single product.

Personas to target

| Persona | Role Examples | What They Care About | How They Evaluate |
|---|---|---|---|
| AI Engineer | ML Engineer, AI Engineer, Applied AI | Model performance, cost optimization, latency, quality | "Can I see cost per query by model, trace individual calls, and detect quality regressions?" |
| AI Product Manager | AI PM, Product Lead (AI features) | User experience of AI features, adoption rates, business impact | "Can I see how users interact with our AI features and whether they drive retention?" |
| AI Founder | Founder, CTO at AI-native startup | All of the above. Cost control. Speed. Not paying for 5 tools. | "How fast can I set this up and how much does it replace?" |
| AI Product Engineer | Full-stack engineer building AI features | Instrumentation, debugging, prompt iteration cycle time | "How easy is it to instrument? Can I see trace-level detail for debugging?" |

Signals in Vitally & PostHog

Vitally indicators this use case is relevant

| Signal | Where to Find It | What It Means |
|---|---|---|
| LLM Observability is active | Product usage data | AI/LLM Obs use case is live. Full expansion path available. |
| Company tags include "AI" or "LLM" or "ML" | Company info / tags | AI-native or AI-building company. This use case is likely relevant even if they haven't adopted LLM Observability yet. |
| High Product Analytics usage + AI company | Product usage + company type | They're using analytics but haven't connected model-level metrics. LLM Observability is the add. |
| Customer mentions Langfuse, Helicone, or "LLM costs" in notes | Vitally notes / conversations | Direct signal. They're thinking about AI observability and may be using a competitor or building it in-house. |

PostHog usage signals

| Signal | How to Check | What It Means |
|---|---|---|
| LLM-related custom events (e.g., llm_generation, ai_response) | Event property explorer | They're tracking AI events in Product Analytics. LLM Observability would give them model-level detail. |
| High LLM Observability trace volume | Product usage metrics | Active AI instrumentation. Ripe for AI Evals and Experiments. |
| Experiments on AI-related features | Experiments list | They're already A/B testing AI features. Validate they're using LLM Obs for model-level measurement. |
| Error Tracking exceptions from AI/model code | Error tracking events | Model failures are happening. LLM Observability gives context beyond the stack trace. |

Command of the Message

Discovery questions

Negative consequences (of not solving this)

Desired state

Positive outcomes

Success metrics

Customer-facing:

TAM-facing:

Competitive positioning

Our positioning

Competitor quick reference

| Competitor | What They Do | Our Advantage | Their Advantage |
|---|---|---|---|
| Langfuse | Open-source LLM tracing, prompt management, evals | Broader platform (product analytics, experiments, replay, error tracking); user behavior metrics, not just model metrics | More mature LLM-specific features; open-source community; purpose-built prompt management |
| Helicone | LLM request logging, cost tracking, caching | Broader platform; user behavior connection; experiments; not a single-purpose tool | Simpler to set up for basic LLM logging; built-in caching/rate limiting features |
| Braintrust | LLM evals, logging, prompt playground | Broader platform; user behavior metrics; production monitoring, not just offline evals | More mature eval framework; better prompt playground and iteration workflow |
| Datadog LLM Monitoring | LLM tracing as part of broader APM | Product analytics integration; user behavior; better pricing for AI-native startups | Full APM stack; enterprise-grade; part of existing Datadog deployment for bigger companies |

Honest assessment: Our strongest position is with AI-native startups and teams building AI features inside existing products. The pitch is "one platform for everything" instead of Langfuse + Amplitude + Sentry + a flag tool. We're weaker against teams that want the deepest possible LLM-specific tooling (Langfuse's prompt management and eval framework are more mature). We're also weaker against enterprise teams already embedded in Datadog. Our sweet spot is AI teams that want model performance connected to user outcomes in one place, without managing 4 vendors.

Pain points & known limitations

| Pain Point | Impact | Workaround / Solution |
|---|---|---|
| LLM Observability feature set is newer than Langfuse | Teams expecting Langfuse-level prompt management and eval detail may find gaps | Be honest about maturity. Position the breadth of the platform (analytics, experiments, replay) as the differentiator. Langfuse is great for pure LLM tracing; PostHog is better when you also need to understand user behavior and business impact. |
| AI Evals may not support all evaluation frameworks | Teams with custom eval pipelines may want more flexibility | Check current eval capabilities. For custom frameworks, PostHog's API and data warehouse can integrate with existing eval pipelines. |
| Session Replay for AI chat interfaces can be noisy | Chat-based AI products generate a lot of replay data per session | Configure sampling rules. Focus replay viewing on sessions with error events or low AI quality scores (see the sketch after this table). |
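One way a customer can act on that last workaround, beyond project-level sampling settings, is conditional recording with posthog-js: replay stays off by default and only starts when a session hits an AI error or a low quality score. This is a sketch of that approach under assumptions; `onAiResponse`, the `qualityScore` value, and the 0.5 threshold are illustrative, not part of any PostHog API.

```ts
import posthog from 'posthog-js'

// Keep replay off by default so chatty AI sessions don't all generate recordings.
posthog.init('<project_api_key>', {
  api_host: 'https://us.i.posthog.com',
  disable_session_recording: true,
})

// Illustrative hook: the app decides when a session is worth watching,
// e.g. an AI error was shown or an eval score came back low.
export function onAiResponse(hadError: boolean, qualityScore: number) {
  if (hadError || qualityScore < 0.5) {
    // Start recording from this point in the session onward.
    posthog.startSessionRecording()
  }
}
```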

Getting a customer started

What does an evaluation look like?

Onboarding checklist

Cross-sell pathways from this use case

| If Using... | They Might Need... | Why | Conversation Starter |
|---|---|---|---|
| LLM Observability only | AI Evals | They can see model metrics but don't know if the output is actually good | "You can see your model's latency and cost. But do you know if the quality held up after your last prompt change?" |
| LLM Obs + AI Evals | Product Analytics | They know model performance and quality. They don't know how users interact with the output. | "Your model is fast and the quality is high. But are users actually accepting the suggestions and converting?" |
| LLM Obs + Product Analytics | Experiments | They see model metrics and user behavior. They want to improve. | "You can see GPT-4o costs more but users seem to prefer it. Want to run a proper A/B test to quantify the difference?" |
| AI features with frequent prompt/model changes | Release Engineering (Feature Flags) | They're changing prompts/models and want controlled rollout (see the sketch after this table) | "When you change your prompt, do you ship to everyone at once? Feature flags let you roll out to 5% first and measure before going wide." |
| AI features in PostHog | Product Intelligence (for the product team) | AI team is in PostHog. The broader product team should be too. | "Your AI team uses PostHog for model metrics. Has the product team seen what they can do with funnels and retention for non-AI features?" |
| Error Tracking for AI errors | Observability (full stack) | They're catching AI errors but not traditional application errors | "You're tracking model failures. Are you also catching the non-AI exceptions? Error Tracking works for your entire stack." |
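For the feature flag and Experiments rows above, here is a minimal sketch of what a gated prompt change can look like on the server, assuming a flag named `new-summary-prompt` (a hypothetical key) configured in PostHog as a 5% rollout or as an experiment with control/test variants. The prompt strings are placeholders.

```ts
import { PostHog } from 'posthog-node'

const posthog = new PostHog('<project_api_key>', { host: 'https://us.i.posthog.com' })

// 'new-summary-prompt' is a hypothetical flag key; create it in PostHog as a
// gradual rollout (e.g. 5% of users) or as an experiment with two variants.
export async function pickPrompt(distinctId: string): Promise<string> {
  const variant = await posthog.getFeatureFlag('new-summary-prompt', distinctId)

  // Users outside the rollout (or in the control group) keep the current prompt.
  return variant === 'test'
    ? 'You are a concise assistant. Summarize in three bullet points.' // candidate prompt
    : 'You are a helpful assistant. Summarize the following text.'     // current prompt
}
```

Because the flag is evaluated against the same distinct ID used for `llm_generation` events and product analytics, Experiments can then compare cost, quality, and conversion between the two prompts rather than relying on offline scores alone.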

Internal resources

Appendix: Company archetype considerations

| Archetype + Stage | Framing | Key Products | Buyer |
|---|---|---|---|
| AI Native — Early | "You need to understand your model costs, catch quality regressions, and see how users interact with your AI features, all without hiring a data team or buying 4 tools." Speed and simplicity. One platform. | LLM Observability, AI Evals, Product Analytics, PostHog AI | Founder, AI engineer, founding PM |
| AI Native — Scaled | "You're scaling AI features across your product. You need cost attribution by team/feature, automated quality evaluation, prompt/model experimentation, and the ability to connect model performance to business outcomes." | LLM Observability, AI Evals, Product Analytics, Experiments, Error Tracking, Session Replay | Head of AI/ML, AI PM, VP Eng |
| Cloud Native — Any (building AI features) | "You're adding AI features to an existing product. PostHog already tracks your users. Now connect model performance to user behavior so you can optimize the AI experience alongside everything else." The pitch here is extending their existing PostHog usage, not adopting a new tool. | LLM Observability, AI Evals (added to existing PostHog stack) | Engineering team building the AI feature, PM who owns the AI feature |
