Surveys SDK bug

On October 3, 2025, a backwards compatibility issue in the PostHog Surveys SDK (version 1.270.0) caused widespread JavaScript exceptions for customers using SDK versions older than 1.257.1. The issue lasted 5 hours and 26 minutes, affecting 305 teams and disrupting both survey functionality and error tracking metrics.

Summary

A backwards compatibility break in SDK version 1.270.0 introduced a dependency on the isDisabled function from the PostHogPersistence class, which was only added in version 1.257.1 (July 2025). The issue manifested when the asynchronously-loaded survey extension attempted to call this function on older SDK versions where it didn't exist, causing JavaScript exceptions in customer applications. The incident was initially detected through customer support tickets rather than automated monitoring, leading to a 4+ hour detection delay and extended customer impact.

Timeline

All times in UTC.

Total impact duration: 5 hours 26 minutes (10:45 – 16:11 UTC)

Detection delay: 4 hours 14 minutes
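The two figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: the helper names are made up for illustration, and it assumes the detection delay is measured from the start of impact at 10:45 UTC.

```typescript
// Convert an "HH:MM" clock time into minutes since midnight.
function toMinutes(clock: string): number {
    const parts = clock.split(':')
    return Number(parts[0]) * 60 + Number(parts[1])
}

// Format a minute count as "H hours M minutes".
function formatDuration(totalMinutes: number): string {
    return `${Math.floor(totalMinutes / 60)} hours ${totalMinutes % 60} minutes`
}

// Format minutes since midnight back into "HH:MM".
function toClock(totalMinutes: number): string {
    const pad = (n: number): string => (n < 10 ? '0' : '') + String(n)
    return `${pad(Math.floor(totalMinutes / 60))}:${pad(totalMinutes % 60)}`
}

const impactStart = toMinutes('10:45')
const impactEnd = toMinutes('16:11')
const detectionDelayMinutes = 4 * 60 + 14

// Length of the impact window stated in the timeline.
const impactDuration = formatDuration(impactEnd - impactStart)

// Assumption: detection delay counts from the start of impact.
const detectedAt = toClock(impactStart + detectionDelayMinutes)
```

Under that assumption the detection time lands at 14:59 UTC, which matches the confirmation time quoted in the remediation section.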

Root cause analysis

The culprit PR introduced the backwards compatibility issue.

The technical problem

The PR modified the surveys SDK to use posthog.persistence instead of accessing localStorage directly – a reasonable architectural improvement. To ensure backwards compatibility, the code needed to check whether posthog.persistence was available before attempting to use it.

The implementation used the isDisabled function from the PostHogPersistence class, adding a utility in survey-utils.ts to verify persistence availability. However, isDisabled was itself new: it was introduced in a PR merged on July 11, 2025 and first shipped in SDK version 1.257.1, so no earlier SDK version exposes it.
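For illustration, an availability check of this kind can be written so that it feature-detects isDisabled before calling it. The sketch below is not the code from the PR or from the fix (the fix was a revert); the interfaces and the canUsePersistence helper are hypothetical and only model the one fact stated above, that isDisabled is missing before 1.257.1.

```typescript
// Hypothetical minimal shape of the persistence object. On SDK versions
// older than 1.257.1 the isDisabled member is absent.
interface PersistenceLike {
    isDisabled?: () => boolean
}

// Hypothetical minimal shape of the SDK instance seen by the extension.
interface SdkLike {
    persistence?: PersistenceLike
}

// Hypothetical helper: returns true only when persistence exists, exposes
// isDisabled, and reports that it is not disabled. Any other case returns
// false so the caller can fall back to its previous storage path.
function canUsePersistence(sdk: SdkLike): boolean {
    const persistence = sdk.persistence
    if (!persistence || typeof persistence.isDisabled !== 'function') {
        return false
    }
    return !persistence.isDisabled()
}

// Stand-ins for an old SDK, a current SDK, and a current SDK with
// persistence disabled.
const oldSdk: SdkLike = { persistence: {} }
const currentSdk: SdkLike = { persistence: { isDisabled: () => false } }
const disabledSdk: SdkLike = { persistence: { isDisabled: () => true } }

const results = [oldSdk, currentSdk, disabledSdk].map(canUsePersistence)
```

The design point is that the check never assumes a member exists on an object supplied by a different, possibly older, bundle.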

Why it failed

When PR #2355 was merged, both the main SDK code (posthog-surveys.ts) and the extension code (extensions/surveys.tsx) relied on the isDisabled function.

For the main SDK bundle, this worked correctly – customers on older versions never loaded the new code containing the reference to isDisabled.

However, the survey extension creates an asymmetric loading scenario:

  1. The customer's application loads the SDK at whatever version they have installed (potentially months or years old)
  2. The survey extension is loaded asynchronously from our CDN and always downloads the latest version
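The two steps above can be reproduced in miniature. In this sketch the persistence objects and function names are hypothetical stand-ins rather than real SDK code, and the failure surfaces as a TypeError because that is what JavaScript throws when an undefined member is called; the post-mortem itself only speaks of JavaScript exceptions.

```typescript
// Hypothetical stand-ins for the persistence object at two SDK versions.
type Persistence = { isDisabled?: () => boolean }

const persistenceBefore1_257_1: Persistence = {}
const persistenceFrom1_257_1: Persistence = { isDisabled: () => false }

// Stand-in for the latest extension code: it assumes isDisabled exists.
// The cast mirrors code compiled against the newest SDK typings.
function extensionReadsPersistence(persistence: Persistence): boolean {
    const current = persistence as { isDisabled: () => boolean }
    return !current.isDisabled()
}

// Run the extension against a given persistence object and report the outcome.
function tryExtension(persistence: Persistence): string {
    try {
        extensionReadsPersistence(persistence)
        return 'ok'
    } catch (error) {
        return error instanceof TypeError ? 'TypeError' : 'other error'
    }
}

// Old installed SDK paired with the latest extension: the call fails.
const onOldSdk = tryExtension(persistenceBefore1_257_1)

// Current SDK paired with the latest extension: the call succeeds.
const onCurrentSdk = tryExtension(persistenceFrom1_257_1)
```

Type checking cannot catch this, because the extension is compiled against current typings while the object it receives at runtime comes from whichever SDK the customer installed.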

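This created a version mismatch: the latest extension code (1.270.0) called isDisabled on the PostHogPersistence instance of an SDK older than 1.257.1, where the function does not exist, and the call threw a JavaScript exception in the customer's application.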

Why it wasn't caught

  1. No version compatibility testing: We lack automated tests that verify new extension code works with older SDK versions
  2. Code review gaps: We don't have a process to flag when new APIs are added to main SDK files that might be called by extensions
  3. No static analysis: No linter rules prevent extensions from calling functions that may not exist in older SDK versions
  4. Detection gaps: No monitoring alerted us to the spike in customer-side exceptions – we learned about it from support tickets
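The first gap in the list above could be narrowed with a compatibility matrix test that runs an extension entry point against stubbed API surfaces of older SDKs. The following is a minimal sketch under stated assumptions: the surface stubs, labels, and function names are hypothetical, and only the 1.257.1 boundary for isDisabled comes from this post-mortem.

```typescript
// Hypothetical stub of the SDK surface an extension touches.
type Surface = { persistence: { isDisabled?: () => boolean } }
type ExtensionEntry = (sdk: Surface) => void

// Hypothetical surfaces, one per compatibility boundary under test.
const surfaces: Array<{ label: string; sdk: Surface }> = [
    { label: 'older-than-1.257.1', sdk: { persistence: {} } },
    { label: '1.257.1-or-newer', sdk: { persistence: { isDisabled: () => false } } },
]

// Returns the label of every surface on which the entry point throws.
function findIncompatibleSurfaces(entry: ExtensionEntry): string[] {
    const failures: string[] = []
    for (const { label, sdk } of surfaces) {
        try {
            entry(sdk)
        } catch {
            failures.push(label)
        }
    }
    return failures
}

// Entry point that assumes isDisabled always exists.
const unguardedEntry: ExtensionEntry = (sdk) => {
    const persistence = sdk.persistence as { isDisabled: () => boolean }
    persistence.isDisabled()
}

// Entry point that feature-detects isDisabled before calling it.
const guardedEntry: ExtensionEntry = (sdk) => {
    if (typeof sdk.persistence.isDisabled === 'function') {
        sdk.persistence.isDisabled()
    }
}

const unguardedFailures = findIncompatibleSurfaces(unguardedEntry)
const guardedFailures = findIncompatibleSurfaces(guardedEntry)
```

A test suite could assert that the failure list is empty for every extension entry point, which would have failed for the unguarded call on the older surface.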

Impact

Severity: Major (High Impact, Service Degradation)

Affected customers: 305 teams running SDK versions < 1.257.1

User-facing impact:

Duration: 5 hours 26 minutes of active impact

Error tracking billing impact:

Remediation

We reverted the problematic changes and released SDK version 1.270.1, which restored compatibility with all SDK versions.

Immediate actions

Action item: Start incidents earlier. We should declare incidents as soon as we confirm an issue (around 14:59), not almost two hours after mitigation. This enables proper coordination and customer communication. Owner: @lucasheriques

Short-term improvements

Long-term improvements (Target: Q4 2025 – Q1 2026)

Lessons learned

What went well

What went poorly

Key takeaways

This incident revealed a critical architectural weakness in how our asynchronously-loaded extensions interact with versioned SDK code. The assumption that extensions can safely call any SDK function breaks down when we have customers on old SDK versions but always serve them the latest extension code.

We hit a similar issue in another incident.

The 4+ hour detection delay highlights gaps in our observability for client-side errors. We lack visibility into exceptions occurring in customer applications using our SDK.

The improvements outlined above will address both the immediate technical issue and the systemic gaps in testing, monitoring, and deployment practices that allowed this to reach production and persist for over 5 hours.

Canonical URL: https://posthog.com/handbook/company/post-mortems/2025-10-03-surveys-sdk-bug

GitHub source: contents/handbook/company/post-mortems/2025-10-03-surveys-sdk-bug.md
