
Session replay architecture

Session recording architecture: ingestion → processing → serving

1. Capture (client-side)

PostHog-JS uses rrweb (record and replay the web) to capture full DOM snapshots plus incremental mutation and interaction events, which are sent to PostHog as $snapshot events.

Events include metadata: $window_id, $session_id, $snapshot_source (Web/Mobile), timestamps, distinct_id
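
As a rough illustration, a captured event carries that metadata alongside the rrweb payload. The shape below is a hedged sketch; property names beyond those listed above (such as $snapshot_data) are illustrative, not the exact wire format:

    // Hedged sketch of a $snapshot event as sent by posthog-js.
    interface SnapshotEvent {
        event: '$snapshot'
        properties: {
            $session_id: string
            $window_id: string
            $snapshot_source: 'web' | 'mobile'
            distinct_id: string
            // rrweb events: [{ type, data, timestamp }, ...]
            $snapshot_data: Array<{ type: number; data: unknown; timestamp: number }>
        }
    }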

2. Ingestion pipeline

Phase 1: Rust capture service (recordings mode)

rust/capture/src/router.rs:235 and rust/capture/src/v0_endpoint.rs:342

Kafka sink (rust/capture/src/sinks/kafka.rs):

Phase 2: Blob ingestion consumer (Node.js/TypeScript)

plugin-server/src/main/ingestion-queues/session-recording-v2/

SessionRecordingIngester consumes from Kafka and:

  1. Parses gzipped/JSON messages (kafka/message-parser.ts)
  2. Batches by session via SessionBatchRecorder
  3. Buffers events in memory per session using SnappySessionRecorder:
    public recordMessage(message: ParsedMessageData): number {
        let rawBytesWritten = 0

        // ... (start-of-range bookkeeping elided in the original excerpt;
        // the end-of-range guard below is reconstructed around the visible body)
        if (!this.endDateTime || message.eventsRange.end > this.endDateTime) {
            this.endDateTime = message.eventsRange.end
        }

        for (const [windowId, events] of Object.entries(message.eventsByWindowId)) {
            for (const event of events) {
                const serializedLine = JSON.stringify([windowId, event]) + '\n'
                const chunk = Buffer.from(serializedLine)
                this.uncompressedChunks.push(chunk)

                const eventTimestamp = event.timestamp
                const shouldComputeMetadata = eventPassesMetadataSwitchoverTest(
                    eventTimestamp,
                    this.metadataSwitchoverDate
                )

                if (shouldComputeMetadata) {
                    // Store segmentation event for later use in active time calculation
                    this.segmentationEvents.push(toSegmentationEvent(event))

                    const eventUrl = hrefFrom(event)
                    if (eventUrl) {
                        this.addUrl(eventUrl)
                    }

                    if (isClick(event)) {
                        this.clickCount += 1
                    }

                    if (isKeypress(event)) {
                        this.keypressCount += 1
                    }

                    if (isMouseActivity(event)) {
                        this.mouseActivityCount += 1
                    }

                    this.eventCount++
                    this.size += chunk.length
                }

                rawBytesWritten += chunk.length
            }
        }

        this.messageCount += 1
        return rawBytesWritten
    }
  4. Flushes periodically (max 10 seconds buffer age or 100 MB buffer size)
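
A minimal sketch of that flush check, assuming illustrative names (the actual SessionBatchRecorder API may differ; only the 10-second and 100 MB thresholds come from the description above):

    // Hedged sketch: decide whether the in-memory session batch should be flushed.
    const MAX_BUFFER_AGE_MS = 10_000 // 10 seconds
    const MAX_BUFFER_BYTES = 100 * 1024 * 1024 // 100 MB

    function shouldFlush(batch: { createdAt: number; sizeBytes: number }, now: number): boolean {
        return now - batch.createdAt >= MAX_BUFFER_AGE_MS || batch.sizeBytes >= MAX_BUFFER_BYTES
    }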

Persistence (sessions/s3-session-batch-writer.ts):

Metadata written to ClickHouse via Kafka:

3. Storage schema

ClickHouse tables

session_replay_events (primary, v2):

session_id, team_id, distinct_id
min_first_timestamp, max_last_timestamp
block_first_timestamps[], block_last_timestamps[], block_urls[]
first_url, all_urls[]
click_count, keypress_count, mouse_activity_count, active_milliseconds
console_log_count, console_warn_count, console_error_count
size, message_count, event_count
snapshot_source, snapshot_library
retention_period_days
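
For orientation, the columns above map onto roughly this shape (the TypeScript types here are assumptions; the ClickHouse table definition is authoritative):

    // Hedged sketch of a session_replay_events row.
    interface SessionReplayEventsRow {
        session_id: string
        team_id: number
        distinct_id: string
        min_first_timestamp: string
        max_last_timestamp: string
        block_first_timestamps: string[]
        block_last_timestamps: string[]
        block_urls: string[]
        first_url: string | null
        all_urls: string[]
        click_count: number
        keypress_count: number
        mouse_activity_count: number
        active_milliseconds: number
        console_log_count: number
        console_warn_count: number
        console_error_count: number
        size: number
        message_count: number
        event_count: number
        snapshot_source: string
        snapshot_library: string
        retention_period_days: number
    }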

session_recording_events (legacy):

PostgreSQL

PostgreSQL writes happen when:

  1. User pins to playlist → Immediate write
  2. User requests persistence → Immediate write + background LTS copy task
  3. Auto-trigger on save → Background LTS copy task (via post_save signal)
  4. Periodic sweep → Finds recordings 24 hours to 90 days old without an LTS path, queues background tasks

Note: Regular session recordings (not pinned/persisted) are NOT written to PostgreSQL; they only exist in the ClickHouse session_replay_events table until explicitly pinned or persisted as LTS.

posthog_sessionrecording model:

S3 object storage

4. Playback/Retrieval

API Flow (posthog/session_recordings/session_recording_api.py)

GET /api/projects/:id/session_recordings/:session_id/:

  1. Loads metadata from ClickHouse session_replay_events or Postgres
  2. Returns: duration, start_time, person info, viewed status
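
A hedged sketch of the metadata payload the player receives, based on the fields listed above (exact field names are assumptions):

    // Hedged sketch of the recording metadata response.
    interface RecordingMetadataResponse {
        id: string // session_id
        recording_duration: number // seconds
        start_time: string // ISO 8601 timestamp
        person: { distinct_ids: string[]; properties: Record<string, unknown> } | null
        viewed: boolean
    }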

GET /api/projects/:id/session_recordings/:session_id/snapshots: Two-phase fetch:

  1. Phase 1: Returns available sources: ["blob"] or ["blob", "realtime"]
  2. Phase 2: Client requests ?source=blob
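
A minimal client-side sketch of the two-phase fetch (routes follow the endpoint above; the response field names are assumptions):

    // Hedged sketch: list available snapshot sources, then fetch the blob source.
    async function fetchSnapshots(projectId: string, sessionId: string): Promise<string> {
        const base = `/api/projects/${projectId}/session_recordings/${sessionId}/snapshots`

        // Phase 1: discover sources, e.g. ["blob"] or ["blob", "realtime"]
        const sourcesRes = await fetch(base)
        const { sources } = (await sourcesRes.json()) as { sources: { source: string }[] }

        // Phase 2: request the blob source if it is offered
        if (sources.some((s) => s.source === 'blob')) {
            const blobRes = await fetch(`${base}?source=blob`)
            return await blobRes.text() // JSONL snapshot data
        }
        return ''
    }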

Source resolution:

Query (queries/session_replay_events.py):

SessionReplayEvents().get_metadata()  # metadata
SessionReplayEvents().get_block_listing()  # S3 blob locations

Returns block listing:

[{
  "blob_key": "s3://bucket/path?range=bytes=0-1000",
  "first_timestamp": "...",
  "last_timestamp": "...",
  "first_url": "...",
  "size": 1000
}, ...]

Frontend playback

frontend/src/scenes/session-recordings/player/

  1. sessionRecordingPlayerLogic fetches snapshot sources (only blob_v2 now, except on hobby deployments)
  2. For each snapshot source fetches snapshots
  3. Decompresses Snappy blocks
  4. Parses JSONL: [windowId, event] per line
  5. Feeds to rrweb-player for DOM reconstruction
  6. Renders in iframe with timeline controls
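
A hedged sketch of steps 3-5, assuming a snappyjs-style uncompress helper (the JSONL format of one [windowId, event] array per line matches the recorder code earlier on this page):

    import { uncompress } from 'snappyjs' // assumed Snappy decompression helper
    import { Replayer } from 'rrweb'

    // Hedged sketch: turn one compressed session block into rrweb events and play them.
    function playBlock(compressedBlock: Uint8Array): void {
        const jsonl = new TextDecoder().decode(uncompress(compressedBlock))

        // Each non-empty line is a JSON array: [windowId, rrwebEvent]
        const events = jsonl
            .split('\n')
            .filter((line) => line.length > 0)
            .map((line) => JSON.parse(line) as [string, { type: number; timestamp: number; data: unknown }])
            .map(([, event]) => event)

        // rrweb's Replayer reconstructs the DOM from the event stream
        new Replayer(events as any[]).play()
    }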

Metadata (playerMetaLogic.tsx):

Key optimizations

Data flow summary

Browser (rrweb)
  → POST /s/ with $snapshot events
  → Rust Capture validates & produces to Kafka
  → Node.js Blob Ingestion buffers & compresses
  → Writes to S3 (session blocks) + ClickHouse metadata (via Kafka)
  → Frontend fetches metadata from ClickHouse
  → Frontend fetches blocks from S3 via pre-signed URLs
  → rrweb-player reconstructs & renders

Canonical URL: https://posthog.com/handbook/engineering/session-replay/session-replay-architecture

GitHub source: contents/handbook/engineering/session-replay/session-replay-architecture.md
