How It Works

How Fusionaly's SQLite-based ingestion pipeline handles hundreds of thousands of events per day

Fusionaly uses SQLite with deliberate buffering and backpressure. No heavyweight cluster. Every layer—browser SDK, HTTP ingress, WAL-backed writes, and async aggregation—embraces eventual consistency. Bursts slow down briefly instead of overwhelming the database. Dashboards can lag by a minute during spikes while raw events stay durable.

Key guarantees:

  • Multi-layer retries prevent dropped events (browser → server → database)
  • Ingestion writes straight to disk (no volatile buffers)
  • Background jobs convert raw events into hourly aggregates
  • Backpressure favors delays over failed writes

Every layer has retries. Events don’t get lost.

Layer | Retry Behavior
Browser SDK | 3 retries with exponential backoff (1s, 2s, 4s). Caches up to 100 events in localStorage when offline.
HTTP Ingress | Returns 503 + Retry-After when saturated. SDK respects this.
Database Writes | PerformWrite retries up to 10 times with exponential backoff on SQLITE_BUSY.
Background Jobs | Failed batches stay in queue. No row advances until successfully processed.

The browser SDK uses navigator.sendBeacon() for page unload events—these survive even if the user closes the tab.

Browser SDK (200ms batches, 100-event localStorage cache)
|
v
/x/api/v1/events (HTTP ingress + concurrency limiter)
|
v
ingested_events table (durable queue)
|
v Background job (100-row batches)
events table (authoritative log)
|
v Aggregate upserts
site_stats / page_stats / ref_stats / ... (hourly buckets)
|
v
Dashboards (eventually consistent)

The design embraces SQLite’s single-writer model instead of fighting it. Buffers are explicit, backpressure is intentional, and aggregates can lag briefly without threatening correctness.

The storage layer is SQLite with WAL mode, busy_timeout = 5000, and a connection pool (10 open / 5 idle). The PerformWrite helper retries transactions up to 10 times with exponential backoff.
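A minimal sketch of what a PerformWrite-style helper could look like. The names, backoff values, and the busy-error stand-in are assumptions; the real helper wraps a transaction against the pooled SQLite connection and checks the driver's actual busy/locked error codes:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errBusy = errors.New("SQLITE_BUSY") // stand-in for the driver's busy error

// performWrite runs fn, retrying up to maxRetries times with exponential
// backoff, but only when the failure is a busy/locked error. Any other
// error (or success) returns immediately.
func performWrite(fn func() error, maxRetries int) error {
	delay := 10 * time.Millisecond
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = fn(); err == nil || !errors.Is(err, errBusy) {
			return err
		}
		time.Sleep(delay)
		delay *= 2
	}
	return fmt.Errorf("write failed after %d retries: %w", maxRetries, err)
}

func main() {
	attempts := 0
	err := performWrite(func() error {
		attempts++
		if attempts < 3 {
			return errBusy // simulate a locked database on the first two tries
		}
		return nil
	}, 10)
	fmt.Println(attempts, err) // 3 <nil>
}
```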

Three queueing layers:

  1. HTTP handler - Concurrency limiter returns 503 + Retry-After when saturated
  2. Connection pool - Goroutines wait for a connection instead of hammering the DB
  3. WAL queue - Serializes writes while allowing concurrent reads

Four pipeline stages:

  1. Browser SDK - Batches events (200ms intervals, ≤10 per batch), retries with backoff, caches up to 100 events in localStorage when offline
  2. Ingestion API - Validates, normalizes URLs, filters excluded IPs, persists to ingested_events
  3. Background processor - Runs every 60s, drains pending rows in 100-row batches into events
  4. Aggregate upserts - INSERT ... ON CONFLICT into hourly tables (site_stats, page_stats, etc.)

No row is dropped. Every failure triggers a retry before advancing.

Path | Tables | Consistency | Use case
Hot | ingested_events, events | Synchronous | Visitor Transparency, exact queries
Cold | *_stats tables | Eventually consistent | Dashboards, reports

Cold tables are hourly buckets—even multi-year queries read only ~24 rows per day per dimension.

During a spike (1,000 events/second):

  1. Browsers coalesce requests → ~100 HTTP req/s actually hit the server
  2. Ingestion accepts everything into ingested_events, responds fast
  3. Background processor drains queue in 100-row chunks, loops until empty
  4. Aggregates lag briefly but remain ordered
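Step 3 above, draining in fixed-size chunks until the queue is empty, can be sketched as follows (the queue is simulated; the real job reads pending rows from ingested_events and commits one transaction per batch):

```go
package main

import "fmt"

// drain processes the pending queue in fixed-size batches, looping until
// empty, so a spike clears in a few passes instead of one giant transaction.
func drain(pending []int, batchSize int, process func([]int)) int {
	passes := 0
	for len(pending) > 0 {
		n := batchSize
		if len(pending) < n {
			n = len(pending)
		}
		process(pending[:n]) // one transaction per batch in the real job
		pending = pending[n:]
		passes++
	}
	return passes
}

func main() {
	queue := make([]int, 250) // 250 backlogged rows after a spike
	total := 0
	passes := drain(queue, 100, func(batch []int) { total += len(batch) })
	fmt.Println(passes, total) // 3 250
}
```

Keeping batches small bounds transaction size, so concurrent reads under WAL never wait long.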

Reads: Most dashboard queries < 10ms. Hourly aggregates keep queries cheap.

Writes: WAL mode + batch transactions (100 events/commit) + client/server retries. No RAM buffer needed.

-- Typical dashboard query (single-digit ms)
SELECT strftime('%Y-%m-%d', hour) AS day, SUM(visitors) AS visitors
FROM site_stats
WHERE website_id = ? AND hour BETWEEN datetime('now', '-30 days') AND datetime('now')
GROUP BY day;

~240 bytes per event (including indexes and aggregate overhead).

Traffic | Daily Events | Daily Growth | 1-year | 5-year
1K visits/day | ~3,750 | ~0.7 MB | ~0.3 GB | ~2 GB
10K visits/day | ~37,500 | ~7 MB | ~2.6 GB | ~19 GB
100K visits/day | ~375,000 | ~72 MB | ~26 GB | ~190 GB

Storage grows linearly with time, not exponentially with traffic spikes.

On SQLite limits: SQLite theoretically supports 281 TB databases. You will never hit this. A site with 100K daily visits accumulates ~26 GB/year. At that rate, reaching SQLite’s limit would take about 10,000 years. Plan for disk space, not database limits.
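The back-of-the-envelope arithmetic behind that figure:

```go
package main

import "fmt"

// yearsToLimit estimates how long a database growing at yearlyBytes per
// year takes to reach maxBytes.
func yearsToLimit(maxBytes, yearlyBytes float64) float64 {
	return maxBytes / yearlyBytes
}

func main() {
	// ~281 TB theoretical SQLite limit vs ~26 GB/year at 100K daily visits
	fmt.Printf("%.0f years\n", yearsToLimit(281e12, 26e9)) // prints "10808 years"
}
```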

SQLite works because: hot tables stay lean, dashboards query compact hourly aggregates, writes never block readers. A modest instance handles ~400 req/sec—enough for hundreds of thousands of events per day.