Internal Architecture & Queueing Strategy

How Fusionaly's ingestion pipeline, SQLite stack, and aggregation layers work together to absorb bursts and stay scalable

Fusionaly doesn’t follow the traditional analytics playbook. Instead of provisioning a heavyweight cluster, the platform uses a minimalist SQLite-based datastore with deliberate buffering and backpressure. Every layer—browser SDK, HTTP ingress, WAL-backed writes, and asynchronous aggregation—leans into eventual consistency. Throughput is intentionally capped with server-side concurrency control and client-side retries, so bursts slow down briefly instead of overwhelming SQLite. The single-tenant focus keeps datasets small, trims architectural complexity, and makes short-lived lag acceptable compared to chasing multi-tenant scale. Dashboards stay fast, but they aren’t second-by-second real-time; during spikes, aggregates can lag by a minute or two while raw events stay durable.

This guide explains how that architecture works end to end and why it still handles hundreds of thousands of events per day on a single tenant installation.

Key guarantees:

Client-side retries and local caching prevent dropped events when browsers go offline.
The ingestion API writes straight to disk with WAL retries; no volatile buffers to lose data.
Background jobs convert hot data into hourly aggregates, so dashboards read compact tables.
Backpressure (busy timeouts, HTTP limiter) favors temporary delays over failed writes.

Fusionaly Storage Architecture

Fusionaly makes four promises about how your analytics data is handled:

Buffered writes with durable storage – every event hits disk immediately (no RAM-only queue) and the write path retries automatically.
Eventually consistent dashboards – aggregates update asynchronously, so heavy bursts may delay charts for a short period, but they always catch up.
Client-side buffering with bounded offline cache – the SDK batches events, retries with expo-backoff, stores up to 100 pending events in localStorage (oldest trimmed if offline too long), and drains them when the visitor comes back online.
Graceful burst handling – when processing lags, the system prefers small queues and delayed aggregates over dropped events, keeping your data stable.

Our storage layer uses SQLite under the hood, but you can think of it as a purpose-built analytics datastore:

Storage Safeguards

The datastore keeps writes smooth by combining WAL mode with busy_timeout = 5000, _txlock=immediate, and a fixed connection pool (10 open / 5 idle by default). On top of that, the PerformWrite helper retries each transaction up to 10 times with exponential backoff, so transient locks never bubble up to HTTP clients.

Built-in Queueing Layers

The storage layer sees well-behaved traffic because:

HTTP request -> GORM connection pool -> SQLite WAL queue
                     v                   v
            (10 pooled conns)    (busy_timeout + retries)

Layer 1 (HTTP handler): public ingestion endpoints run behind a Fiber-based concurrency limiter that returns 503 + Retry-After: 1 when the queue is full.
Layer 2 (Connection pool): goroutines wait for a pooled connection instead of hammering the database.
Layer 3 (WAL queue): the storage layer serializes writes internally while allowing unlimited concurrent readers. PerformWrite’s retries ensure the goroutine eventually commits once its turn arrives.

Event Processing & Queueing

Fusionaly’s ingestion path is intentionally layered so bursts never overwhelm SQLite:

Server Dataflow at a Glance

Here’s the bird’s-eye view of how a single-tenant Fusionaly instance moves data. Think of it as a pragmatic pipeline: hot writes stay durable, cold aggregates settle asynchronously, and every junction has built-in backpressure.

Browser SDK
  - 200 ms batches
  - Exponential retries
  - Local cache (100 events)
        ||
        \/
/x/api/v1/events (HTTP ingress)
  -> Concurrency limiter (returns 503 + Retry-After when saturated)
  -> Durable write (PerformWrite with WAL + busy_timeout + retries)
        ||
        \/
ingested_events (hot queue table)
        ||
        \/  Jobs.ProcessEvents (100-row batches, WAL transactions)
events table (authoritative log) --------> Admin APIs needing exact data
        ||
        \/  UpdateAllAggregatesBatch
site_stats / page_stats / device_stats / ... (hourly aggregates)
        ||
        \/  Dashboards & Reports (eventually consistent)

This architectural choice embraces SQLite’s single-writer model instead of fighting it. Buffers are explicit (browser-side batching + localStorage cache + ingested_events table), backpressure is intentional (client retries, busy timeouts, server retries, HTTP limiter), and aggregates are treated as cold data that can lag briefly without threatening correctness. Few analytics stacks run on SQLite, but constraining the problem to a single tenant mitigates the usual pain points: WAL ensures durability, the hot tables remain lightweight, and hourly aggregates provide compact read models for dashboards.

End-to-End Pipeline

Browser SDK buffer - events are queued locally (200 ms send interval, batch size <= 10) and fall back to a 100-event localStorage cache when offline.
Public ingestion API - /x/api/v1/events validates payloads, normalizes URLs, filters excluded IPs, and persists the raw record into the ingested_events table.
Background processor - jobs.ProcessEvents wakes up every 60 s (configurable) and drains pending rows in batches of 100, converting them into canonical events.
Aggregate upserts - the same batch data feeds UpdateAllAggregatesBatch, which issues INSERT ... ON CONFLICT statements into hourly tables such as site_stats, page_stats, ref_stats, device/browser/OS stats, UTM stats, and custom event_stats.

Browser SDK -> HTTPS ingestion -> ingested_events queue -> batch normalization -> aggregate upserts

No row is dropped during the transition between stages; every failure triggers a retry before the job advances to the next batch.

The only buffers in this flow are the browser’s batching layer (<= 10 events every 200 ms, plus a 100-event localStorage fallback) and the durable ingested_events table. Because both layers retry with exponential backoff, outright failures are extremely rare; the trade-off is short-term eventual consistency, where aggregates can lag by a few seconds or minutes during massive bursts but always catch up once the backlog drains.

Client-Side Buffering & Retry Logic

Event batching: events are sent in batches every 200 ms (up to 10 events per batch).
Retry with backoff: failed sends are retried automatically with increasing delays.
Offline cache: persistent localStorage cache (capped at 100 events) prevents data loss when offline.
Unload handling: navigator.sendBeacon ensures events are sent before tab closes.

Server-Side Durability

Ingest-first strategy: incoming events land in ingested_events via PerformWrite, then a background job migrates them to events and aggregates, so HTTP responses stay fast.
SQLite-native retries: busy_timeout = 5000 plus up to 10 Go-level retries prevent database is locked leaks.

Aggregation Queue

Batch-oriented: The background job reads processed = 0 ORDER BY created_at, so the oldest events leave the queue first and related batches stay together.
Hourly buckets: every aggregate table (site/page/referrer/device/browser/OS/UTM/custom event) stores one row per (website_id, hour, dimension) pair. Queries spanning years read at most 24 rows per day per dimension, so multi-year charts stay fast even as raw events grow.
Idempotent upserts: hourly rows use INSERT ... ON CONFLICT increments, so replaying a batch only bumps counters.
Session-aware enrichment: derived flags (new visitor, session boundary, bounce, entrances, exits, UTM) are computed once during aggregation, keeping dashboards off the hot events table.

Hot vs. Cold Data

Hot path: ingested_events and events tables hold the authoritative raw data and are updated synchronously with each HTTP request. Anything that needs point-in-time accuracy (e.g., Visitor Transparency) reads from here.
Cold path: hourly aggregate tables (site_stats, page_stats, ref_stats, device/browser/OS, UTM, custom event_stats) are filled asynchronously. They are eventually consistent—during a burst they may lag behind the hot tables—but once the queue drains, they reflect the full history.
Query cost: because cold tables are hourly buckets, even a multi-year range is only ~24 rows per day per dimension, so reporting and dashboards stay cheap regardless of total event count.

Worst-Case Burst Handling

During a spike (for example, 1,000 events/second for 30 seconds):

Browsers coalesce requests and retry locally, so only ~100 HTTP requests per second hit the server despite higher user activity.
Ingestion API accepts every payload into ingested_events and keeps the HTTP handlers fast: responses go out once the write queue acknowledges the row, not when aggregates finish.
Background processor drains the queue in 100-row chunks. If processing takes longer than the default 60-second cadence, the job simply keeps looping until processed=0 disappears; no human intervention is required.
Aggregates briefly lag but remain ordered. Dashboards querying the hourly tables pick up the new data as soon as each batch commits, and because the queue is FIFO, you never see recent hours without their earlier events.

Performance Characteristics

Fast Reads

SQLite excels at read operations, which is perfect for analytics dashboards:

Query Performance: Most dashboard queries execute in < 10ms
Indexes: Optimized indexes on timestamp, page_url, and visitor_id fields
Read-ahead Caching: Frequently accessed data is kept in memory
Connection Pooling: Multiple read connections can operate simultaneously

Example dashboard query (runs against site_stats and finishes in single-digit milliseconds on a modest VPS):

SELECT strftime('%Y-%m-%d', hour) AS day,
       COALESCE(SUM(visitors), 0) AS visitors
FROM site_stats
WHERE website_id = :website_id
  AND hour BETWEEN datetime('now', '-30 days') AND datetime('now')
GROUP BY day
ORDER BY day ASC;

Behavior Under Intense Writes

SQLite handles high write loads gracefully with proper configuration:

Write Performance Optimizations:

WAL Mode: Write-Ahead Logging allows concurrent reads during writes
Batch Transactions: Background processors commit ~100 normalized events per transaction, so each WAL sync amortizes the cost of multiple writes.
Asynchronous Jobs: Non-critical aggregations run in background jobs so ingestion stays lean
Client + Server Retries: The browser SDK retries failed posts, and PerformWrite retries SQLite transactions up to 10 times, so there’s no need for an in-memory write buffer

Concurrent Write Handling:

Peak Traffic Scenario (1000 events/second):
|-- Browser SDK (client batching + retry)
|-- Ingested Events table (durable queue)
|-- Batch processor (100 events/transaction)
|-- WAL Journal (enables concurrent reads)
\-- Database File (final persistent storage)

Under extreme load (> 2000 events/second), Fusionaly:

Keeps the batch processor running fixed 100-row commits so transactions stay short and predictable
Lets the browser SDK + PerformWrite retry loops absorb transient contention rather than buffering writes in RAM

Behavior Under Intense Reads

Analytics dashboards often generate many simultaneous read queries:

Read Optimization Strategies:

Connection Pooling: Up to 20 concurrent read connections
Query Result Caching: Frequent queries are cached for 30-60 seconds
Read Replicas: Heavy read workloads can use read-only database copies
Prepared Statements: Common queries are pre-compiled for faster execution

Real-world Read Performance:

Dashboard loading: 50-100 queries in < 200ms total
Real-time updates: Background queries every 10 seconds
Report generation: Complex queries complete in < 2 seconds

Database Size & Storage Requirements

Maximum Database Size

SQLite theoretical limits are enormous:

Maximum database size: 281 TB (281,474,976,710,656 bytes)
Maximum row count: 2^64 rows
Maximum file size: Limited by filesystem (typically 8 TB+ on modern systems)

Realistic Size Expectations

To gauge how storage scales with traffic, assume each visit yields ~3.75 events (2.5 page views plus supporting clicks/custom events) and that aggregates add ~12% on top of the raw events file.

Per-Event Storage Estimate

Looking at the events schema (website ID, visitor signature, hostname, pathname, referrer, event type, metadata JSON, timestamps) and the indexes maintained (website_id + timestamp, pathname, user_signature), a single event typically consumes:

Component	Approx. bytes	Notes
Row payload	~150 B	Strings are stored as UTF-8; 64-byte signatures and 60-byte paths dominate
Index entries	~30 B	Timestamp + website composite index plus pathname index
Aggregate impact	~20 B	Hourly `site_stats`, `page_stats`, `ref_stats`, etc. grow slowly because they upsert per bucket, but budgeting ~20 B/event keeps headroom
Subtotal	~200 B	0.00019 MB/event
Safety margin (+20%)	+40 B	Covers WAL spillover, VACUUM windows, filesystem metadata
Total (plan for)	~240 B/event (~0.00024 MB)	Multiply by your daily events to predict growth

If you emit lots of custom events (e.g., scroll-depth, section-view, or button instrumentation that fires multiple times per session), plug the higher event count into this per-event estimate. Scroll-based automations can easily double or triple the daily event volume depending on page length.

Storage Comparison by Traffic Tier

Traffic Tier	Daily Visits	Events/day	Persistent Growth/day	1-year Storage	5-year Storage (20% YoY)
Small / Personal	1,000	~3,750	~0.72 MB	~0.26 GB	~1.9 GB
Medium / Business	10,000	~37,500	~7.2 MB	~2.6 GB	~19 GB
Large / High-Traffic	100,000	~375,000	~72 MB	~25.7 GB	~191 GB

Add ~15% headroom on top of the yearly/long-term numbers to account for WAL files, VACUUM windows, and OS-level free space recommendations.

Because aggregates are hourly and deduplicated, storage grows linearly with time rather than exponentially with traffic spikes. If needed, archival policies can move data older than two years into separate SQLite files without touching live dashboards.

Conclusion

Fusionaly uses SQLite wisely: the hot events table stays lean, dashboards query compact hourly aggregates, and most queries target recent time ranges. Even a modest instance can sustain ~400 req/sec during load tests—enough to ingest hundreds of thousands of events per day while keeping charts responsive. Because aggregates compress historical data and the write path never blocks readers, this architecture comfortably handles millions of daily events for most single-tenant deployments.