Skip to content

Internal Architecture & Queueing Strategy

How Fusionaly's ingestion pipeline, SQLite stack, and aggregation layers work together to absorb bursts and stay scalable

Fusionaly doesn’t follow the traditional analytics playbook. Instead of provisioning a heavyweight cluster, the platform uses a minimalist SQLite-based datastore with deliberate buffering and backpressure. Every layer—browser SDK, HTTP ingress, WAL-backed writes, and asynchronous aggregation—leans into eventual consistency. Throughput is intentionally capped with server-side concurrency control and client-side retries, so bursts slow down briefly instead of overwhelming SQLite. The single-tenant focus keeps datasets small, trims architectural complexity, and makes short-lived lag acceptable compared to chasing multi-tenant scale. Dashboards stay fast, but they aren’t second-by-second real-time; during spikes, aggregates can lag by a minute or two while raw events stay durable.

This guide explains how that architecture works end to end and why it still handles hundreds of thousands of events per day on a single tenant installation.

Key guarantees:

  • Client-side retries and local caching prevent dropped events when browsers go offline.
  • The ingestion API writes straight to disk with WAL retries; no volatile buffers to lose data.
  • Background jobs convert hot data into hourly aggregates, so dashboards read compact tables.
  • Backpressure (busy timeouts, HTTP limiter) favors temporary delays over failed writes.

Fusionaly makes four promises about how your analytics data is handled:

  1. Buffered writes with durable storage – every event hits disk immediately (no RAM-only queue) and the write path retries automatically.
  2. Eventually consistent dashboards – aggregates update asynchronously, so heavy bursts may delay charts for a short period, but they always catch up.
  3. Client-side buffering with bounded offline cache – the SDK batches events, retries with expo-backoff, stores up to 100 pending events in localStorage (oldest trimmed if offline too long), and drains them when the visitor comes back online.
  4. Graceful burst handling – when processing lags, the system prefers small queues and delayed aggregates over dropped events, keeping your data stable.

Our storage layer uses SQLite under the hood, but you can think of it as a purpose-built analytics datastore:

The datastore keeps writes smooth by combining WAL mode with busy_timeout = 5000, _txlock=immediate, and a fixed connection pool (10 open / 5 idle by default). On top of that, the PerformWrite helper retries each transaction up to 10 times with exponential backoff, so transient locks never bubble up to HTTP clients.

The storage layer sees well-behaved traffic because:

HTTP request -> GORM connection pool -> SQLite WAL queue
v v
(10 pooled conns) (busy_timeout + retries)
  • Layer 1 (HTTP handler): public ingestion endpoints run behind a Fiber-based concurrency limiter that returns 503 + Retry-After: 1 when the queue is full.
  • Layer 2 (Connection pool): goroutines wait for a pooled connection instead of hammering the database.
  • Layer 3 (WAL queue): the storage layer serializes writes internally while allowing unlimited concurrent readers. PerformWrite’s retries ensure the goroutine eventually commits once its turn arrives.

Fusionaly’s ingestion path is intentionally layered so bursts never overwhelm SQLite:

Here’s the bird’s-eye view of how a single-tenant Fusionaly instance moves data. Think of it as a pragmatic pipeline: hot writes stay durable, cold aggregates settle asynchronously, and every junction has built-in backpressure.

Browser SDK
- 200 ms batches
- Exponential retries
- Local cache (100 events)
||
\/
/x/api/v1/events (HTTP ingress)
-> Concurrency limiter (returns 503 + Retry-After when saturated)
-> Durable write (PerformWrite with WAL + busy_timeout + retries)
||
\/
ingested_events (hot queue table)
||
\/ Jobs.ProcessEvents (100-row batches, WAL transactions)
events table (authoritative log) --------> Admin APIs needing exact data
||
\/ UpdateAllAggregatesBatch
site_stats / page_stats / device_stats / ... (hourly aggregates)
||
\/ Dashboards & Reports (eventually consistent)

This architectural choice embraces SQLite’s single-writer model instead of fighting it. Buffers are explicit (browser-side batching + localStorage cache + ingested_events table), backpressure is intentional (client retries, busy timeouts, server retries, HTTP limiter), and aggregates are treated as cold data that can lag briefly without threatening correctness. Few analytics stacks run on SQLite, but constraining the problem to a single tenant mitigates the usual pain points: WAL ensures durability, the hot tables remain lightweight, and hourly aggregates provide compact read models for dashboards.

  1. Browser SDK buffer - events are queued locally (200 ms send interval, batch size <= 10) and fall back to a 100-event localStorage cache when offline.
  2. Public ingestion API - /x/api/v1/events validates payloads, normalizes URLs, filters excluded IPs, and persists the raw record into the ingested_events table.
  3. Background processor - jobs.ProcessEvents wakes up every 60 s (configurable) and drains pending rows in batches of 100, converting them into canonical events.
  4. Aggregate upserts - the same batch data feeds UpdateAllAggregatesBatch, which issues INSERT ... ON CONFLICT statements into hourly tables such as site_stats, page_stats, ref_stats, device/browser/OS stats, UTM stats, and custom event_stats.
Browser SDK -> HTTPS ingestion -> ingested_events queue -> batch normalization -> aggregate upserts

No row is dropped during the transition between stages; every failure triggers a retry before the job advances to the next batch.

The only buffers in this flow are the browser’s batching layer (<= 10 events every 200 ms, plus a 100-event localStorage fallback) and the durable ingested_events table. Because both layers retry with exponential backoff, outright failures are extremely rare; the trade-off is short-term eventual consistency, where aggregates can lag by a few seconds or minutes during massive bursts but always catch up once the backlog drains.

  • Event batching: events are sent in batches every 200 ms (up to 10 events per batch).
  • Retry with backoff: failed sends are retried automatically with increasing delays.
  • Offline cache: persistent localStorage cache (capped at 100 events) prevents data loss when offline.
  • Unload handling: navigator.sendBeacon ensures events are sent before tab closes.
  • Ingest-first strategy: incoming events land in ingested_events via PerformWrite, then a background job migrates them to events and aggregates, so HTTP responses stay fast.
  • SQLite-native retries: busy_timeout = 5000 plus up to 10 Go-level retries prevent database is locked leaks.
  • Batch-oriented: The background job reads processed = 0 ORDER BY created_at, so the oldest events leave the queue first and related batches stay together.
  • Hourly buckets: every aggregate table (site/page/referrer/device/browser/OS/UTM/custom event) stores one row per (website_id, hour, dimension) pair. Queries spanning years read at most 24 rows per day per dimension, so multi-year charts stay fast even as raw events grow.
  • Idempotent upserts: hourly rows use INSERT ... ON CONFLICT increments, so replaying a batch only bumps counters.
  • Session-aware enrichment: derived flags (new visitor, session boundary, bounce, entrances, exits, UTM) are computed once during aggregation, keeping dashboards off the hot events table.
  • Hot path: ingested_events and events tables hold the authoritative raw data and are updated synchronously with each HTTP request. Anything that needs point-in-time accuracy (e.g., Visitor Transparency) reads from here.
  • Cold path: hourly aggregate tables (site_stats, page_stats, ref_stats, device/browser/OS, UTM, custom event_stats) are filled asynchronously. They are eventually consistent—during a burst they may lag behind the hot tables—but once the queue drains, they reflect the full history.
  • Query cost: because cold tables are hourly buckets, even a multi-year range is only ~24 rows per day per dimension, so reporting and dashboards stay cheap regardless of total event count.

During a spike (for example, 1,000 events/second for 30 seconds):

  1. Browsers coalesce requests and retry locally, so only ~100 HTTP requests per second hit the server despite higher user activity.
  2. Ingestion API accepts every payload into ingested_events and keeps the HTTP handlers fast: responses go out once the write queue acknowledges the row, not when aggregates finish.
  3. Background processor drains the queue in 100-row chunks. If processing takes longer than the default 60-second cadence, the job simply keeps looping until processed=0 disappears; no human intervention is required.
  4. Aggregates briefly lag but remain ordered. Dashboards querying the hourly tables pick up the new data as soon as each batch commits, and because the queue is FIFO, you never see recent hours without their earlier events.

SQLite excels at read operations, which is perfect for analytics dashboards:

  • Query Performance: Most dashboard queries execute in < 10ms
  • Indexes: Optimized indexes on timestamp, page_url, and visitor_id fields
  • Read-ahead Caching: Frequently accessed data is kept in memory
  • Connection Pooling: Multiple read connections can operate simultaneously

Example dashboard query (runs against site_stats and finishes in single-digit milliseconds on a modest VPS):

SELECT strftime('%Y-%m-%d', hour) AS day,
COALESCE(SUM(visitors), 0) AS visitors
FROM site_stats
WHERE website_id = :website_id
AND hour BETWEEN datetime('now', '-30 days') AND datetime('now')
GROUP BY day
ORDER BY day ASC;

SQLite handles high write loads gracefully with proper configuration:

Write Performance Optimizations:

  • WAL Mode: Write-Ahead Logging allows concurrent reads during writes
  • Batch Transactions: Background processors commit ~100 normalized events per transaction, so each WAL sync amortizes the cost of multiple writes.
  • Asynchronous Jobs: Non-critical aggregations run in background jobs so ingestion stays lean
  • Client + Server Retries: The browser SDK retries failed posts, and PerformWrite retries SQLite transactions up to 10 times, so there’s no need for an in-memory write buffer

Concurrent Write Handling:

Peak Traffic Scenario (1000 events/second):
|-- Browser SDK (client batching + retry)
|-- Ingested Events table (durable queue)
|-- Batch processor (100 events/transaction)
|-- WAL Journal (enables concurrent reads)
\-- Database File (final persistent storage)

Under extreme load (> 2000 events/second), Fusionaly:

  • Keeps the batch processor running fixed 100-row commits so transactions stay short and predictable
  • Lets the browser SDK + PerformWrite retry loops absorb transient contention rather than buffering writes in RAM

Analytics dashboards often generate many simultaneous read queries:

Read Optimization Strategies:

  • Connection Pooling: Up to 20 concurrent read connections
  • Query Result Caching: Frequent queries are cached for 30-60 seconds
  • Read Replicas: Heavy read workloads can use read-only database copies
  • Prepared Statements: Common queries are pre-compiled for faster execution

Real-world Read Performance:

  • Dashboard loading: 50-100 queries in < 200ms total
  • Real-time updates: Background queries every 10 seconds
  • Report generation: Complex queries complete in < 2 seconds

SQLite theoretical limits are enormous:

  • Maximum database size: 281 TB (281,474,976,710,656 bytes)
  • Maximum row count: 2^64 rows
  • Maximum file size: Limited by filesystem (typically 8 TB+ on modern systems)

To gauge how storage scales with traffic, assume each visit yields ~3.75 events (2.5 page views plus supporting clicks/custom events) and that aggregates add ~12% on top of the raw events file.

Looking at the events schema (website ID, visitor signature, hostname, pathname, referrer, event type, metadata JSON, timestamps) and the indexes maintained (website_id + timestamp, pathname, user_signature), a single event typically consumes:

ComponentApprox. bytesNotes
Row payload~150 BStrings are stored as UTF-8; 64-byte signatures and 60-byte paths dominate
Index entries~30 BTimestamp + website composite index plus pathname index
Aggregate impact~20 BHourly site_stats, page_stats, ref_stats, etc. grow slowly because they upsert per bucket, but budgeting ~20 B/event keeps headroom
Subtotal~200 B0.00019 MB/event
Safety margin (+20%)+40 BCovers WAL spillover, VACUUM windows, filesystem metadata
Total (plan for)~240 B/event (~0.00024 MB)Multiply by your daily events to predict growth

If you emit lots of custom events (e.g., scroll-depth, section-view, or button instrumentation that fires multiple times per session), plug the higher event count into this per-event estimate. Scroll-based automations can easily double or triple the daily event volume depending on page length.

Traffic TierDaily VisitsEvents/dayPersistent Growth/day1-year Storage5-year Storage (20% YoY)
Small / Personal1,000~3,750~0.72 MB~0.26 GB~1.9 GB
Medium / Business10,000~37,500~7.2 MB~2.6 GB~19 GB
Large / High-Traffic100,000~375,000~72 MB~25.7 GB~191 GB

Add ~15% headroom on top of the yearly/long-term numbers to account for WAL files, VACUUM windows, and OS-level free space recommendations.

Because aggregates are hourly and deduplicated, storage grows linearly with time rather than exponentially with traffic spikes. If needed, archival policies can move data older than two years into separate SQLite files without touching live dashboards.

Fusionaly uses SQLite wisely: the hot events table stays lean, dashboards query compact hourly aggregates, and most queries target recent time ranges. Even a modest instance can sustain ~400 req/sec during load tests—enough to ingest hundreds of thousands of events per day while keeping charts responsive. Because aggregates compress historical data and the write path never blocks readers, this architecture comfortably handles millions of daily events for most single-tenant deployments.