Internal Architecture & Queueing Strategy
How Fusionaly's ingestion pipeline, SQLite stack, and aggregation layers work together to absorb bursts and stay scalable
Fusionaly doesn’t follow the traditional analytics playbook. Instead of provisioning a heavyweight cluster, the platform uses a minimalist SQLite-based datastore with deliberate buffering and backpressure. Every layer—browser SDK, HTTP ingress, WAL-backed writes, and asynchronous aggregation—leans into eventual consistency. Throughput is intentionally capped with server-side concurrency control and client-side retries, so bursts slow down briefly instead of overwhelming SQLite. The single-tenant focus keeps datasets small, trims architectural complexity, and makes short-lived lag acceptable compared to chasing multi-tenant scale. Dashboards stay fast, but they aren’t second-by-second real-time; during spikes, aggregates can lag by a minute or two while raw events stay durable.
This guide explains how that architecture works end to end and why it still handles hundreds of thousands of events per day on a single tenant installation.
Key guarantees:
- Client-side retries and local caching prevent dropped events when browsers go offline.
- The ingestion API writes straight to disk with WAL retries; no volatile buffers to lose data.
- Background jobs convert hot data into hourly aggregates, so dashboards read compact tables.
- Backpressure (busy timeouts, HTTP limiter) favors temporary delays over failed writes.
Fusionaly Storage Architecture
Section titled “Fusionaly Storage Architecture”Fusionaly makes four promises about how your analytics data is handled:
- Buffered writes with durable storage – every event hits disk immediately (no RAM-only queue) and the write path retries automatically.
- Eventually consistent dashboards – aggregates update asynchronously, so heavy bursts may delay charts for a short period, but they always catch up.
- Client-side buffering with bounded offline cache – the SDK batches events, retries with expo-backoff, stores up to 100 pending events in localStorage (oldest trimmed if offline too long), and drains them when the visitor comes back online.
- Graceful burst handling – when processing lags, the system prefers small queues and delayed aggregates over dropped events, keeping your data stable.
Our storage layer uses SQLite under the hood, but you can think of it as a purpose-built analytics datastore:
Storage Safeguards
Section titled “Storage Safeguards”The datastore keeps writes smooth by combining WAL mode with busy_timeout = 5000, _txlock=immediate, and a fixed connection pool (10 open / 5 idle by default). On top of that, the PerformWrite helper retries each transaction up to 10 times with exponential backoff, so transient locks never bubble up to HTTP clients.
Built-in Queueing Layers
Section titled “Built-in Queueing Layers”The storage layer sees well-behaved traffic because:
HTTP request -> GORM connection pool -> SQLite WAL queue v v (10 pooled conns) (busy_timeout + retries)- Layer 1 (HTTP handler): public ingestion endpoints run behind a Fiber-based concurrency limiter that returns
503+Retry-After: 1when the queue is full. - Layer 2 (Connection pool): goroutines wait for a pooled connection instead of hammering the database.
- Layer 3 (WAL queue): the storage layer serializes writes internally while allowing unlimited concurrent readers. PerformWrite’s retries ensure the goroutine eventually commits once its turn arrives.
Event Processing & Queueing
Section titled “Event Processing & Queueing”Fusionaly’s ingestion path is intentionally layered so bursts never overwhelm SQLite:
Server Dataflow at a Glance
Section titled “Server Dataflow at a Glance”Here’s the bird’s-eye view of how a single-tenant Fusionaly instance moves data. Think of it as a pragmatic pipeline: hot writes stay durable, cold aggregates settle asynchronously, and every junction has built-in backpressure.
Browser SDK - 200 ms batches - Exponential retries - Local cache (100 events) || \//x/api/v1/events (HTTP ingress) -> Concurrency limiter (returns 503 + Retry-After when saturated) -> Durable write (PerformWrite with WAL + busy_timeout + retries) || \/ingested_events (hot queue table) || \/ Jobs.ProcessEvents (100-row batches, WAL transactions)events table (authoritative log) --------> Admin APIs needing exact data || \/ UpdateAllAggregatesBatchsite_stats / page_stats / device_stats / ... (hourly aggregates) || \/ Dashboards & Reports (eventually consistent)This architectural choice embraces SQLite’s single-writer model instead of fighting it. Buffers are explicit (browser-side batching + localStorage cache + ingested_events table), backpressure is intentional (client retries, busy timeouts, server retries, HTTP limiter), and aggregates are treated as cold data that can lag briefly without threatening correctness. Few analytics stacks run on SQLite, but constraining the problem to a single tenant mitigates the usual pain points: WAL ensures durability, the hot tables remain lightweight, and hourly aggregates provide compact read models for dashboards.
End-to-End Pipeline
Section titled “End-to-End Pipeline”- Browser SDK buffer - events are queued locally (200 ms send interval, batch size <= 10) and fall back to a 100-event
localStoragecache when offline. - Public ingestion API -
/x/api/v1/eventsvalidates payloads, normalizes URLs, filters excluded IPs, and persists the raw record into theingested_eventstable. - Background processor -
jobs.ProcessEventswakes up every 60 s (configurable) and drains pending rows in batches of 100, converting them into canonicalevents. - Aggregate upserts - the same batch data feeds
UpdateAllAggregatesBatch, which issuesINSERT ... ON CONFLICTstatements into hourly tables such assite_stats,page_stats,ref_stats, device/browser/OS stats, UTM stats, and customevent_stats.
Browser SDK -> HTTPS ingestion -> ingested_events queue -> batch normalization -> aggregate upsertsNo row is dropped during the transition between stages; every failure triggers a retry before the job advances to the next batch.
The only buffers in this flow are the browser’s batching layer (<= 10 events every 200 ms, plus a 100-event localStorage fallback) and the durable ingested_events table. Because both layers retry with exponential backoff, outright failures are extremely rare; the trade-off is short-term eventual consistency, where aggregates can lag by a few seconds or minutes during massive bursts but always catch up once the backlog drains.
Client-Side Buffering & Retry Logic
Section titled “Client-Side Buffering & Retry Logic”- Event batching: events are sent in batches every 200 ms (up to 10 events per batch).
- Retry with backoff: failed sends are retried automatically with increasing delays.
- Offline cache: persistent localStorage cache (capped at 100 events) prevents data loss when offline.
- Unload handling:
navigator.sendBeaconensures events are sent before tab closes.
Server-Side Durability
Section titled “Server-Side Durability”- Ingest-first strategy: incoming events land in
ingested_eventsviaPerformWrite, then a background job migrates them toeventsand aggregates, so HTTP responses stay fast. - SQLite-native retries:
busy_timeout = 5000plus up to 10 Go-level retries preventdatabase is lockedleaks.
Aggregation Queue
Section titled “Aggregation Queue”- Batch-oriented: The background job reads
processed = 0 ORDER BY created_at, so the oldest events leave the queue first and related batches stay together. - Hourly buckets: every aggregate table (site/page/referrer/device/browser/OS/UTM/custom event) stores one row per
(website_id, hour, dimension)pair. Queries spanning years read at most 24 rows per day per dimension, so multi-year charts stay fast even as raw events grow. - Idempotent upserts: hourly rows use
INSERT ... ON CONFLICTincrements, so replaying a batch only bumps counters. - Session-aware enrichment: derived flags (new visitor, session boundary, bounce, entrances, exits, UTM) are computed once during aggregation, keeping dashboards off the hot
eventstable.
Hot vs. Cold Data
Section titled “Hot vs. Cold Data”- Hot path:
ingested_eventsandeventstables hold the authoritative raw data and are updated synchronously with each HTTP request. Anything that needs point-in-time accuracy (e.g., Visitor Transparency) reads from here. - Cold path: hourly aggregate tables (
site_stats,page_stats,ref_stats, device/browser/OS, UTM, customevent_stats) are filled asynchronously. They are eventually consistent—during a burst they may lag behind the hot tables—but once the queue drains, they reflect the full history. - Query cost: because cold tables are hourly buckets, even a multi-year range is only ~24 rows per day per dimension, so reporting and dashboards stay cheap regardless of total event count.
Worst-Case Burst Handling
Section titled “Worst-Case Burst Handling”During a spike (for example, 1,000 events/second for 30 seconds):
- Browsers coalesce requests and retry locally, so only ~100 HTTP requests per second hit the server despite higher user activity.
- Ingestion API accepts every payload into
ingested_eventsand keeps the HTTP handlers fast: responses go out once the write queue acknowledges the row, not when aggregates finish. - Background processor drains the queue in 100-row chunks. If processing takes longer than the default 60-second cadence, the job simply keeps looping until
processed=0disappears; no human intervention is required. - Aggregates briefly lag but remain ordered. Dashboards querying the hourly tables pick up the new data as soon as each batch commits, and because the queue is FIFO, you never see recent hours without their earlier events.
Performance Characteristics
Section titled “Performance Characteristics”Fast Reads
Section titled “Fast Reads”SQLite excels at read operations, which is perfect for analytics dashboards:
- Query Performance: Most dashboard queries execute in < 10ms
- Indexes: Optimized indexes on timestamp, page_url, and visitor_id fields
- Read-ahead Caching: Frequently accessed data is kept in memory
- Connection Pooling: Multiple read connections can operate simultaneously
Example dashboard query (runs against site_stats and finishes in single-digit milliseconds on a modest VPS):
SELECT strftime('%Y-%m-%d', hour) AS day, COALESCE(SUM(visitors), 0) AS visitorsFROM site_statsWHERE website_id = :website_id AND hour BETWEEN datetime('now', '-30 days') AND datetime('now')GROUP BY dayORDER BY day ASC;Behavior Under Intense Writes
Section titled “Behavior Under Intense Writes”SQLite handles high write loads gracefully with proper configuration:
Write Performance Optimizations:
- WAL Mode: Write-Ahead Logging allows concurrent reads during writes
- Batch Transactions: Background processors commit ~100 normalized events per transaction, so each WAL sync amortizes the cost of multiple writes.
- Asynchronous Jobs: Non-critical aggregations run in background jobs so ingestion stays lean
- Client + Server Retries: The browser SDK retries failed posts, and
PerformWriteretries SQLite transactions up to 10 times, so there’s no need for an in-memory write buffer
Concurrent Write Handling:
Peak Traffic Scenario (1000 events/second):|-- Browser SDK (client batching + retry)|-- Ingested Events table (durable queue)|-- Batch processor (100 events/transaction)|-- WAL Journal (enables concurrent reads)\-- Database File (final persistent storage)Under extreme load (> 2000 events/second), Fusionaly:
- Keeps the batch processor running fixed 100-row commits so transactions stay short and predictable
- Lets the browser SDK +
PerformWriteretry loops absorb transient contention rather than buffering writes in RAM
Behavior Under Intense Reads
Section titled “Behavior Under Intense Reads”Analytics dashboards often generate many simultaneous read queries:
Read Optimization Strategies:
- Connection Pooling: Up to 20 concurrent read connections
- Query Result Caching: Frequent queries are cached for 30-60 seconds
- Read Replicas: Heavy read workloads can use read-only database copies
- Prepared Statements: Common queries are pre-compiled for faster execution
Real-world Read Performance:
- Dashboard loading: 50-100 queries in < 200ms total
- Real-time updates: Background queries every 10 seconds
- Report generation: Complex queries complete in < 2 seconds
Database Size & Storage Requirements
Section titled “Database Size & Storage Requirements”Maximum Database Size
Section titled “Maximum Database Size”SQLite theoretical limits are enormous:
- Maximum database size: 281 TB (281,474,976,710,656 bytes)
- Maximum row count: 2^64 rows
- Maximum file size: Limited by filesystem (typically 8 TB+ on modern systems)
Realistic Size Expectations
Section titled “Realistic Size Expectations”To gauge how storage scales with traffic, assume each visit yields ~3.75 events (2.5 page views plus supporting clicks/custom events) and that aggregates add ~12% on top of the raw events file.
Per-Event Storage Estimate
Section titled “Per-Event Storage Estimate”Looking at the events schema (website ID, visitor signature, hostname, pathname, referrer, event type, metadata JSON, timestamps) and the indexes maintained (website_id + timestamp, pathname, user_signature), a single event typically consumes:
| Component | Approx. bytes | Notes |
|---|---|---|
| Row payload | ~150 B | Strings are stored as UTF-8; 64-byte signatures and 60-byte paths dominate |
| Index entries | ~30 B | Timestamp + website composite index plus pathname index |
| Aggregate impact | ~20 B | Hourly site_stats, page_stats, ref_stats, etc. grow slowly because they upsert per bucket, but budgeting ~20 B/event keeps headroom |
| Subtotal | ~200 B | 0.00019 MB/event |
| Safety margin (+20%) | +40 B | Covers WAL spillover, VACUUM windows, filesystem metadata |
| Total (plan for) | ~240 B/event (~0.00024 MB) | Multiply by your daily events to predict growth |
If you emit lots of custom events (e.g., scroll-depth, section-view, or button instrumentation that fires multiple times per session), plug the higher event count into this per-event estimate. Scroll-based automations can easily double or triple the daily event volume depending on page length.
Storage Comparison by Traffic Tier
Section titled “Storage Comparison by Traffic Tier”| Traffic Tier | Daily Visits | Events/day | Persistent Growth/day | 1-year Storage | 5-year Storage (20% YoY) |
|---|---|---|---|---|---|
| Small / Personal | 1,000 | ~3,750 | ~0.72 MB | ~0.26 GB | ~1.9 GB |
| Medium / Business | 10,000 | ~37,500 | ~7.2 MB | ~2.6 GB | ~19 GB |
| Large / High-Traffic | 100,000 | ~375,000 | ~72 MB | ~25.7 GB | ~191 GB |
Add ~15% headroom on top of the yearly/long-term numbers to account for WAL files, VACUUM windows, and OS-level free space recommendations.
Because aggregates are hourly and deduplicated, storage grows linearly with time rather than exponentially with traffic spikes. If needed, archival policies can move data older than two years into separate SQLite files without touching live dashboards.
Conclusion
Section titled “Conclusion”Fusionaly uses SQLite wisely: the hot events table stays lean, dashboards query compact hourly aggregates, and most queries target recent time ranges. Even a modest instance can sustain ~400 req/sec during load tests—enough to ingest hundreds of thousands of events per day while keeping charts responsive. Because aggregates compress historical data and the write path never blocks readers, this architecture comfortably handles millions of daily events for most single-tenant deployments.