Tuning `cache_size` for Embedded Linux

On a memory-constrained ARM Linux target (a 256 MB industrial gateway, an edge telemetry daemon capped by a cgroup v2 memory limit, a Python automation service sharing a SoC with the rest of the platform), SQLite’s default cache_size of -2000 (roughly 2 MB of page cache) is almost never the right number, in either direction. Leave it at the default under a bursty write load and the pager evicts hot pages constantly, forcing synchronous reloads and spilling dirty pages to disk in the middle of large transactions. Overcorrect to -65536 (64 MB) on a device with a hard container ceiling and the kernel OOM killer terminates the process the first time real memory pressure arrives. This page addresses that exact calibration problem: choosing a page-cache size that respects the container’s memory boundary while absorbing write bursts. It is one step within Memory-Mapped I/O Configuration, part of the broader WAL Optimization & Concurrency Tuning discipline. It assumes the connection is already in Write-Ahead Logging mode with the shared PRAGMA baselines applied; here we specialize just the cache for scarce RAM.

Diagnosis

Confirm you have a cache-sizing problem, and which side of it, before touching the value. Two signals distinguish “too small” (eviction thrash) from “too large” (memory-ceiling pressure), and they are read from different places.

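First, read what the engine is configured to hold. PRAGMA cache_size returns the configured size (negative means KiB, positive means a raw page count), and PRAGMA page_size gives the factor needed to turn a page count into bytes. PRAGMA cache_spill is a setting rather than a counter: it reports whether the pager is allowed to flush dirty pages to disk mid-transaction when the cache fills (it is enabled by default), not whether spilling has actually happened. The running count of spilled pages is exposed only through the C API, as SQLITE_DBSTATUS_CACHE_SPILL in sqlite3_db_status(), which Python’s sqlite3 module does not wrap.

PRAGMA cache_size;   -- negative = KiB (e.g. -2000 = ~2 MiB); positive = page count
PRAGMA page_size;    -- usually 4096; needed to convert a page count into bytes
PRAGMA cache_spill;  -- 0 = spilling disabled; nonzero = pager may spill dirty pages before commit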
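The first two readings can be folded into one helper that reports the configured budget in bytes. This is a minimal sketch; the name cache_budget_bytes is illustrative and not part of any library.

```python
import sqlite3


def cache_budget_bytes(conn: sqlite3.Connection) -> int:
    """Upper bound on this connection's page cache, in bytes."""
    cache_size = conn.execute("PRAGMA cache_size;").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size;").fetchone()[0]
    if cache_size < 0:
        # Negative values are a budget in KiB, independent of page_size.
        return -cache_size * 1024
    # Positive values are a page count, so the byte budget scales with page_size.
    return cache_size * page_size


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA cache_size=-2000;")   # the stock default
    print(cache_budget_bytes(conn))            # 2048000
    conn.close()
```

Logging this number at startup for every handle makes it easy to spot a connection that is still running on a default.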

An undersized cache shows up as constant mid-transaction spilling and, under concurrent access, as more frequent SQLITE_BUSY returns, because write transactions that keep evicting and reloading pages hold the write lock longer. Watch the process from the outside to confirm the write amplification: VmRSS in /proc/<pid>/status stays flat and low while iostat shows disk writes far exceeding the row volume you are inserting, a sign that the pager is re-reading and re-flushing the same hot pages.
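Sampling VmRSS from inside the service avoids a separate monitoring tool. The sketch below parses the standard /proc status format; the helper names parse_vmrss_kib and read_vmrss_kib are hypothetical.

```python
from typing import Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     18432 kB".
            return int(line.split()[1])
    return None


def read_vmrss_kib(pid: str = "self") -> Optional[int]:
    """Resident set size of a process in kB, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vmrss_kib(f.read())
    except OSError:
        return None
```

Calling read_vmrss_kib() before and after a write burst shows whether resident memory moves at all; a flat reading alongside heavy disk writes in iostat matches the thrash pattern described above.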

The opposite failure is quieter until it is fatal. On a cgroup v2 host, the ceiling that matters is the process’s own memory.max (/sys/fs/cgroup/memory.max as seen from inside a container), not total system RAM. Compare it against what the cgroup is currently charged:

with open("/sys/fs/cgroup/memory.max") as f:
    ceiling = f.read().strip()          # "max" means no explicit limit
with open("/sys/fs/cgroup/memory.current") as f:
    used = int(f.read().strip())        # bytes charged to this cgroup right now
print(f"cgroup ceiling={ceiling} current={used/1024/1024:.0f} MiB")

If memory.current tracks close to memory.max and climbs whenever the cache warms, an oversized cache_size is the likely culprit: cache_size is a per-connection upper bound that the pager fills as pages are touched, and every page it holds is charged to the cgroup. When the process dies with no application-level error and dmesg records an OOM kill (for a cgroup limit, typically a line such as Memory cgroup out of memory: Killed process), the cache, multiplied across every pooled connection, pushed the cgroup over its limit.

Solution

Size the cache from the container’s memory ceiling, not from a guessed constant, and always express it in negative KiB so the value is independent of page_size. The routine below reads the cgroup v2 limit, allocates a bounded fraction of it, applies the PRAGMA, and — critically — reads the value back to confirm the build honored it:

import sqlite3
import logging
from typing import Optional

logger = logging.getLogger("sqlite_cache_tuner")


def read_cgroup_memory_limit() -> Optional[int]:
    """Effective cgroup v2 memory ceiling in bytes, or None if uncapped."""
    try:
        with open("/sys/fs/cgroup/memory.max") as f:
            value = f.read().strip()
        return None if value == "max" else int(value)
    except (OSError, ValueError):
        return None


def safe_cache_kb(ceiling_bytes: Optional[int], fraction: float = 0.18,
                  cap_mib: int = 32, floor_mib: int = 4) -> int:
    """Cache budget as negative KiB: ~18% of the ceiling, clamped [floor, cap]."""
    if ceiling_bytes is None:            # no cgroup limit -> use a conservative fixed budget
        return -(floor_mib * 1024)
    target_kib = int(ceiling_bytes * fraction) // 1024
    clamped = max(floor_mib * 1024, min(target_kib, cap_mib * 1024))
    return -clamped                      # negative => KiB, so it ignores page_size changes


def init_embedded_connection(db_path: str, timeout: float = 15.0) -> sqlite3.Connection:
    cache_kb = safe_cache_kb(read_cgroup_memory_limit())

    # isolation_level=None -> autocommit: the module never opens an implicit transaction,
    # and PRAGMA journal_mode cannot switch modes while a transaction is open.
    conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
    try:
        conn.execute("PRAGMA journal_mode=WAL;")       # concurrent readers; baseline for edge writes
        conn.execute("PRAGMA synchronous=NORMAL;")     # fsync at checkpoint, not every commit
        conn.execute(f"PRAGMA cache_size={cache_kb};") # negative = KiB budget sized to the cgroup
        conn.execute("PRAGMA mmap_size=0;")            # no mmap: keep all RAM use inside cache_size,
                                                       # so the cgroup accounts for every page

        # Read the value back rather than trusting the write. On a stock build the PRAGMA
        # takes precedence over the compiled-in SQLITE_DEFAULT_CACHE_SIZE; a mismatch here
        # points to a patched build or a wrapper that altered the connection.
        applied = conn.execute("PRAGMA cache_size;").fetchone()[0]
        if applied != cache_kb:
            logger.warning("cache_size mismatch: requested=%d applied=%d", cache_kb, applied)
        return conn
    except sqlite3.Error:
        conn.close()
        logger.exception("cache tuning failed for %s", db_path)
        raise

Three choices make this hold on constrained hardware. The 0.18 fraction leaves the bulk of the ceiling for the Python heap, the WAL, and the OS page cache; on a 256 MB cgroup that yields roughly a 46 MB target, clamped down to the 32 MB cap so a single connection can never claim the whole budget. Negative KiB sizing means the number survives a page_size of 4096 or 8192 without re-computation. And mmap_size=0 is deliberate here: mapping the file would add a second, separately accounted pool of resident pages on top of the cache, so on a tight cgroup you keep every byte inside the one budget you can reason about (the trade-off is covered in the parent Memory-Mapped I/O Configuration guide).
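To make the clamping concrete, the arithmetic can be run for a few ceilings. The snippet restates the same formula as safe_cache_kb in compact form (budget_kib is an illustrative name) so it runs on its own.

```python
MIB = 1024 * 1024


def budget_kib(ceiling_bytes: int, fraction: float = 0.18,
               cap_mib: int = 32, floor_mib: int = 4) -> int:
    """Same formula as safe_cache_kb: a fraction of the ceiling, clamped to [floor, cap]."""
    target_kib = int(ceiling_bytes * fraction) // 1024
    return -max(floor_mib * 1024, min(target_kib, cap_mib * 1024))


if __name__ == "__main__":
    for ceiling_mib in (16, 64, 256):
        print(f"{ceiling_mib:>4} MiB ceiling -> cache_size={budget_kib(ceiling_mib * MIB)}")
```

A 16 MiB ceiling hits the 4 MiB floor (-4096), a 64 MiB ceiling lands between the bounds (-11796, about 11.5 MiB), and a 256 MiB ceiling is clamped to the 32 MiB cap (-32768).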

Pair the cache with a checkpoint cadence. The cache bounds how many dirty pages a single transaction can hold before they spill into the WAL, while wal_autocheckpoint decides how large the WAL grows before it is folded back into the database file. On a write-heavy edge target, raise the autocheckpoint trigger so bursts accumulate in the WAL instead of forcing a checkpoint on every threshold crossing. The calibration is covered in Optimizing wal_autocheckpoint for Continuous Logging.
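As a rough illustration of raising the trigger, the sketch below sets wal_autocheckpoint and reads it back. The value 4000 is a placeholder, not a recommendation from this guide, and set_autocheckpoint is a hypothetical helper name; SQLite's stock default is 1000 pages.

```python
import os
import sqlite3
import tempfile


def set_autocheckpoint(conn: sqlite3.Connection, pages: int) -> int:
    """Set the WAL autocheckpoint threshold (in pages) and return the value in effect."""
    conn.execute(f"PRAGMA wal_autocheckpoint={int(pages)};")
    return conn.execute("PRAGMA wal_autocheckpoint;").fetchone()[0]


if __name__ == "__main__":
    # Throwaway database so the example leaves nothing behind in the working directory.
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "demo.db"), isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL;")
        print(set_autocheckpoint(conn, 4000))   # placeholder threshold, tune per workload
        conn.close()
```

The setting is per connection, so it belongs in the same initializer that applies cache_size.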

Verification

Three checks, cheapest first.

First, assert the engine took the number you asked for, not a compiled-in default:

conn = init_embedded_connection("/var/lib/telemetry/sensors.db")
applied = conn.execute("PRAGMA cache_size;").fetchone()[0]
assert applied < 0, f"cache_size is a page count, not KiB: {applied}"
assert applied == safe_cache_kb(read_cgroup_memory_limit()), "build overrode the request"

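A positive value here means a pool or ORM reset the connection to a page-count default; a negative value that differs from the request means something other than your initializer set it last, or the handle is still carrying the build’s compiled-in SQLITE_DEFAULT_CACHE_SIZE because the PRAGMA never ran on it.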

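Second, confirm the cache is actually large enough to stop mid-transaction spilling under your workload. PRAGMA cache_spill cannot answer this, because it returns the spill setting rather than a count. In WAL mode a spilled page is appended to the WAL before COMMIT, so one workable proxy from Python is to truncate the WAL, run a representative insert batch, and check that the -wal file is still empty just before the commit:

import os

wal_path = "/var/lib/telemetry/sensors.db-wal"
conn.execute("PRAGMA wal_checkpoint(TRUNCATE);")   # start from an empty WAL; needs no other active readers
conn.execute("BEGIN;")
for row in sample_batch:                       # a realistic burst, not one row
    conn.execute("INSERT INTO readings(sensor, value) VALUES (?, ?);", row)
spilled_bytes = os.path.getsize(wal_path) if os.path.exists(wal_path) else 0
conn.execute("COMMIT;")
assert spilled_bytes == 0, f"cache too small: {spilled_bytes} bytes reached the WAL mid-transaction"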

Third, prove it stays under the ceiling. Sample the cgroup while the cache warms and assert resident memory holds below the limit with headroom for the rest of the process:

import time

for _ in range(20):
    conn.execute("SELECT count(*) FROM readings;")     # touch pages to warm the cache
    with open("/sys/fs/cgroup/memory.current") as f:
        current = int(f.read().strip())
    with open("/sys/fs/cgroup/memory.max") as f:
        ceiling = f.read().strip()
    if ceiling != "max":
        assert current < int(ceiling) * 0.85, f"cgroup at {current} of {ceiling} — cache too large"
    time.sleep(0.1)

If the batch commits without spilling and memory.current plateaus comfortably below the ceiling, the size is right. Spilling with headroom to spare means raise the cache; a climbing memory.current means lower it.

Failure Modes & Gotchas

A compiled-in SQLITE_DEFAULT_CACHE_SIZE changes what you get whenever your PRAGMA did not run. Many embedded distributions ship a SQLite built with a non-standard default cache, set at compile time, so any connection that skips the initializer starts from that value instead of -2000, and it may be expressed as a page count rather than KiB. On a stock build the runtime PRAGMA takes precedence for the connection that executes it, which is why the read-back assertion above is mandatory rather than cosmetic: if PRAGMA cache_size returns anything other than your negative-KiB request, the setting did not reach this handle or the build has been patched, and you must either rebuild with standard defaults or fall back to explicit positive page-count sizing computed against the reported page_size. Never assume the write succeeded.
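The page-count fallback mentioned above can be sketched as follows. The helper name apply_cache_as_pages is hypothetical; it converts a KiB budget into pages using the page_size the connection reports, applies it, and returns both numbers so the caller can compare them.

```python
import sqlite3
from typing import Tuple


def apply_cache_as_pages(conn: sqlite3.Connection, budget_kib: int) -> Tuple[int, int]:
    """Apply a cache budget as a positive page count; return (requested, applied)."""
    page_size = conn.execute("PRAGMA page_size;").fetchone()[0]
    # Round down so the cache never exceeds the byte budget; keep at least one page.
    pages = max(1, (budget_kib * 1024) // page_size)
    conn.execute(f"PRAGMA cache_size={pages};")
    applied = conn.execute("PRAGMA cache_size;").fetchone()[0]
    return pages, applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    requested, applied = apply_cache_as_pages(conn, 4096)   # a 4 MiB budget
    print(requested, applied)
    conn.close()
```

Unlike the negative-KiB form, a page count must be recomputed whenever page_size changes, so this path is only worth taking when the KiB form demonstrably fails the read-back.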

Pooled and per-thread connections multiply the budget you thought you set. cache_size is per-connection, so a pool of ten workers each holding a 32 MB cache can grow to ~320 MB before Python object overhead, far past a 256 MB cgroup, and the kernel kills the process with no SQLite-level error. Two disciplines prevent it: route every handle through one initializer so the caps are identical and always applied, and size the per-connection cache against memory.max / pool_size, not against the whole ceiling. Apply the same bounded-lifetime rules from Connection Pooling Strategies, and where handles are recycled asynchronously, the per-task connection factory in Async Execution Patterns keeps each worker’s cache accounted for. A recycled handle that never ran the initializer also silently reverts to the default, and the read-back assertion is your only guard.
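One way to express the memory.max / pool_size rule is to compute a single cache budget for the whole process and divide it across handles. This is a sketch under assumptions: per_connection_cache_kb, the 1 MiB per-handle floor, and the 4 MiB fallback for an uncapped cgroup are illustrative choices, not values from this guide.

```python
from typing import Optional


def per_connection_cache_kb(ceiling_bytes: Optional[int], pool_size: int,
                            fraction: float = 0.18, floor_kib: int = 1024,
                            fallback_kib: int = 4096) -> int:
    """Negative-KiB cache_size for one handle so the whole pool shares one budget."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if ceiling_bytes is None:
        # No cgroup limit visible: split a small fixed budget instead.
        total_kib = fallback_kib
    else:
        total_kib = int(ceiling_bytes * fraction) // 1024
    # The floor keeps each handle usable; with a very large pool it can push the
    # combined total above the intended fraction, so check the sum as well.
    return -max(floor_kib, total_kib // pool_size)


if __name__ == "__main__":
    ceiling = 256 * 1024 * 1024
    print(per_connection_cache_kb(ceiling, pool_size=10))   # -4718
```

With a 256 MiB ceiling and ten workers, each handle gets about 4.6 MiB and the pool as a whole stays near the 46 MiB target instead of 320 MB.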

mmap and a large cache double-buffer the same pages. If you enable memory-mapped I/O and size a generous cache_size, hot pages can be resident twice — once in the mapping, once in the pager cache — and both are charged to the cgroup, doubling your real footprint against the ceiling. On a constrained target pick one budget: keep mmap_size=0 and put the RAM into cache_size (the approach here), or map the file and shrink the cache accordingly. Sizing them independently is how a device that passed testing gets OOM-killed in the field once the working set grows.
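A quick way to see both budgets side by side is to read them from the connection and add them up. This is a minimal sketch with a hypothetical helper name, memory_budget; builds compiled without mmap support may return no row for PRAGMA mmap_size, which the code treats as zero.

```python
import sqlite3
from typing import Dict


def memory_budget(conn: sqlite3.Connection) -> Dict[str, int]:
    """Configured upper bounds for the pager cache and the memory map, in bytes."""
    cache_size = conn.execute("PRAGMA cache_size;").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size;").fetchone()[0]
    cache_bytes = -cache_size * 1024 if cache_size < 0 else cache_size * page_size
    row = conn.execute("PRAGMA mmap_size;").fetchone()
    mmap_bytes = row[0] if row else 0          # no row: mmap unavailable for this database
    return {
        "cache_bytes": cache_bytes,
        "mmap_bytes": mmap_bytes,
        "total_bytes": cache_bytes + mmap_bytes,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA cache_size=-4096;")
    conn.execute("PRAGMA mmap_size=0;")
    print(memory_budget(conn))
    conn.close()
```

Comparing total_bytes multiplied by the pool size against memory.max makes the double-buffering risk visible before deployment rather than in the field.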

Related Pages

Memory-Mapped I/O Configuration — the parent guide: how the OS mapping and the internal page cache share (and compete for) RAM.
Optimizing wal_autocheckpoint for Continuous Logging — set the checkpoint cadence that the cache size feeds into.
Threshold Tuning for High-Write Workloads — size the cache, WAL ceiling, and autocheckpoint trigger together for sustained writes.
Connection Pooling Strategies — divide the memory budget across handles so pooled caches never breach the cgroup.
