Managing File Locks on FAT32 vs ext4

You provisioned a SQLite database on a FAT32 SD card so the same flash image boots on Windows tooling, a Linux gateway, and a bare-metal recovery loader — and now a second process opening that database hangs indefinitely or trips SQLITE_BUSY on the very first write. FAT32 has no POSIX advisory locking (fcntl/flock), the exact primitive SQLite relies on to coordinate a single writer against concurrent readers, so the engine silently falls back to fragile dot-file locking and Write-Ahead Log (WAL) coordination breaks. This page fixes that one scenario: detecting the filesystem at connection time and applying the correct locking strategy for FAT32 versus ext4. It sits under the File System Permissions & Ownership cluster of the broader SQLite Architecture & Production Hardening guidance, and assumes you already understand the difference between the journaling modes SQLite offers.

Diagnosis

Confirm you are actually hitting the FAT32 locking fallback before changing anything — the symptoms overlap with ordinary lock contention, and the fix differs entirely.

Identify the filesystem under the database. Never trust the mount path or the block size; ext4 and FAT32 both commonly report a 4,096-byte block, so statvfs heuristics cannot tell them apart. Ask for the filesystem type of the mount that owns the database’s real path, which is the longest mount point in /proc/mounts that prefixes it:

# df follows symlinks to the file and prints the fstype of the mount that owns it.
df --output=fstype /path/to/sensor.db | tail -1     # quick check: vfat vs ext4
grep -w vfat /proc/mounts                            # confirms a FAT32/vfat mount exists

If fstype comes back as vfat, fat, or msdos, you are on FAT32 and the advisory locks SQLite expects do not exist.

Match the error signature. The FAT32 fallback produces a distinctive cluster of symptoms rather than a single clean error:

Persistent SQLITE_BUSY (error code 5) or SQLITE_BUSY_SNAPSHOT from a second connection even when no writer is genuinely active — a stale .lock dot-file left by a previous process is being read as a live writer.
SQLITE_IOERR variants (code 10), specifically SQLITE_IOERR_SHMOPEN / SQLITE_IOERR_SHMMAP, the moment you run PRAGMA journal_mode=WAL; — the -shm shared-memory index needs byte-range locks that vfat cannot provide.
Orphaned -wal, -shm, or -journal sidecar files that survive a power cycle and are never cleaned up on the next open.
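
One way to tell these signatures apart in application code is to inspect the primary result code behind the exception. The sketch below is illustrative only: classify_lock_error is a hypothetical helper, and it assumes Python 3.11 or later for the sqlite_errorcode attribute, falling back to message matching on older interpreters.

```python
import sqlite3

SQLITE_BUSY = 5     # primary result code for lock contention
SQLITE_IOERR = 10   # primary result code for I/O failures


def classify_lock_error(exc: sqlite3.Error) -> str:
    """Return 'busy', 'ioerr', or 'other' for an sqlite3 exception (hypothetical helper)."""
    code = getattr(exc, "sqlite_errorcode", None)  # attribute exists on Python 3.11+
    if code is None:
        # Older interpreters: fall back to the message text.
        text = str(exc).lower()
        if "locked" in text or "busy" in text:
            return "busy"
        if "i/o error" in text:
            return "ioerr"
        return "other"
    primary = code & 0xFF  # extended codes keep the primary code in the low byte
    if primary == SQLITE_BUSY:
        return "busy"
    if primary == SQLITE_IOERR:
        return "ioerr"
    return "other"
```

Logging the returned label next to the filesystem type makes it easier to see whether a burst of errors is ordinary contention or the pattern described above.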

Reproduce the WAL failure deterministically. On a FAT32 mount, this single statement is enough to expose the problem — WAL will either refuse to engage or leave an unusable -shm file:

import sqlite3
conn = sqlite3.connect("/mnt/sdcard/sensor.db")
mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
print(mode)   # On healthy ext4 -> 'wal'. On FAT32 you often get 'delete'
              # back (silent refusal) or an SQLITE_IOERR_SHMMAP is raised.

If WAL silently reports back delete, or raises an I/O error on the -shm file, the fallback is confirmed. Note that a stale root-owned .lock or sidecar file can also block the fallback — verify ownership against the File System Permissions & Ownership rules before assuming the filesystem alone is at fault.

Solution

Detect the filesystem before the first PRAGMA, then branch: keep WAL and standard advisory locking on ext4, but force EXCLUSIVE locking with a DELETE rollback journal on FAT32. Exclusive locking makes the connecting process take sole ownership for its lifetime, which sidesteps the dot-file race entirely, while the single contiguous rollback journal survives power loss more gracefully than fragmented WAL frames when no POSIX locks are available.

import os
import sqlite3
import logging
from contextlib import contextmanager

logger = logging.getLogger(__name__)


def detect_fat32(path: str) -> bool:
    """Return True if `path` lives on a vfat/FAT32 mount.

    Reads /proc/mounts and picks the longest mount point that prefixes the
    resolved path. Block-size heuristics are useless here (ext4 and FAT32
    both report 4096), so we inspect the fstype string directly.
    Returns False on any error, so the caller falls through to the default
    ext4/WAL branch when detection is impossible.
    """
    try:
        abs_path = os.path.realpath(path)
        best_mount, best_fstype = "", ""
        with open("/proc/mounts") as fh:
            for line in fh:
                parts = line.split()
                if len(parts) < 3:
                    continue
                mount_point, fstype = parts[1], parts[2]
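                # Match on a path-component boundary so /mnt/sd does not claim /mnt/sdcard.
                prefix = mount_point.rstrip("/") + "/"
                owns_path = abs_path == mount_point or abs_path.startswith(prefix)
                # >= lets a later mount stacked on the same mount point win.
                if owns_path and len(mount_point) >= len(best_mount):
                    best_mount, best_fstype = mount_point, fstype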
        return best_fstype in ("vfat", "fat", "msdos")
    except OSError:
        logger.warning("Cannot read /proc/mounts for %s; assuming ext4", path)
        return False


@contextmanager
def locked_connection(db_path: str):
    is_fat32 = detect_fat32(db_path)
    conn = sqlite3.connect(
        db_path,
        timeout=30.0,          # seconds spent auto-retrying SQLITE_BUSY before raising
        isolation_level=None,  # autocommit mode: no implicit BEGIN; transactions are explicit
        check_same_thread=False,
    )
    try:
        conn.execute("PRAGMA foreign_keys = ON;")   # enforce referential integrity

        if is_fat32:
            # FAT32 has no byte-range locks, so WAL's -shm coordination cannot work.
            conn.execute("PRAGMA locking_mode = EXCLUSIVE;")  # sole owner; kills the .lock dot-file race
            conn.execute("PRAGMA journal_mode = DELETE;")     # single rollback journal, not WAL frames
            conn.execute("PRAGMA synchronous = FULL;")        # fsync every commit; no WAL crash-safety to lean on
            logger.info("FAT32 detected: EXCLUSIVE + DELETE + synchronous=FULL for %s", db_path)
        else:
            # ext4/XFS/APFS: real advisory locks, so WAL gives concurrent readers + one writer.
            conn.execute("PRAGMA journal_mode = WAL;")        # readers never block the single writer
            conn.execute("PRAGMA locking_mode = NORMAL;")     # release locks between transactions
            conn.execute("PRAGMA synchronous = NORMAL;")      # fsync at checkpoint, not every commit
            conn.execute("PRAGMA wal_autocheckpoint = 1000;") # checkpoint every 1000 pages (~4MB)
            logger.info("ext4 detected: WAL + NORMAL locking for %s", db_path)

        yield conn
    finally:
        conn.close()

On FAT32 the EXCLUSIVE lock trades concurrency for correctness, matching the integrity-first stance of the fallback routing strategies used elsewhere in constrained deployments. If your workload genuinely needs multiple processes on the same FAT32 volume, that is not a PRAGMA problem: copy the database to an ext4 or tmpfs staging area, or serialize all writes through a single owning process.
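
The staging option can be sketched with the sqlite3 module's online backup API. This is a minimal illustration under assumptions: stage_copy and the staging directory are hypothetical names, and copying changes back to the card is left out.

```python
import os
import sqlite3


def stage_copy(src_path: str, staging_dir: str) -> str:
    """Copy the database at src_path into staging_dir and return the new path."""
    dst_path = os.path.join(staging_dir, os.path.basename(src_path))
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # page-by-page copy that yields a consistent snapshot
    finally:
        dst.close()
        src.close()
    return dst_path
```

Processes then open the staged copy with the ext4 branch of the PRAGMA logic; anything written there has to be copied back to the FAT32 volume by a single owner.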

Verification

Every hardening step above is only real if the engine actually accepted it. Read the PRAGMAs back and assert, because SQLite silently ignores some settings rather than erroring — journal_mode in particular reports the mode it actually selected.

def assert_lock_config(db_path: str) -> None:
    is_fat32 = detect_fat32(db_path)
    with locked_connection(db_path) as conn:
        journal = conn.execute("PRAGMA journal_mode;").fetchone()[0].lower()
        locking = conn.execute("PRAGMA locking_mode;").fetchone()[0].lower()
        sync = conn.execute("PRAGMA synchronous;").fetchone()[0]  # 2 == FULL, 1 == NORMAL

        if is_fat32:
            assert journal == "delete", f"FAT32 must not run WAL, got {journal!r}"
            assert locking == "exclusive", f"FAT32 needs EXCLUSIVE, got {locking!r}"
            assert sync == 2, f"FAT32 needs synchronous=FULL (2), got {sync}"
        else:
            assert journal == "wal", f"ext4 should run WAL, got {journal!r}"

    print("Lock configuration verified for", "FAT32" if is_fat32 else "ext4")

Two out-of-band checks close the loop. First, confirm no orphaned sidecar files linger after a clean close — on FAT32 with DELETE journaling there must be no -wal or -shm next to the database:

ls -la /mnt/sdcard/sensor.db*   # expect only sensor.db; a stray -shm/-wal signals a crashed writer
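
The same check can run inside a test suite. A minimal sketch, assuming the conventional sidecar suffixes; find_sidecars is a hypothetical helper name.

```python
import os

SIDECAR_SUFFIXES = ("-wal", "-shm", "-journal")


def find_sidecars(db_path: str) -> list[str]:
    """Return the sidecar files that currently exist next to db_path."""
    return [db_path + suffix for suffix in SIDECAR_SUFFIXES
            if os.path.exists(db_path + suffix)]
```

An empty list after a clean close is the expected result on the FAT32 branch.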

Second, prove that a stale lock no longer wedges a fresh open: kill a writer mid-transaction, then reopen. With EXCLUSIVE + DELETE the new process should roll back any leftover journal and acquire the database within the busy timeout (the timeout= argument passed to connect above) instead of blocking forever. If it still hangs, tune the retry envelope per the busy_timeout configuration guidance rather than raising the timeout blindly.
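
The kill-and-reopen check can be scripted. The sketch below is one possible shape, not a definitive harness: it assumes a hypothetical readings table, uses a child interpreter that exits without committing to stand in for a killed writer, and opens with plain settings rather than the FAT32 branch.

```python
import sqlite3
import subprocess
import sys
import textwrap

# Child process: start a write transaction, then die without COMMIT.
CHILD = textwrap.dedent("""
    import os, sqlite3, sys
    conn = sqlite3.connect(sys.argv[1], isolation_level=None)
    conn.execute('BEGIN IMMEDIATE')
    conn.execute('INSERT INTO readings(value) VALUES (42)')
    os._exit(1)
""")


def reopen_after_crash(db_path: str, timeout: float = 5.0) -> int:
    """Crash a writer mid-transaction, reopen, and return the surviving row count."""
    subprocess.run([sys.executable, "-c", CHILD, db_path], check=False)
    conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # raises if the lock is not free within `timeout`
        count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
        conn.execute("COMMIT")
        return count
    finally:
        conn.close()
```

The uncommitted row should be absent from the returned count, and the call should return well inside the timeout.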

Failure Modes & Gotchas

A connection-pool layer re-opens without the FAT32 branch. If you front SQLite with a pool, each pooled handle must run detect_fat32() and the same PRAGMA branch, or the first pool-created connection quietly reverts to WAL and re-introduces the -shm failure. This is the most common regression when retrofitting the fix — see connection pooling strategies for wiring an init hook that runs on every checkout, not just the first.
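
A pool with a per-checkout hook might look like the following. It is a deliberately small sketch: HookedPool and its configure callback are hypothetical, and a real deployment would pass a function that runs detect_fat32() and the matching PRAGMA branch.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator


class HookedPool:
    """Tiny pool that re-applies a configure hook every time a handle is checked out."""

    def __init__(self, db_path: str,
                 configure: Callable[[sqlite3.Connection], None],
                 size: int = 2) -> None:
        self._db_path = db_path
        self._configure = configure
        self._idle: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)

    @contextmanager
    def checkout(self) -> Iterator[sqlite3.Connection]:
        try:
            conn = self._idle.get_nowait()          # reuse an idle handle if any
        except queue.Empty:
            conn = sqlite3.connect(self._db_path, isolation_level=None,
                                   check_same_thread=False)
        self._configure(conn)                        # runs on every checkout
        try:
            yield conn
        finally:
            try:
                self._idle.put_nowait(conn)          # return the handle to the pool
            except queue.Full:
                conn.close()                         # pool is full; drop the extra
```

Running the hook on checkout rather than only on creation costs a few PRAGMA round trips but removes the window in which a recycled handle carries the wrong mode.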

SD-card write reordering defeats synchronous=FULL. Many consumer SD and USB flash controllers buffer and reorder writes internally and lie about completing an fsync(). On FAT32 the DELETE journal is more power-safe than WAL, but the controller can still corrupt the rollback journal on a brownout. Batch writes into explicit transactions committed at deterministic intervals, keep the schema design narrow with idempotent upserts, and treat industrial-grade (SLC/pSLC) media with power-loss protection as a hardware requirement, not an optimization.
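
Batching with idempotent upserts can be sketched as below. The readings table, its columns, and flush_batch are assumptions made for illustration; the connection is assumed to be opened with isolation_level=None as in the Solution code, and the ON CONFLICT clause needs SQLite 3.24 or later.

```python
import sqlite3
from typing import Iterable, Tuple


def flush_batch(conn: sqlite3.Connection,
                rows: Iterable[Tuple[int, float]]) -> None:
    """Write one batch atomically; replaying the same batch leaves the table unchanged."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.executemany(
            "INSERT INTO readings(sensor_id, value) VALUES (?, ?) "
            "ON CONFLICT(sensor_id) DO UPDATE SET value = excluded.value",
            rows,
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # leave no half-applied batch behind
        raise
```

Because a replayed batch converges to the same rows, a caller that is unsure whether the last commit reached the card can simply send it again after reboot.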

Memory-mapped I/O settings do not carry over. PRAGMA mmap_size controls memory-mapping of the main database file, which is a separate mechanism from the -shm shared-memory index that WAL depends on and FAT32 cannot back. Do not carry memory-mapped I/O configuration tuned for ext4 across to a FAT32 target and assume it applies; read PRAGMA mmap_size back on the target, alongside the readback above, to see what the engine actually accepted.
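
A readback for the memory-mapping setting could look like this. It is a sketch under assumptions: effective_mmap_size is a hypothetical helper, and it returns None when the build or VFS reports no value at all.

```python
import sqlite3
from typing import Optional


def effective_mmap_size(conn: sqlite3.Connection, requested: int) -> Optional[int]:
    """Request an mmap size, then report what the engine says is in effect."""
    conn.execute(f"PRAGMA mmap_size = {int(requested)};")
    row = conn.execute("PRAGMA mmap_size;").fetchone()
    return None if row is None else int(row[0])
```

Comparing the returned value with the requested one on each target shows whether the tuning was accepted, capped, or ignored.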

Unbounded lock waits become a denial-of-service vector. timeout=30.0 bounds a single wait, but a retry loop around it that never gives up will still pin a worker indefinitely behind a stale lock. Cap retries, log every lock-acquisition timestamp to find contention hotspots, and fail the request rather than block indefinitely; the security boundaries and access control practices apply the same principle to any unbounded resource wait.
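
A capped retry wrapper is one way to enforce that bound. The sketch below is illustrative: with_bounded_retries, its defaults, and the message-based lock check are assumptions, not part of the sqlite3 module.

```python
import logging
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def with_bounded_retries(op: Callable[[], T],
                         max_attempts: int = 3,
                         delay: float = 0.1) -> T:
    """Run op(), retrying lock errors at most max_attempts times, then re-raise."""
    for attempt in range(1, max_attempts + 1):
        started = time.monotonic()
        try:
            return op()
        except sqlite3.OperationalError as exc:
            waited = time.monotonic() - started
            logger.warning("lock attempt %d/%d failed after %.3fs: %s",
                           attempt, max_attempts, waited, exc)
            # Give up on non-lock errors and once the cap is reached.
            if "locked" not in str(exc).lower() or attempt == max_attempts:
                raise
            time.sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")  # loop never ran
```

The worst-case wait is then roughly max_attempts times the connection timeout plus the sleeps, a figure that can be reviewed and alerted on.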

Related Pages

File System Permissions & Ownership: the parent guide covering lock-file ownership, umask, and VFS fallback chains.
Journaling Modes Deep Dive: why WAL, DELETE, and TRUNCATE behave differently under power loss.
Connection Pooling Strategies: applying per-connection PRAGMA hooks so pooled handles keep the FAT32 branch.
Fallback Routing Strategies: degrading safely when a database on removable media becomes unavailable.
