> * 5-15ms downtime to backup a live sqlite db with a realistic amount of data for a crud db
Did you consider using a filesystem with atomic snapshots? For example sqlite with WAL on BTRFS. As far as I can tell, this should have a decent mechanical sympathy.
edit: I didn't really explain myself. This is for zero downtime backups. Snapshot, backup at your own pace, delete the snapshot.
If it’s at 5-15ms of downtime already, you’re in the space where the “zero” downtime FS might actually cause more downtime. In addition to pauses while the snapshot is taken, you’d need to carefully measure things like performance degradation while the snapshot exists (incurring COW costs) and while it’s being GCed in the background.
Also, the last time I checked the Linux scheduling quanta was about 10ms, so it’s not clear backups are going to even be the maximum duration downtime while the system is healthy.
I am not so sure you know what you are talking about. Feel free to provide some reading material for my education.
Why would the scheduler tick frequency even matters for this discussion. Even on a single cpu/core/thread system. For what is worth, the default scheduler tick rate has been 2.5ms since 2005. Earlier this year somebody proposed switching back to 1ms.
Well, I haven’t checked it for a while. Still, try measuring FS latencies during checkpoints, or just write a tight loop program that reads cached data and prints max latencies once an hour. Use the box for other stuff while it runs.
There is no way btrfs can be slower than this in any shape or form.
If we are comparing something simpler. Like making a copy of the SQLite database periodically. It makes sense for a COW snapshot to be faster than copying the whole database. After reading the btrfs documentation, it seems reasonable to assume that the snapshot latency will stay constant, while a full copy would slow down as the single file database grows bigger.
And so it stands of reason that freezing the database during a full copy is worse than freezing it during a btrfs snapshot. And a full copy of the snapshot can then be performed, optionally with a lower IO priority for good measure.
It should be obvious that the less data is physically read or written on the hot path, the less impact there is on latency.
Did you consider using a filesystem with atomic snapshots? For example sqlite with WAL on BTRFS. As far as I can tell, this should have a decent mechanical sympathy.
edit: I didn't really explain myself. This is for zero downtime backups. Snapshot, backup at your own pace, delete the snapshot.