How does this compare to ZFS + cron to create snapshots every X minutes?

harvie · on Oct 11, 2022

Week ago my client lost data on ZFS by accidentaly deleting folder. Unfortunately the data was created and deleted in the meantime between two snapshots. One would expect that it still might be possible to recover, because ZFS is CoW.

There are some solutions like photorec (which now has ZFS support), but it expects you can identify the file by footprint of its contents, which was not the case. Also many of these solutions would require ZFS to go offline for forensic analysis and that was also not possible because lots of other clients were using the same pool at the time.

So this had failed me and i really wished at the time that ZFS had continuous snapshots.

BTW on ZFS i use ZnapZend. It's second best thing after continuous snapshots:

https://www.znapzend.org/ https://github.com/oetiker/znapzend/

There are also some ZFS snapshotting daemons in Debian, but this is much more elegant and flexible.

But since znapzend is userspace daemon (as are all ZFS snapshoters) you need some kind of monitoring and warning mechanism for cases something goes wrong and it can't longer create snapshots (crashes, gets killed by OOM or something...). In NILFS2 every write/delete is snapshot, so you are basicaly guaranteed by kernel to have everything snapshoted without having to watch it.

goodpoint · on Oct 11, 2022

There is no comparison. NILFS provides *continuous* snaphots, so you can inspect and rollback changes as needed.

It does without a performance penalty compared to other logging filesystems.

And without using additional space forever. The backlog rotates forward continuously.

It's a really unique feature that makes a lot of sense for desktop use, where you might want to recover files that were created and deleted after a short time.

harvie · on Oct 11, 2022

Perhaps we can leverage "inotify" API to make ZFS snapshot everytime some file had been changed... But i think ZFS is not really good at handling huge amounts of snapshots. The NILFS2 snapshots are probably more lightweight when compared to ZFS ones.

mustache_kimono · on Oct 11, 2022

> Perhaps we can leverage "inotify" API to make ZFS snapshot everytime some file had been changed...

ZFS and btrfs users are already living in the future:

  inotifywait -r -m --format %w%f -e close_write "/srv/downloads/" | while read -r line; do
      # command below will snapshot the dataset
      # upon which the closed file is located
      sudo httm --snap "$line" 
  done

See: https://kimono-koans.github.io/inotifywait/

harvie · on Oct 18, 2022

What is httm? I like this script as a proof of concept.

But i still can imagine failure modes, eg. inotify might start acting weird when ZFS remounts the watched directory, OOM killer terminates it without anyone noticing, bash loop go haywire when package manager updates that script (bash is running directly from the file and when it changes during execution, it might just continue running from the same byt offset in completely different script).

All these things actualy happened to me in the past. Not to say that if you have multiple datasets in ZFS you cannot inotify wait on all of them at once, so you will have to manage one bash process per dataset. And performance of bash and sudo might not be that awesome.

So for real reliability you would probably want this to actualy run in ZFS/kernel context...

mustache_kimono · on Oct 19, 2022

> What is httm? I like this script as a proof of concept.

See: https://github.com/kimono-koans/httm

> But i still can imagine failure modes, eg. inotify might start acting weird when ZFS remounts the watched directory, OOM killer terminates it without anyone noticing, bash loop go haywire when package manager updates that script (bash is running directly from the file and when it changes during execution, it might just continue running from the same byt offset in completely different script).

I mean, sure, scripts gonna script. You're gonna have to make the POC work for you. But, for instance, I'm not sure half of your issues are problems with a systemd service. I'm not sure one is a problem with a well designed script, which accounts for your particular issues, and a systemd service.

> All these things actualy happened to me in the past. Not to say that if you have multiple datasets in ZFS you cannot inotify wait on all of them at once, so you will have to manage one bash process per dataset. And performance of bash and sudo might not be that awesome.

Yes, you can?

Just re this POC, you can inotifywait a single directory, which contains multiple datasets, and httm will correctly determine and snapshot the correct one upon command. Your real bottleneck here is not sudo or bash. It's the zfs command waiting for a transaction group sync, or arranging for the trans group (or even something else, but its definitely zfs?), to snap.

You can also use `httm -m` to simply identify the dataset and use a channel program and/or a separate script to sync. sudo and bash may not have the performance for your use case, hell, they are composable with everything else?

> So for real reliability you would probably want this to actualy run in ZFS/kernel context...

Yeesh, I'm not sure? Maybe for your/a few specific use cases? Note, inotify (a kernel facility) is your other bottleneck. You're never going to want to watch more than a few/10s of thousand files. The overhead is just going to be too great.

But for most use cases (your documents folder)? Give httm and inotifywait a shot.

goodpoint · on Oct 11, 2022

The NILFS snapshots are practically free (for a logging filesystem, obviously).

fuckstick · on Oct 11, 2022

> It does without a performance penalty.

What is the basis for comparison? Sounds like a pretty meaningless statement at its face.

goodpoint · on Oct 11, 2022

Compared to other logging filesystems obviously.

fuckstick · on Oct 11, 2022

Nilfs baseline (write throughput especially) is slow as shit compared to other filesystems including f2fs. So just because you have this feature that doesn’t make it even slower isn’t that interesting - you pay for it one way or the other.

usr1106 · on Oct 11, 2022

For many users filesystem speed of your home directory is completely irrelevant unless you run on a Raspberry Pi using SD cards. You just don't notice it.

Of course if you haver server handling let's say video files things will be very different. And there are some users who process huge amounts of data.

I run 2 lvm snapshots (daily and weekly) on my home partition for years. Write performance is abysmal if you measure it, but you don't note it in daily development work.

pkulak · on Oct 11, 2022

> There is no comparison.

What if I compare it to BTRFS + Snapper? No performance penalty there, plus checksumming.

AshamedCaptain · on Oct 11, 2022

btrfs and snapperd do have a performance penalty as the number of snapshots increases. Having 100+ usually means snapper list will take north of an hour. You can easily reach these numbers if you are taking a snapshot every handful of minutes.

Even background snapper cleanups will start to take a toll, since even if they are done with ionice they tend to block simultaneous accesses to the filesystem while they are in progress. If you have your root on the same filesystem, it's not pretty -- lots of periodic system-wide freezes with the HDD LEDs non-stop blinking. I tend to limit snapshots always to < 20 for that reason (and so does the default snapperd config).

mike256 · on Oct 11, 2022

About 2 years ago I believed the same. Then I used BTRFS as a store for VM images (with periodoc snapshot) and performance went down to really really bad. After I deleted all snapshots performance was good again. There is a big performance penalty in btrfs with more than about 100 snapshots.

jinnko · on Oct 12, 2022

Did you disable CoW on the VM image files? This makes a significant difference to performance on BTRFS

1MachineElf · on Oct 11, 2022

>It's a really unique feature that makes a lot of sense for desktop us

Sounds like it could serve as a basis for a Linux implementation of something like Apple Time Machine.

mustache_kimono · on Oct 11, 2022

With 'httm', a few of us are already living in that bright future: https://github.com/kimono-koans/httm

masklinn · on Oct 11, 2022

Afaik Time Machine does not do continuous snapshots, just periodic (and triggered).

So you can already do that with zfs: take a snapshot and send it to the backup drive.

harvie · on Oct 11, 2022

"It does without a performance penalty"

yeah. it's already so terribly slow that it's unlikely that taking snapshots can make it any slower :-D

Volundr · on Oct 11, 2022

That was not my experience with NILFS. It outperformed ext4 on my laptop NVME.

akvadrako · on Oct 11, 2022

The benchmarks here look pretty bad:

https://www.phoronix.com/review/linux-58-filesystems/4

Volundr · on Oct 11, 2022

The last page looks pretty bad. If you look at the others it's more of a mixed bag, but yeah.

I don't remember what benchmark I ran before deciding to run it on my laptop. Given my work at the time probably pgbench, but I couldn't say for sure. It was long enough ago I also might've been benchmarking against ext3, not 4.

harvie · on Oct 11, 2022

i think i was running it on 6TB conventional HDD RAID1. also note that the read and write speeds might be quite asymetrical... in general also depends on workload type.

yonrg · on Oct 11, 2022

I run this setup. zfs + zfsnap (not cron anymore, now systemd.timer).

I cannot tell if NILFS is doing this too, with zfsnap I maintain different retention times. 5-minutely for 1hour, hourly for 1day, daily for a week. That are less than 60 snapshots. The older ones are cleaned up.

In addition, zfs brings compression and encryption. That's why I have it on the laptops, too.