Message-ID: <17a2d820-4745-405d-844a-09e27184e56a@youngman.org.uk>
Date: Sun, 4 Feb 2024 09:59:22 +0000
List-Id: Gentoo Linux mail
Reply-to: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Re: Suggestions for backup scheme?
To: gentoo-user@lists.gentoo.org
References: <8f5371a5-07af-456e-8517-cb9bb664fac4@youngman.org.uk>
From: Wols Lists

On 04/02/2024 06:24, Grant Edwards wrote:
> On 2024-02-03, Wol wrote:
>> On 03/02/2024 16:02, Grant Edwards wrote:
>>> rsnapshot is an application that uses rsync to do
>>> hourly/daily/weekly/monthly (user-configurable) backups of selected
>>> directory trees. It's done using rsync to create snapshots. They are
>>> in-effect "incremental" backups, because the snapshots themselves are
>>> effectively "copy-on-write" via clever use of hard-links by rsync. A
>>> year's worth of backups for me is 7 daily + 4 weekly + 12 monthly
>>> snapshots for a total of 23 snapshots. If nothing has changed during
>>> the year, those 23 snapshots take up the same amount of space as a
>>> single snapshot.
>>
>> So as I understand it, it looks like you first do a "cp with hardlinks"
>> creating a complete new directory structure, but all the files are
>> hardlinks so you're not using THAT MUCH space for your new image?
>
> No, the first snapshot is a complete copy of all files. The snapshots
> are on a different disk, in a different filesystem, and they're just
> plain directory trees that you can browse with normal filesystem
> tools. It's not possible to hard-link between the "live" filesystem
> and the backup snapshots. The hard-links are to inodes "shared"
> between different snapshot directory trees. The first snapshot copies
> everything to the backup drive (using rsync).

Yes, I get that. You create a new partition and copy all your files
into it. I create a new pv (physical volume), lv (logical volume), and
copy all my files into it.
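To make the hard-link trick concrete, here is a minimal Python sketch of
the scheme as described above. It is not rsnapshot's or rsync's actual
code: take_snapshot, demo and the daily.N directory names are made up for
illustration, and it compares file contents byte-for-byte where rsync
would normally go by size and mtime.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def take_snapshot(source: Path, new_snap: Path, prev_snap: Optional[Path] = None) -> None:
    """Build new_snap from source, hard-linking files unchanged since prev_snap."""
    new_snap.mkdir(parents=True, exist_ok=True)
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        dst = new_snap / rel
        if src.is_dir():
            # Directories cannot be hard-linked, so every snapshot gets its own.
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = prev_snap / rel if prev_snap is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)        # unchanged: new name, same inode, no new data
        else:
            shutil.copy2(src, dst)   # new or changed: a complete fresh copy


def demo() -> tuple:
    """Take two snapshots and report (unchanged file shared?, changed file shared?)."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        live = base / "live"
        live.mkdir()
        (live / "unchanged.txt").write_text("same every day\n")
        (live / "changed.txt").write_text("version 1\n")

        first = base / "daily.1"
        take_snapshot(live, first)            # first snapshot: everything is copied

        (live / "changed.txt").write_text("version 2\n")
        second = base / "daily.0"
        take_snapshot(live, second, first)    # second: link whatever did not change

        unchanged_shared = os.path.samefile(first / "unchanged.txt", second / "unchanged.txt")
        changed_shared = os.path.samefile(first / "changed.txt", second / "changed.txt")
        return unchanged_shared, changed_shared


if __name__ == "__main__":
    print(demo())
```

On a filesystem that supports hard links, the unchanged file ends up as
one inode with a name in each tree, while the changed file is stored
twice in full. The directories themselves are recreated every time.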
>
> The next snapshot creates a second directory tree with all unchanged
> files hard-linked to the files that were copied as part of the first
> snapshot. Any changed files are just-plain-copied into the second
> snapshot directory tree.

You create a complete new directory structure, which uses at least one
block per directory. You can't hard link directories. I create an LVM
snapshot. Dunno how much that is - a couple of blocks?

You copy all the files that have changed, leaving the old copy in the
old tree and the new copy in the new tree - for a 10MB file that's
changed, you use 10MB. I use rsync's "overwrite in place" mode
(--inplace), so if I change 10 bytes at the end of that 10MB file I use
ONE block to overwrite it (unless sod strikes). The old block is left in
the old volume, the new block is left in the new volume.
>
> The third snapshot does the same thing (starting with the second
> snapshot directory tree).

So you end up with multiple directory trees (which could be large in
themselves), and multiple copies of files that have changed. Which could
be huge files.

I end up with ONE copy of my current data, and a whole bunch of dated
mount points, each of which is a full copy as of that date, but only
actually uses enough space to store a diff of the volume - if I change
that 10MB file every backup, but only change, let's say, 10KB (which
spans three or four 4KB disk blocks, depending on alignment), I've only
used 12KB to 16KB per backup!
>
> Rinse and repeat.
>
> Old snapshot trees are simply removed a-la 'rm -rf' when they're no
> longer wanted.
>
>> So each snapshot is using the space required by the directory
>> structure, plus the space required by any changed files.
>
> Sort of. The backup filesystem has to contain one copy of every file
> so that there's something to hard-link to. The backup is completely
> stand-alone, so it doesn't make sense to talk about all of the
> snapshots containing only deltas. When you get to the "oldest"
> snapshot, there's nothing to delta "from".
I get that - it's a different hard drive.
>
>> [...]
>>
>> And that is why I like "ext over lvm copying with rsync" as my
>> strategy (not that I actually do it). You have lvm on your backup
>> disk. When you do a backup you do "rsync with overwrite in place",
>> which means rsync only writes blocks which have changed. You then
>> take an lvm snapshot which uses almost no space whatsoever.
>>
>> So to compare "lvm plus overwrite in place" to "rsnapshot", my
>> strategy uses the space for an lvm header and a copy of all blocks
>> that have changed.
>>
>> Your strategy takes a copy of the entire directory structure, plus a
>> complete copy of every file that has changed. That's a LOT more.
>
> I don't understand, are you saying that somehow your backup doesn't
> contain a copy of every file?
>
YES! Let's make it clear though, we're talking about EVERY VERSION of
every backed up file.

And you need to get your head round the fact I'm not - actually -
backing up my filesystem. I'm actually snapshotting my disk volume, my
disk partition if you like.

Your strategy contains a copy of every file in your original backup, a
full copy of the file structure for every snapshot, and a full copy of
every version of every file that's been changed.

My version contains a complete copy of the current backup and (thanks to
the magic of lvm) a block level diff of every snapshot, which appears to
the system as a complete backup, despite taking up much less space than
your typical incremental backup.

To change analogies completely - think git. My lvm snapshot is like a
git commit. Think of git as keeping the current HEAD and getting back to
previous commits by applying diffs (strictly speaking, git stores each
commit as a full snapshot and only delta-compresses objects inside
packfiles, but the mental model is close enough here). If I "check out a
backup" (ie mount a backup volume), lvm in effect applies a diff to the
current backup volume.

Cheers,
Wol
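
PS. For anyone who wants to put numbers on the 10MB example, here is a
rough Python sketch of the space arithmetic. It is only a model, not
anything lvm or rsnapshot actually runs: the function names are invented
for illustration, the 4KB block size is the one assumed in the example,
and snapshot metadata and directory overhead are ignored.

```python
BLOCK_SIZE = 4096  # assumed 4KB blocks, as in the example


def file_level_cost(file_size: int) -> int:
    """Hard-link scheme: a changed file is stored again in full."""
    return file_size


def block_level_cost(writes, block_size: int = BLOCK_SIZE) -> int:
    """Block-level snapshot: only the distinct blocks that were overwritten.

    `writes` is an iterable of (offset, length) byte ranges within the file.
    """
    dirty = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        dirty.update(range(first, last + 1))
    return len(dirty) * block_size


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024
    ten_kb = 10 * 1024

    print("file-level copy of a changed 10MB file:", file_level_cost(ten_mb))
    print("10 bytes changed at the end:", block_level_cost([(ten_mb - 10, 10)]))
    print("10KB changed, block-aligned:", block_level_cost([(0, ten_kb)]))
    print("10KB changed, unaligned:", block_level_cost([(3000, ten_kb)]))
```

The file-level figure is always the full size of the file, however small
the change. The block-level figure depends only on which blocks were
touched, which is why a 10KB change costs three or four blocks depending
on where it falls.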