From: Wols Lists <antlists@youngman.org.uk>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Re: Suggestions for backup scheme?
Date: Sun, 4 Feb 2024 09:59:22 +0000
Message-ID: <17a2d820-4745-405d-844a-09e27184e56a@youngman.org.uk>
In-Reply-To: <upnai5$uvt$1@ciao.gmane.io>
On 04/02/2024 06:24, Grant Edwards wrote:
> On 2024-02-03, Wol <antlists@youngman.org.uk> wrote:
>> On 03/02/2024 16:02, Grant Edwards wrote:
>>> rsnapshot is an application that uses rsync to do
>>> hourly/daily/weekly/monthly (user-configurable) backups of selected
>>> directory trees. It's done using rsync to create snapshots. They are
>>> in-effect "incremental" backups, because the snapshots themselves are
>>> effectively "copy-on-write" via clever use of hard-links by rsync. A
>>> year's worth of backups for me is 7 daily + 4 weekly + 12 monthly
>>> snapshots for a total of 23 snapshots. If nothing has changed during
>>> the year, those 23 snapshots take up the same amount of space as a
>>> single snapshot.
>>
>> So as I understand it, it looks like you first do a "cp with hardlinks"
>> creating a complete new directory structure, but all the files are
>> hardlinks so you're not using THAT MUCH space for your new image?
>
> No, the first snapshot is a complete copy of all files. The snapshots
> are on a different disk, in a different filesystem, and they're just
> plain directory trees that you can browse with normal filesystem
> tools. It's not possible to hard-link between the "live" filesystem
> and the backup snapshots. The hard-links are to inodes "shared"
> between different snapshot directory trees. The first snapshot copies
> everything to the backup drive (using rsync).
Yes, I get that. You create a new partition and copy all your files into it.
I create a new pv (physical volume), lv (logical volume), and copy all
my files into it.
>
> The next snapshot creates a second directory tree with all unchanged
> files hard-linked to the files that were copied as part of the first
> snapshot. Any changed files are just plain copied into the second snapshot
> directory tree.
You create a complete new directory structure, which uses at least one
block per directory. You can't hard-link directories.
I create an LVM snapshot. Dunno how much that is - a couple of blocks?
You copy all the files that have changed, leaving the old copy in the
old tree and the new copy in the new tree - for a 10MB file that's
changed, you use 10MB.
I use rsync's overwrite-in-place mode (--inplace), so if I change 10 bytes
at the end of that 10MB file I use ONE block to overwrite it (unless sod
strikes). The old block is left in the old volume, the new block is left
in the new volume.
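A toy model of what "overwrite in place" buys you, as a minimal Python sketch: the destination file is treated as a row of fixed-size blocks, and only blocks whose contents differ are rewritten. This is a simplification, not rsync's rolling-checksum delta algorithm; the 4KB block size and the update_in_place helper are hypothetical. Note also that rsync defaults to whole-file transfers when both paths are local, so check the rsync man page for --no-whole-file before relying on this behaviour.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def update_in_place(dest: bytearray, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Make `dest` equal to `new`, rewriting only the blocks that differ.

    Returns the indices of the blocks that were actually written.
    """
    if len(new) < len(dest):
        del dest[len(new):]  # file shrank: truncate the tail
    written = []
    nblocks = (len(new) + block_size - 1) // block_size
    for i in range(nblocks):
        start = i * block_size
        end = min(start + block_size, len(new))
        if dest[start:end] != new[start:end]:
            dest[start:end] = new[start:end]  # unchanged blocks are never touched
            written.append(i)
    return written


old = bytearray(b"x" * (2560 * BLOCK_SIZE))  # a 10MB file
new = bytes(old[:-10]) + b"y" * 10           # change the last 10 bytes
print(update_in_place(old, new))             # [2559]: one block rewritten
```

Underneath an LVM snapshot, only that one rewritten block needs its old contents preserved.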
>
> The third snapshot does the same thing (starting with the second
> snapshot directory tree).
So you end up with multiple directory trees (which could be large in
themselves), and multiple copies of files that have changed. Which could
be huge files.
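For comparison, a minimal Python sketch of the hard-link scheme described above (similar in spirit to rsnapshot or rsync's --link-dest, but not their code). Every snapshot gets its own directory tree, unchanged files are hard-linked to the previous snapshot, and any changed file is copied in full. The make_snapshot helper is hypothetical, and it compares file contents, whereas rsync by default treats a file as unchanged when its size and modification time match.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def make_snapshot(source: Path, new_snap: Path, prev_snap: Optional[Path]) -> dict:
    """Create new_snap as a complete tree mirroring source.

    Files identical to their copy in prev_snap are hard-linked (no new data
    blocks); new or changed files are copied whole.
    """
    stats = {"linked": 0, "copied": 0}
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        # Directories can't be hard-linked, so each snapshot pays for its own.
        (new_snap / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            dst = new_snap / rel / name
            old = prev_snap / rel / name if prev_snap else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)  # shares the inode with the previous snapshot
                stats["linked"] += 1
            else:
                shutil.copy2(src, dst)  # the whole file, even for a 10-byte edit
                stats["copied"] += 1
    return stats
```

A 10MB file with a 10-byte change lands in the "copied" branch, so that snapshot costs the full 10MB.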
I end up with ONE copy of my current data, and a whole bunch of dated
mount points, each of which looks like a full copy as of that date but
only actually uses enough space to store a diff of the volume. If I change
that 10MB file every backup, but only change, let's say, 10KB, that touches
at most four 4KB disk blocks, so I've only used four blocks - 16KB - per backup!
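The block arithmetic, as a small Python sketch assuming 4KB blocks (real LVM snapshots copy data in "chunks" whose size is chosen when the snapshot is created, so the actual figure may differ). The blocks_touched helper is hypothetical; it counts how many blocks a changed byte range overlaps.

```python
def blocks_touched(offset: int, length: int, block_size: int = 4096) -> int:
    """Number of block_size blocks overlapped by bytes [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


KB = 1024
# A 10KB change that starts on a block boundary touches three blocks...
print(blocks_touched(0, 10 * KB))     # 3
# ...but one that straddles boundaries touches four, i.e. 16KB.
print(blocks_touched(3000, 10 * KB))  # 4
```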
>
> Rinse and repeat.
>
> Old snapshot trees are simply removed a la "rm -rf" when they're no
> longer wanted.
>
>> So each snapshot is using the space required by the directory
>> structure, plus the space required by any changed files.
>
> Sort of. The backup filesystem has to contain one copy of every file
> so that there's something to hard-link to. The backup is completely
> stand-alone, so it doesn't make sense to talk about all of the
> snapshots containing only deltas. When you get to the "oldest"
> snapshot, there's nothing to delta "from".
I get that - it's a different hard drive.
>
>> [...]
>>
>> And that is why I like "ext over lvm copying with rsync" as my
>> strategy (not that I actually do it). You have lvm on your backup
>> disk. When you do a backup you do "rsync with overwrite in place",
>> which means rsync only writes blocks which have changed. You then
>> take an lvm snapshot which uses almost no space whatsoever.
>>
>> So to compare "lvm plus overwrite in place" to "rsnapshot", my
>> strategy uses the space for an lvm header and a copy of all blocks
>> that have changed.
>>
>> Your strategy takes a copy of the entire directory structure, plus a
>> complete copy of every file that has changed. That's a LOT more.
>
> I don't understand, are you saying that somehow your backup doesn't
> contain a copy of every file?
>
YES! Let's be clear, though: we're talking about EVERY VERSION of
every backed-up file.
And you need to get your head round the fact I'm not - actually -
backing up my filesystem. I'm actually snapshotting my disk volume, my
disk partition, if you like.
Your strategy contains a copy of every file in your original backup, a
full copy of the directory structure for every snapshot, and a full copy of
every version of every file that's been changed.
My version contains a complete copy of the current backup and (thanks to
the magic of lvm) a block-level diff for every snapshot, which appears to
the system as a complete backup, despite taking up much less space than
your typical incremental backup.
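To change analogies completely, think version control. My lvm snapshot is
like a commit in a system that stores the latest revision in full and
rebuilds older revisions by applying reverse diffs (RCS works this way;
git, strictly speaking, stores whole objects and only delta-compresses them
in packfiles). If I "check out a backup" (i.e. mount a backup volume), lvm
overlays the stored diff on the live volume, without changing the live data.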
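A toy Python model of the idea, assuming classic (non-thin) LVM-style copy-on-write snapshots. The live volume holds the only full copy, a snapshot costs nothing when taken, and the first write to a block after a snapshot saves that block's old contents into the snapshot. Reading a snapshot returns the saved block if there is one and falls through to the live volume otherwise. The Volume class is hypothetical and ignores chunk sizes, metadata and snapshot overflow.

```python
class Volume:
    """Toy block device with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # current data: the only full copy
        self.snapshots = []         # per snapshot: {block_index: old_contents}

    def snapshot(self) -> int:
        """Take a snapshot; it stores nothing until the live volume changes."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Preserve the pre-write contents in every snapshot that hasn't
        # already saved its own copy of this block.
        for saved in self.snapshots:
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        """Read a block as it was when snapshot snap_id was taken."""
        return self.snapshots[snap_id].get(index, self.blocks[index])


v = Volume(["a", "b", "c", "d"])
monday = v.snapshot()
v.write(3, "D")
tuesday = v.snapshot()
v.write(3, "DD")
print(v.read_snapshot(monday, 3), v.read_snapshot(tuesday, 3), v.blocks[3])  # d D DD
```

Even in this toy model, each snapshot keeps its own copy of every block overwritten since it was taken, so an old snapshot grows as the live volume diverges from it.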
Cheers,
Wol