public inbox for gentoo-user@lists.gentoo.org
From: Wols Lists <antlists@youngman.org.uk>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Re: Suggestions for backup scheme?
Date: Sun, 4 Feb 2024 09:59:22 +0000	[thread overview]
Message-ID: <17a2d820-4745-405d-844a-09e27184e56a@youngman.org.uk> (raw)
In-Reply-To: <upnai5$uvt$1@ciao.gmane.io>

On 04/02/2024 06:24, Grant Edwards wrote:
> On 2024-02-03, Wol <antlists@youngman.org.uk> wrote:
>> On 03/02/2024 16:02, Grant Edwards wrote:
>>> rsnapshot is an application that uses rsync to do
>>> hourly/daily/weekly/monthly (user-configurable) backups of selected
>>> directory trees. It's done using rsync to create snapshots. They are
>>> in-effect "incremental" backups, because the snapshots themselves are
>>> effectively "copy-on-write" via clever use of hard-links by rsync. A
>>> year's worth of backups for me is 7 daily + 4 weekly + 12 monthly
>>> snapshots for a total of 23 snapshots.  If nothing has changed during
>>> the year, those 23 snapshots take up the same amount of space as a
>>> single snapshot.
>>
>> So as I understand it, it looks like you first do a "cp with hardlinks"
>> creating a complete new directory structure, but all the files are
>> hardlinks so you're not using THAT MUCH space for your new image?
> 
> No, the first snapshot is a complete copy of all files.  The snapshots
> are on a different disk, in a different filesystem, and they're just
> plain directory trees that you can browse with normal filesystem
> tools. It's not possible to hard-link between the "live" filesystem
> and the backup snapshots. The hard-links are to inodes "shared"
> between different snapshot directory trees. The first snapshot copies
> everything to the backup drive (using rsync).

Yes, I get that. You create a new partition and copy all your files into it.

I create a new pv (physical volume), lv (logical volume), and copy all 
my files into it.
> 
> The next snapshot creates a second directory tree with all unchanged
> files hard-linked to the files that were copied as part of the first
> snapshot. Any changed files are just plain copied into the second
> snapshot directory tree.

You create a complete new directory structure, which uses at least one 
block per directory. You can't hard link directories.
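For concreteness, those hard-link mechanics can be demonstrated with plain
coreutils - a sketch of what rsnapshot automates, using made-up throwaway
temp paths rather than a real backup disk:

```shell
#!/bin/sh
# Demonstrate hard-link snapshots (the mechanism rsnapshot automates).
# All paths are throwaway demo paths, not a real backup layout.
set -e
work=$(mktemp -d)
mkdir -p "$work/daily.1"
echo "version 1" > "$work/daily.1/bigfile"

# "Rotate": the new snapshot starts life as hard links, so it costs
# only the directory blocks, not a second copy of the data.
cp -al "$work/daily.1" "$work/daily.0"
stat -c %i "$work/daily.0/bigfile" "$work/daily.1/bigfile"  # same inode

# A changed file is unlinked and rewritten (as rsync does), giving the
# new snapshot its own inode while the old snapshot keeps the old data.
rm "$work/daily.0/bigfile"
echo "version 2" > "$work/daily.0/bigfile"
cat "$work/daily.1/bigfile"   # the old snapshot still reads "version 1"
rm -rf "$work"
```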

I create a LVM snapshot. Dunno how much that is - a couple of blocks?

You copy all the files that have changed, leaving the old copy in the 
old tree and the new copy in the new tree - for a 10MB file that's 
changed, you use 10MB.

I use rsync's "overwrite in place" mode (--inplace), so if I change 10 
bytes at the end of that 10MB file I use ONE block to overwrite it 
(unless Sod's law strikes). The old block is left in the old volume, the 
new block is left in the new volume.
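Spelled out as commands (the volume names and mount points are invented 
for the example; --inplace is the real rsync flag, and --no-whole-file 
matters because rsync defaults to whole-file copies on local transfers, 
which would dirty every block):

```shell
#!/bin/sh
# Sketch of the "ext over lvm, overwrite in place" scheme. Device,
# volume-group and mount names are made up for illustration.
set -e

mount /dev/backupvg/backup /mnt/backup

# Delta-transfer into the existing tree: only the blocks of changed
# files that actually differ get rewritten, keeping snapshot growth small.
rsync -a --inplace --no-whole-file --delete /home/ /mnt/backup/home/

umount /mnt/backup

# Freeze today's state as a mountable, dated snapshot; it consumes
# backup-disk space only as blocks diverge from it later.
lvcreate -s -L 5G -n backup-$(date +%F) /dev/backupvg/backup
```

Mounting one of the dated snapshot volumes later presents the full tree 
exactly as it stood on that date.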
> 
> The third snapshot does the same thing (starting with the second
> snapshot directory tree).

So you end up with multiple directory trees (which could be large in 
themselves), and multiple copies of files that have changed. Which could 
be huge files.

I end up with ONE copy of my current data, and a whole bunch of dated 
mount points, each of which is a full copy as of that date, but only 
actually uses enough space to store a diff of the volume - if I change 
that 10MB file every backup, but only touch, let's say, 10KB spread over 
three or four 4KB disk blocks, I've only used four blocks - 16KB - per 
backup!
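To put numbers on that, using the example's own figures (a change 
touching at most four 4KB blocks per backup, versus a fresh 10MB copy):

```shell
#!/bin/sh
# Back-of-envelope space cost per backup, using the example's figures.
block=4096                              # ext4 block size
blocks_changed=4                        # worst case for ~10KB of edits
cow_cost=$((block * blocks_changed))    # what the LVM snapshot grows by
copy_cost=$((10 * 1024 * 1024))         # a fresh full copy of the 10MB file
echo "snapshot cost:  $cow_cost bytes"      # 16384
echo "full-copy cost: $copy_cost bytes"     # 10485760
echo "ratio: $((copy_cost / cow_cost))x"    # 640
```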
> 
> Rinse and repeat.
> 
> Old snapshot trees are simply removed a-la 'rm -rf' when they're no
> longer wanted.
> 
>> So each snapshot is using the space required by the directory
>> structure, plus the space required by any changed files.
> 
> Sort of. The backup filesystem has to contain one copy of every file
> so that there's something to hard-link to. The backup is completely
> stand-alone, so it doesn't make sense to talk about all of the
> snapshots containing only deltas. When you get to the "oldest"
> snapshot, there's nothing to delta "from".

I get that - it's a different hard drive.
> 
>> [...]
>>
>> And that is why I like "ext over lvm copying with rsync" as my
>> strategy (not that I actually do it). You have lvm on your backup
>> disk. When you do a backup you do "rsync with overwrite in place",
>> which means rsync only writes blocks which have changed. You then
>> take an lvm snapshot which uses almost no space whatsoever.
>>
>> So to compare "lvm plus overwrite in place" to "rsnapshot", my
>> strategy uses the space for an lvm header and a copy of all blocks
>> that have changed.
>>
>> Your strategy takes a copy of the entire directory structure, plus a
>> complete copy of every file that has changed. That's a LOT more.
> 
> I don't understand, are you saying that somehow your backup doesn't
> contain a copy of every file?
> 
YES! Let's make it clear though, we're talking about EVERY VERSION of 
every backed up file.

And you need to get your head round the fact that I'm not - actually - 
backing up my filesystem. I'm snapshotting my disk volume - my disk 
partition, if you like.

Your strategy contains a copy of every file in your original backup, a 
full copy of the file structure for every snapshot, and a full copy of 
every version of every file that's been changed.

My version contains a complete copy of the current backup and (thanks to 
the magic of lvm) a block level diff of every snapshot, which appears to 
the system as a complete backup, despite taking up much less space than 
your typical incremental backup.

To change analogies completely - think version control with delta 
storage (what git does inside its packfiles). My lvm snapshot is like a 
commit: only the current state is stored in full, and previous states 
are reconstructed by applying diffs. If I "check out a backup" (ie mount 
a snapshot volume), lvm applies the stored block diffs to the live 
volume's data.

Cheers,
Wol


