From: Frank Steinmetzger <Warp_7@gmx.de>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] How to compress lots of tarballs
Date: Sat, 2 Oct 2021 00:31:44 +0200
Message-ID: <YVeMUOWr1POJiOZG@kern>
In-Reply-To: <a3d946b7-7e3d-4183-d82f-43f57561938f@gmail.com>
On Wed, Sep 29, 2021 at 03:04:41PM -0500, Dale wrote:
> Curious question here. As you may recall, I backup to a external hard
> drive. Would it make sense to use that software for a external hard
> drive?
Since you are using LVM for everything, IIRC, snapshots would be a very
efficient way for you to make incremental backups. But I don’t know that
area well enough to give you hints.
But I do use Borg. It’s been my primary backup tool for my systems for
almost two years now. Before that I used rsnapshot (i.e. rsync with
versioning through hard links) for my home partition and simple rsync for
the data partition. Rsnapshot is quite slow, because it has to compare at
least the inodes of all files on the source and destination. Borg uses a
cache, which speeds things up drastically.
I have one Borg repo for the root fs, one for ~ and one for the data
partition. Each repo receives its partition from two different hosts, which
have most of their data mirrored daily with Unison. A tool like Borg can
deduplicate all of that and create snapshots of it. This saves oodles of
space, but also allows me to restore an entire host with a simple rsync from
a mounted Borg repo (only downside: no hardlink support, AFAIK).
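As a rough sketch of that workflow (not my actual wrapper script), one
archive per run plus a restore via mount and rsync could look like the
following. The repo path, mount point and archive naming scheme are made-up
placeholders. By default the functions only print the commands they would
run; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Hypothetical locations, adjust to your own setup.
REPO=/mnt/backup/borg/home     # assumed repo path on the backup drive
MNT=/mnt/borg-restore          # assumed empty directory used as mount point

# Print each command unless RUN=1 is set, so the sketch can be
# inspected before it touches anything.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Create one archive ("snapshot") named after host and date.
backup_home() {
    run borg create --stats --compression lz4 \
        "$REPO::$(hostname)-$(date +%F)" "$HOME"
}

# Restore: $1 = archive name, $2 = target directory.
restore_archive() {
    run borg mount "$REPO::$1" "$MNT"
    run rsync -a "$MNT/" "$2/"
    run borg umount "$MNT"
}
```

Calling `restore_archive somehost-2021-10-01 /mnt/newroot` (both arguments
are examples) prints the three commands in order. Note that the mounted
archive contains the original directory tree, so check the paths before
pointing rsync at a live system.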
Borg saves its data in 500 MB files, which makes it very SMR-friendly.
Rsnapshot will create little holes in the backup FS over time with the
deletion of old snapshots. And as we all know, this will bring SMR drives
down to a crawl. If you back up only big video files, then this may not be a
huge problem. But it will be with the ~ partition, with its thousands of
little files. In Borg, little changes do not trickle down to many random
writes. If a data file becomes partially obsolete, its remaining data is
rewritten into a new file and the old one is deleted as a whole. Thanks to
that, I have no worries about using 2.5″ 4 TB drives as my main backup
drives (as we all know, everything 2.5″ above 1 TB is SMR).
Those big data files also make it very efficient to copy a Borg repo (for
example to mirror the backup drive to another drive for off-site storage),
because the repo itself consists of only a small number of files:
$ borg info data
...
------------------------------------------------------------------------------
              Original size    Compressed size    Deduplicated size
All archives:      18.09 TB           17.60 TB              1.23 TB

              Unique chunks       Total chunks
Chunk index:         722096           10888890

$ find data -type f | wc -l
2498
I have 21 snapshots in that repo, which amount to 18 TB of backed-up data,
deduped down to 1.23 TB, spread over only 2498 files including metadata.
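To put those figures into perspective, here is a small back-of-the-envelope
calculation in shell. The helper names are made up for illustration, and I
am assuming 1 TB = 1,000,000 MB. The file count includes a few metadata
files, so the average file size is only approximate.

```shell
#!/bin/sh
# Figures copied from the borg info / find output above.
original_tb=18.09
dedup_tb=1.23
files=2498
unique_chunks=722096
total_chunks=10888890

# Ratio of two numbers, one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Average size per file in MB (assumes 1 TB = 1,000,000 MB).
avg_file_mb() {
    awk -v tb="$1" -v n="$2" 'BEGIN { printf "%.0f\n", tb * 1000000 / n }'
}

ratio "$original_tb" "$dedup_tb"          # dedup factor: 14.7
avg_file_mb "$dedup_tb" "$files"          # MB per file: 492
ratio "$total_chunks" "$unique_chunks"    # references per unique chunk: 15.1
```

So the repo stores the data roughly 15 times more compactly than 21 plain
copies would, and the average file size of about 492 MB fits the 500 MB
data files mentioned earlier.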
> Right now, I'm just doing file updates with rsync and the drive
> is encrypted.
While Borg has an encryption feature, I chose not to use it and rely on the
underlying LUKS instead, because then I can use KDE GUI stuff to mount the
drive and run my Borg wrapper script without ever having to enter a
passphrase.
> Thing is, I'm going to have to split into three drives soon. So,
> compressing may help. Since it is video files, it may not help much but
> I'm not sure about that.
Of my PC’s data partition, almost 50 % is music, 20 % is my JPEG picture
library, 15 % is video files and the rest is misc stuff like Kerbal Space
Program, compressed archives of OpenStreetMap files and VM images.
These are the statistics of my last snapshot:
  Original size    Compressed size    Deduplicated size
      730.80 GB          698.76 GB              1.95 MB
The compression gain is around 4 %, much of which probably comes from empty
areas in VM images and 4 GB of PDF and HTML files. On my laptop, whose data
partition has less VM stuff but a lot more videos, it looks like this:
  Original size    Compressed size    Deduplicated size
        1.01 TB            1.00 TB              1.67 MB
So only around 1 % of savings. However, compression is done using lz4 (the
default; you can choose other algorithms), which is extremely fast but not
very strong. In fact, Borg tries to compress all chunks, but if it finds
that compressing a chunk doesn’t yield enough benefit, it discards the
compressed result and stores the uncompressed data instead, to save on CPU
load later on.
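The two percentages can be reproduced from the statistics quoted above. This
is just illustrative arithmetic; `percent_saved` is a made-up helper name,
and both arguments must be given in the same unit.

```shell
#!/bin/sh
# percent_saved ORIGINAL COMPRESSED
# Prints the share of the original size saved by compression, in percent.
percent_saved() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (o - c) / o * 100 }'
}

percent_saved 730.80 698.76   # PC data partition (GB): 4.4
percent_saved 1.01 1.00       # laptop data partition (TB): 1.0
```

The laptop figure is coarse, because the sizes are only shown with two
decimal places at TB scale.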
--
Grüße | Greetings | Qapla’
Please do not share anything from, with or about me on any social network.
Some people are so tired, they can’t even stay awake until falling asleep.