From: Eddie Chapman <eddie@ehuk.net>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] Current unavoidable use of xz utils in Gentoo
Date: Sat, 6 Apr 2024 12:57:23 +0100
Message-ID: <92ef54a0-7a49-49f3-b3cc-d38a2b9adebd@ehuk.net>
In-Reply-To: <a6ee4efea30d0748b560cf772b659d8b.squirrel@ukinbox.ecrypt.net>

[-- Attachment #1: Type: text/plain, Size: 18716 bytes --]

On 04/04/2024 15:24, Eddie Chapman wrote:
> Since there appears to be some interest I'll put together a single email
> to the list later today detailing everything, as I needed to do more
> things overall in addition to replacing /usr/bin/xz.

Below is a guide I've written on removing app-arch/xz-utils, in case
anyone else wants to do so. Attached is the current version of the Bash
wrapper script I now use in place of /usr/bin/xz.

Comments, corrections on anything technical in the guide or script are 
welcome, apart from flames about how this is ridiculous and unnecessary :-).

Best wishes,
Eddie


==== Guide to removing xz utils on a Gentoo system ====

=== Introduction ===

This guide is for people who wish to remove xz utils (app-arch/xz-utils) 
from a Gentoo system.

I've been able to remove xz utils from two Gentoo workstations with 2412 
packages and KDE 5.x as the desktop, and it has not been painful at all. 
I've gone on to remove it from several Gentoo server systems without any 
pain. These are all SELinux systems.

In this guide we replace app-arch/xz-utils with app-arch/p7zip which 
will do all the work of uncompressing xz distfiles for Portage going 
forward. It works perfectly fine for that right now.

I've written a bash wrapper script which is designed to be installed as 
/usr/bin/xz, which is referred to in the instructions below. It is 
attached to this email as xz.txt. It tries to take care of
decompressing .xz files transparently whenever Portage runs /usr/bin/xz, 
by behaving like it but using app-arch/p7zip in the background. You will 
need it if you want to get rid of app-arch/xz-utils. But don't blindly 
use it, check it yourself first of course. If you don't like it you will 
either need to write your own script, or hack emerge/Portage in various 
places to use something else to decompress xz files.
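
To behave like xz for Portage, the wrapper has to recognise the decompress
and stdout options in their short, long and bundled forms. The attached
script does this with a series of grep calls. As an alternative, here is a
minimal sketch using a case statement. The function name parse_xz_args and
the KEEP_REQUESTED and XZ_FILE variables are illustrative and not part of
the attached script, and the sketch does not handle option values passed as
a separate word (e.g. "-S .xz").

```bash
# parse_xz_args ARG...
# Hypothetical helper: sets DECOMPRESS_REQUESTED, STDOUT_REQUESTED and
# KEEP_REQUESTED (Y/N) and XZ_FILE (first argument ending in .xz, or empty).
parse_xz_args() {
    DECOMPRESS_REQUESTED=N
    STDOUT_REQUESTED=N
    KEEP_REQUESTED=N
    XZ_FILE=
    for arg in "$@"; do
        case "${arg}" in
            --decompress|--uncompress) DECOMPRESS_REQUESTED=Y ;;
            --stdout|--to-stdout)      STDOUT_REQUESTED=Y ;;
            --keep)                    KEEP_REQUESTED=Y ;;
            --*)                       ;;   # any other long option is ignored
            -?*)
                # Bundled short options such as -dc or -dk: walk them one
                # character at a time.
                rest=${arg#-}
                while [ -n "${rest}" ]; do
                    ch=${rest%"${rest#?}"}   # first character of rest
                    rest=${rest#?}
                    case "${ch}" in
                        d) DECOMPRESS_REQUESTED=Y ;;
                        c) STDOUT_REQUESTED=Y ;;
                        k) KEEP_REQUESTED=Y ;;
                        # These xz options take a value; the rest of the
                        # word belongs to that value, so stop scanning.
                        T|F|S|M|C) rest= ;;
                    esac
                done
                ;;
            *.xz)
                if [ -z "${XZ_FILE}" ]; then XZ_FILE=${arg}; fi
                ;;
        esac
    done
}
```

Walking the short options one character at a time means "-T4" or
"-S.dec" is not mistaken for a request to decompress or write to stdout.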

Your mileage may vary with any of this. Proceed at your own risk, and
don't blame me if you break your system or lose data.


=== Warnings / Caveats / Breakages ===

Before you do this, you should identify whether you have applications or 
scripts which use the Tukaani xz utils, or that link against 
liblzma.so.5. This could include non-Gentoo apps or scripts you run 
which call any of the xz utils (xz, unxz, xzgrep|xzegrep|xzfgrep, xzcat, 
xzcmp, xzdec, xzdiff, lzma, unlzma, lzgrep|lzegrep|lzfgrep, lzmainfo, 
lzmadec, lzcmp, lzdiff, lzcat). Those programs will all be gone, so you 
should not do this if you want or need them and cannot use alternatives.
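
A recursive grep is one rough way to get a first list of scripts that call
these tools. This is only a sketch: the function name find_xz_callers is
made up for illustration, you choose which directories to point it at, and
it will report false positives (for example a file that merely mentions a
"foo.tar.xz" file name), so treat the output as candidates to review.

```bash
# find_xz_callers DIR...
# Hypothetical helper: list files under the given directories that mention
# one of the xz/lzma command names as a whole word.
find_xz_callers() {
    grep -rlwE 'xz|unxz|xzcat|xzgrep|xzegrep|xzfgrep|xzcmp|xzdec|xzdiff|lzma|unlzma|lzcat|lzgrep|lzegrep|lzfgrep|lzmainfo|lzmadec|lzcmp|lzdiff' -- "$@" 2>/dev/null
}

# Example: find_xz_callers /usr/local/bin /etc/cron.daily
```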

99% of packages in Gentoo work fine without xz utils, it's just that 
some might optionally link against liblzma.so.5 in order to provide 
support for xz (de)compression along with other algorithms. We will 
rebuild those packages so they don't link against liblzma.so.5 anymore.

xz utils is a relative newcomer to the Linux/OSS/GNU world so you will
find there aren't any low level system packages that absolutely need it 
to do their main job. You are highly unlikely to render your system 
completely unbootable doing this.

But removing it does carry some risk. You might discover along the way 
there is some application you have installed that cannot function 
without xz utils. You might just have to uninstall it and find an 
alternative, if the situation cannot be resolved by creating your own 
custom ebuild and tweaking configure/meson options. But worst case if 
you have to uninstall a package and other packages depend on it, you 
might have to remove them too, and I'm sure you know how that remove 
list can potentially turn into a long one once all deps are worked out.

You will lose some things. I've had to uninstall the following two 
packages for now:

media-gfx/gimp
kde-apps/ark (and kde-apps/kdeutils-meta which depends on it)

(I'll probably figure out later how to coax them into working without 
xz. There might even be upstream updates soon that make xz optional, who 
knows. I'll also need to add to my world file at some point everything 
that was in kde-apps/kdeutils-meta.)

If you run another desktop (e.g. Gnome) I've no idea what might or might 
not need xz utils. The situation with your desktop environment may be 
worse, more painful, or impossible.

You will lose lzma support in the core Python language (dev-lang/python) 
in 3.x versions and higher (not sure when exactly support was introduced 
but 2.7 does not have it, 3.11 & 3.12 do), so if you have python scripts 
that happen to need that, well, they will definitely throw a big error 
after this :-) But I was able to rebuild the 179 dev-python packages on 
my workstations and everything in app-portage and none of them 
complained. I've been able to go on and do plenty of rebuilding with 
Portage after this without any problem, so core Python functionality in 
Gentoo is fine (although see next paragraph about Gemato).
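
To check in advance whether any of your own Python scripts would be
affected, something like the following could be used. It is a minimal
sketch: both function names are hypothetical, and the grep only catches
statements where lzma is the first module named on the import line.

```bash
# python_has_lzma INTERPRETER
# Succeeds if the given Python interpreter can import the lzma module.
python_has_lzma() {
    "$1" -c 'import lzma' >/dev/null 2>&1
}

# find_lzma_imports DIR...
# Hypothetical helper: list files containing an "import lzma" or
# "from lzma import ..." statement.
find_lzma_imports() {
    grep -rlE '^[[:space:]]*(import|from)[[:space:]]+lzma([[:space:].,]|$)' -- "$@" 2>/dev/null
}

# Example:
#   python_has_lzma python3.12 && echo "lzma still available"
#   find_lzma_imports /usr/local/bin "$HOME/scripts"
```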

There is one significant thing that breaks, which is Gemato 
(app-portage/gemato). Gemato requires lzma support in core python in 
order to do GPG signature verification. This means you will have to say 
goodbye (for now) to verifying upstream GPG signatures on distfiles, and 
verification of Portage metadata after doing an emerge --sync. These 
features have been added to Portage relatively recently (2022?) so are 
"nice to have", without them your system is just less hardened, but 
still with the very high level of security that Gentoo systems have
always had prior to these features, in my opinion. Personally I can live
without them for now. Verifying hashes in Manifest files still works 
fine and that's the main thing. You may disagree in which case, well, 
don't do this then. I'm going to figure out an alternative way I can 
verify Portage metadata soon, as there are other ways if you are creative.

In practice this means you have to use USE="-verify-sig" for every
emerge with a package that has a corresponding sec-keys package, and you 
have to set:

sync-rsync-verify-metamanifest = no

in files in /etc/portage/repos.conf/

But after doing that all works fine.
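
As an illustration, the repos.conf setting could be applied with a small
helper like the one below. This is a sketch under assumptions: the function
name set_conf_option is made up, the file name gentoo.conf in the example
is a guess at your layout, and the helper assumes the file holds a single
[section], because a missing key is simply appended at the end.

```bash
# set_conf_option FILE KEY VALUE
# Hypothetical helper: set "KEY = VALUE" in a simple ini-style file.
# Replaces an existing KEY line, otherwise appends one at the end.
set_conf_option() {
    if grep -qE "^[[:space:]]*$2[[:space:]]*=" "$1"; then
        sed -i "s|^[[:space:]]*$2[[:space:]]*=.*|$2 = $3|" "$1"
    else
        printf '%s = %s\n' "$2" "$3" >> "$1"
    fi
}

# Example (as root):
#   set_conf_option /etc/portage/repos.conf/gentoo.conf sync-rsync-verify-metamanifest no
# and for a one-off build without signature verification:
#   USE="-verify-sig" emerge --oneshot PACKAGE
```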

Here are some other very minor things you might lose if you are currently
using them:

- KDE users will lose xz compression support from KArchive 
(kde-frameworks/karchive). AFAICT this has NOT had any impact on my own 
KDE experience, I've not seen any errors and everything I use works fine 
in my KDE sessions. KArchive will still support GZip, BZip2 and Zstd, 
just not xz. I suspect nothing that uses KArchive is using xz by 
default, but I'm not completely sure. All I know is my KDE sessions are 
running fine without it, and I can do everything in KDE I did before 
(apart from use Ark of course, see above). I don't know anything about 
KArchive. Full details of compression support in KArchive are at 
https://api.kde.org/frameworks/karchive/html/classKCompressionDevice.html

- Portage binary packages: You cannot use xz compression if you create 
Portage binary packages. You will need to use one of bzip2, gzip, lz4, 
lzip, lzop, or zstd in BINPKG_COMPRESS in make.conf instead of xz (if 
that is what you were using, or is it the default?). I have always used 
gzip so no probs for me, creating binary packages works fine, I've 
already updated several Gentoo systems from many binary packages I've 
created using gzip without xz utils installed.

- Grub bootloader: If you happened to have been using the optional, not 
used by default, --compress argument for grub-install, and you happen to 
have chosen xz, well you can't anymore. You will have to use gz or lzo 
instead, or stop using --compress if you don't like either of those two. 
Grub still builds, installs, works fine without xz utils for almost 
everyone. But if you did happen to previously use --compress=xz with 
grub-install before, make sure you check out fully what you might or 
might not have to do before next rebooting (I have no idea, I have never 
used this feature, Grub has continued working fine for me after 
rebuilding it without xz-utils and running grub-install again on my boot 
drives).

- Dovecot: net-mail/dovecot links liblzma.so.5 in order to support its
optional Zlib plugin ( 
https://doc.dovecot.org/configuration_manual/zlib_plugin/ ) for 
reading/writing compressed mail files. Despite the plugin being called 
"Zlib" it supports several different compression algorithms. At one time 
they supported xz, but in recent Dovecot releases they decided to 
deprecate it. They still support reading (not writing) xz compressed 
files, so when net-mail/dovecot is built, if it finds liblzma.so.5 it 
will use it, if it doesn't find it, it won't, and then you just have no
support for xz in the Zlib plugin (again, only *if* you are using that 
plugin, which is not default). From what I can gather, if you use this 
plugin you should migrate away from using xz compressed mail files (to 
another supported compression). So you should do that before you do 
this, if that applies to you. I use Dovecot but never enabled mail file 
compression so this did not affect me, Dovecot has continued working 
fine with the mail stores I look after.

- MariaDB: If you happen to make use of the optional InnoDB Page
Compression feature in MariaDB (
https://mariadb.com/kb/en/innodb-page-compression/ ), and if you happen 
to have chosen lzma compression for that feature (not the default) 
rather than one of the other 5 algorithms, then that is very unlucky, 
you will need to change that in your MariaDB installation in order to 
use one of the other 5 compression algorithms instead. dev-db/mariadb 
during build will automatically pick up support for the compression 
algorithms you have installed on the system, you don't currently specify 
anything in the ebuild that affects that. So if you have dev-db/mariadb 
installed you will have to rebuild it after removing xz utils as it 
links against liblzma.so.5 for this feature, and on rebuilding it you 
will lose support for lzma in InnoDB Page Compression. If you don't know 
if you are using it, this sql query will tell you:

SHOW GLOBAL VARIABLES LIKE 'innodb_compression_algorithm';

On my MariaDB 10.6 server it returned:

+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| innodb_compression_algorithm | zlib  |
+------------------------------+-------+

So I was not using lzma and was not affected. Tested: my MariaDB 10.6 
server is now using rebuilt dev-db/mariadb without liblzma.so.5 and is 
running with no problems.

- sys-apps/fwupd might stop working properly (though it will still build 
fine) due to what you have to change with dev-libs/libxmlb below. I'm 
not sure as I haven't checked yet, I just suspect it will. So bear that 
in mind if you need to rely on sys-apps/fwupd at the moment. But this 
"might" is temporary, upstream has now decided to make lzma optional, so 
this will trickle down to Gentoo soon.

- app-arch/rpm will probably not be able to extract some rpm archives if 
they are compressed with xz, but I haven't checked that yet. Though it 
will still build fine. This does not affect building Gentoo packages 
which come with .rpm distfiles (e.g. libreoffice), Portage uses rpm2tgz 
for that and my script takes care of the rest.
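
For the Portage binary packages item above, a quick way to see whether a
make.conf currently selects xz is sketched below. The function name
binpkg_uses_xz is hypothetical, and the sketch assumes make.conf is a single
file with BINPKG_COMPRESS assigned on one line.

```bash
# binpkg_uses_xz MAKE_CONF
# Succeeds if the last BINPKG_COMPRESS assignment in the file selects xz.
binpkg_uses_xz() {
    grep -E '^[[:space:]]*BINPKG_COMPRESS=' "$1" 2>/dev/null \
        | tail -n 1 \
        | grep -qE '=["'\'']?xz["'\'']?[[:space:]]*(#.*)?$'
}

# Example:
#   binpkg_uses_xz /etc/portage/make.conf && echo "switch BINPKG_COMPRESS away from xz"
```

If it reports xz, set BINPKG_COMPRESS to one of the other values listed
above (for example "gzip") before removing xz utils.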


=== The instructions ===

Follow them in order.

1.  Do an emerge --sync and @world update first to make sure any
     upgrades/updates have been applied. Makes it easier for the
     things you need to do after you remove xz utils.

2.  Install p7zip: emerge app-arch/p7zip

3.  Add -lzma to USE flags in make.conf

4.  Rebuild @world. This will rebuild only a few packages which
     respect -lzma

5.  Copy the bash wrapper script to somewhere on the machine you
     are doing this on (but NOT to /usr/bin yet)

6.  Prepare the script to be installed. Rename it to "xz" (with
     no extension), set permissions to 0755, owned by root:root.

7.  On an SELinux installation, set the SELinux context of the
     script to whatever the current /usr/bin/xz binary is set to.

8.  Remove xz utils, ignoring the warning about it being part
     of system: emerge --unmerge app-arch/xz-utils
     Once it is removed Portage will tell you that it preserved
     liblzma.so.5. More on that below.

9.  Install the bash wrapper script to /usr/bin/xz

10. Add the following line:
     app-arch/xz-utils-5.4.2
     to /etc/portage/profile/package.provided

11. Remove kde-apps/kdeutils-meta and kde-apps/ark if you use
     KDE, and media-gfx/gimp if you use it:
     emerge --unmerge kde-apps/kdeutils-meta kde-apps/ark media-gfx/gimp

12. (optional) Add -verify-sig to USE flags in make.conf. If you
     do you will soon have to rebuild all packages that rely on it.
     If you don't, you can just add USE="-verify-sig" in front of
     every emerge command you have to do from now on, or add to
     individual packages in your package.use file.

13. Now you will need to rebuild all packages with files that rely
     on the preserved liblzma.so.5 library. See below for further
     notes about that.

14. set:
     sync-rsync-verify-metamanifest = no
     in applicable files in /etc/portage/repos.conf/ before you do
     your next emerge --sync

15. Eventually, you will have to rebuild all packages that have
     corresponding signatures in sec-keys.

That's all, enjoy life without app-arch/xz-utils! But read on for more 
info about step 13.
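
Steps 5 to 10 could be scripted roughly as follows. This is a sketch, not a
tested procedure: the helper names stage_wrapper and add_provided and the
staging path /root/xz-wrapper are assumptions, and the commands in the
trailing comment must be run as root. Note that chcon --reference needs the
original binary to still be present, which is why step 7 comes before the
unmerge.

```bash
# stage_wrapper SRC STAGEDIR
# Copy the wrapper to STAGEDIR/xz with mode 0755 (steps 5 and 6).
stage_wrapper() {
    mkdir -p "$2" && install -m 0755 "$1" "$2/xz"
}

# add_provided FILE ATOM
# Append ATOM to a package.provided file unless it is already listed (step 10).
add_provided() {
    mkdir -p "$(dirname "$1")" || return 1
    grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Possible sequence as root:
#   stage_wrapper xz.txt /root/xz-wrapper
#   chown root:root /root/xz-wrapper/xz
#   chcon --reference=/usr/bin/xz /root/xz-wrapper/xz    # step 7, SELinux only
#   emerge --unmerge app-arch/xz-utils                   # step 8
#   cp --preserve=all /root/xz-wrapper/xz /usr/bin/xz    # step 9
#   add_provided /etc/portage/profile/package.provided app-arch/xz-utils-5.4.2
```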


=== Notes about Step 13 ===

These are the packages that I needed to rebuild on my systems before the 
preserved liblzma.so.5 library was finally removed by Portage:

app-arch/libarchive
app-arch/rpm
sys-boot/grub
dev-db/mariadb
dev-lang/python:2.7
kde-frameworks/karchive
dev-lang/python:3.11 (needs custom ebuild, see below)
dev-lang/python:3.12 (needs custom ebuild, see below)
net-mail/dovecot (needs custom ebuild, see below)
dev-libs/libxmlb (needs custom ebuild, see last note at the bottom of 
this guide)

There might be others on your system. In most cases just rebuilding them 
will be enough. Some you might be able to clone the ebuild to your local 
repo and tweak configure/meson options so that the package does not link 
against liblzma.so.5. There may be packages with issues too difficult to 
resolve so you might have to just uninstall them if you can't live 
without them :-(  (or resign yourself to rolling back and having to live 
with xz utils)

Remember you will need to specify USE="-verify-sig" for any packages 
that rely on that, in whichever is your preferred way.
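
To find what still refers to the preserved library on your own system, a
crude scan like the one below may help. It is a sketch under assumptions:
the function names are hypothetical, and it only looks for the library name
as a string inside each file, so it can match files that are not actually
linked against it (and it will match the library itself).

```bash
# mentions_soname FILE SONAME
# Crude check: succeeds if FILE contains the SONAME string anywhere. In an
# ELF file the names of needed libraries are stored as plain strings, so
# this catches dynamic links, but it can also match unrelated occurrences.
mentions_soname() {
    grep -qF -- "$2" "$1" 2>/dev/null
}

# list_soname_users SONAME DIR...
# Print every regular file under the given directories that mentions SONAME.
list_soname_users() {
    soname=$1
    shift
    find "$@" -type f 2>/dev/null | while IFS= read -r f; do
        if mentions_soname "$f" "$soname"; then
            printf '%s\n' "$f"
        fi
    done
}

# Example: list_soname_users liblzma.so.5 /usr/bin /usr/lib64
# Confirm a hit with:   readelf -d FILE | grep NEEDED
# Map it to a package:  equery belongs FILE   (from app-portage/gentoolkit)
```

Portage also offers the @preserved-rebuild set (emerge @preserved-rebuild)
for rebuilding consumers of preserved libraries, which may be a simpler
starting point for the packages that only need a plain rebuild.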

From my list I had to clone the following 3 packages to my local
ebuilds directory with small modification to each in order to get them 
to build without linking against liblzma.so.5:
net-mail/dovecot
dev-lang/python:3.11
dev-lang/python:3.12

Here are 3 diffs showing what I had to change:

--- /usr/portage/net-mail/dovecot/dovecot-2.3.21-r1.ebuild
+++ /usr/local/portage/net-mail/dovecot/dovecot-2.3.21-r1.ebuild
@@ -43,7 +43,6 @@

  DEPEND="
         app-arch/bzip2
-       app-arch/xz-utils
         dev-libs/icu:=
         dev-libs/openssl:0=
         sys-libs/zlib:=
@@ -126,7 +125,7 @@
                 --disable-rpath \
                 --with-bzlib \
                 --without-libbsd \
-               --with-lzma \
+               --without-lzma \
                 --with-icu \
                 --with-ssl \
                 --with-zlib \

--- /usr/portage/dev-lang/python/python-3.11.8_p1.ebuild
+++ /usr/local/portage/dev-lang/python/python-3.11.8_p1.ebuild
@@ -179,6 +179,7 @@
         # Avoid as many dependencies as possible for the cross build.
         cat >> Makefile <<-EOF || die
                 MODULE_NIS_STATE=disabled
+               MODULE__LZMA_STATE=disabled
                 MODULE__DBM_STATE=disabled
                 MODULE__GDBM_STATE=disabled
                 MODULE__DBM_STATE=disabled
@@ -328,7 +329,7 @@
         fi

         # force-disable modules we don't want built
-       local disable_modules=( NIS )
+       local disable_modules=( NIS _LZMA )
         use gdbm || disable_modules+=( _GDBM _DBM )
         use sqlite || disable_modules+=( _SQLITE3 )
         use ssl || disable_modules+=( _HASHLIB _SSL )


--- /usr/portage/dev-lang/python/python-3.12.2_p1.ebuild
+++ /usr/local/portage/dev-lang/python/python-3.12.2_p1.ebuild
@@ -177,6 +177,7 @@
         cat > Modules/Setup.local <<-EOF || die
                 *disabled*
                 nis
+               _lzma
                 _dbm _gdbm
                 _sqlite3
                 _hashlib _ssl
@@ -299,6 +300,7 @@
         cat > Modules/Setup.local <<-EOF || die
                 *disabled*
                 nis
+               _lzma
                 $(usev !gdbm '_gdbm _dbm')
                 $(usev !sqlite '_sqlite3')
                 $(usev !ssl '_hashlib _ssl')


Lastly, I needed to create a custom dev-libs/libxmlb ebuild in order to 
upgrade it from 0.3.14 (latest in Gentoo at time of writing) to 0.3.15.

I also needed to apply a very recent patch from upstream, from this 
commit, which makes LZMA support optional:
https://github.com/hughsie/libxmlb/commit/bdf845510fbed40b88465b2272ccad9e93656639

and I needed to make some small changes to the ebuild.

So this is what you need to do at the time of writing (6th April 2024):

1. Copy the in-tree /usr/portage/dev-libs/libxmlb ebuild directory into 
your local ebuilds directory.

2. Rename the ebuild file from libxmlb-0.3.14.ebuild to 
libxmlb-0.3.15.ebuild

3. Download the raw patch, you can use this link:
 
https://github.com/hughsie/libxmlb/commit/bdf845510fbed40b88465b2272ccad9e93656639.patch
    rename it to:
    libxmlb-0.3.15-make_lzma_optional.patch
    and place it in the local "files" directory.

4. Modify the new ebuild according to the diff below. Then just rebuild it.

--- /usr/portage/dev-libs/libxmlb/libxmlb-0.3.14.ebuild
+++ /usr/local/portage/dev-libs/libxmlb/libxmlb-0.3.15.ebuild
@@ -14,15 +14,15 @@
  SLOT="0/2" # libxmlb.so version

  KEYWORDS="amd64 ~arm arm64 ~loong ppc ppc64 ~riscv x86"
-IUSE="doc introspection stemmer test +zstd"
+IUSE="doc introspection -lzma stemmer test +zstd"

  RESTRICT="!test? ( test )"

  RDEPEND="
-       app-arch/xz-utils
         dev-libs/glib:2
         sys-apps/util-linux
         stemmer? ( dev-libs/snowball-stemmer:= )
+       lzma? ( app-arch/xz-utils:= )
         zstd? ( app-arch/zstd:= )
  "

@@ -43,6 +43,7 @@

  PATCHES=(
         "${FILESDIR}"/${PN}-0.3.12-no_installed_tests.patch
+       "${FILESDIR}"/${PN}-0.3.15-make_lzma_optional.patch
  )

  python_check_deps() {
@@ -60,6 +61,7 @@
                 $(meson_use stemmer)
                 $(meson_use test tests)
                 $(meson_use zstd)
+               $(meson_feature lzma)
         )
         meson_src_configure
  }
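
The copy and rename in steps 1 and 2 above can be sketched as a small
helper. The function name prepare_libxmlb is hypothetical. The paths in the
example are the ones used in this guide, and the commands in the trailing
comment are a suggestion rather than a tested sequence.

```bash
# prepare_libxmlb REPO LOCAL_REPO
# Copy dev-libs/libxmlb from REPO into LOCAL_REPO and rename the 0.3.14
# ebuild to 0.3.15 (steps 1 and 2).
prepare_libxmlb() {
    src="$1/dev-libs/libxmlb"
    dst="$2/dev-libs/libxmlb"
    mkdir -p "$2/dev-libs" \
        && cp -R "$src" "$2/dev-libs/" \
        && mkdir -p "$dst/files" \
        && mv "$dst/libxmlb-0.3.14.ebuild" "$dst/libxmlb-0.3.15.ebuild"
}

# Possible use:
#   prepare_libxmlb /usr/portage /usr/local/portage
#   wget -O /usr/local/portage/dev-libs/libxmlb/files/libxmlb-0.3.15-make_lzma_optional.patch \
#       https://github.com/hughsie/libxmlb/commit/bdf845510fbed40b88465b2272ccad9e93656639.patch
#   (edit the ebuild as in the diff above)
#   ebuild /usr/local/portage/dev-libs/libxmlb/libxmlb-0.3.15.ebuild manifest
#   emerge --oneshot dev-libs/libxmlb
```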

[-- Attachment #2: xz.txt --]
[-- Type: text/plain, Size: 10962 bytes --]

#!/usr/bin/env bash

# SPDX-License-Identifier: GPL-2.0-only
# SPDX-FileCopyrightText: 2024 Eddie Chapman <eddie@ehuk.net>

# WARNING: this script is currently not a full replacement for xz, it just mimics
# some of the decompression functionality of xz. It is only designed at the
# moment to be called by Portage and even then it does not yet cover all cases of 
# that.

# Some places in portage where xz is called:
# - /usr/lib/portage/python3.11/phase-helpers.sh
#      This is where 99% of calls to xz happen, from the line:
#      __unpack_tar "xz -T$(___makeopts_jobs) -d"
#      in the unpack phase.
#      This results in a call to xz inside __unpack_tar() where the -c arg is added (for stdout)
#      and the filename is added as an argument.
# - /usr/bin/deb2targz
#      Some packages e.g. google-chrome have deb distfiles which can contain a data.tar.xz
#      file so deb2targz launches xz -dc to decompress that.
# - /usr/bin/rpm2tar
#      Some packages e.g. libreoffice have rpm distfiles compressed with xz so rpm2tar launches 
#      xz -dc to decompress them
# - /usr/portage/eclass/llvm.org.eclass
#      xz is not called directly but tar -x -J is run (and tar then runs "xz -d" with piped 
#      in/out, with no file as argument)

LOGGER=$(command -v logger)

if [ ! -x "${LOGGER}" ]; then
	# Send the error to stderr: callers may be reading decompressed data from stdout.
	echo "(wrapper): Fatal error: logger command does not appear to exist!" >&2
	exit 1
fi

LOG_PREFIX="(wrapper):"

# /usr/bin/7za is just a wrapper that executes this
SEVEN_ZA="/usr/lib64/p7zip/7za"

DATE_CMD=$(command -v date)
MKTEMP_CMD=$(command -v mktemp)
PS_CMD=$(command -v ps)
CAT_CMD=$(command -v cat)
CHMOD=$(command -v chmod)
WHOAMI=$(command -v whoami)
GREP=$(command -v grep)
FILE_CMD=$(command -v file)
READLINK=$(command -v readlink)

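# Each variable is quoted so that a command which was not found (empty string)
# still reaches the -x test below instead of silently dropping out of the list.
for EXE_F in "${SEVEN_ZA}" "${DATE_CMD}" "${MKTEMP_CMD}" "${PS_CMD}" "${CAT_CMD}" "${CHMOD}" \
"${WHOAMI}" "${GREP}" "${FILE_CMD}" "${READLINK}"; do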

	if [ ! -x "${EXE_F}" ]; then
	MSG="${LOG_PREFIX} Fatal Error: ${EXE_F} does not exist or is not an exe!"
	${LOGGER} -p syslog.err -t "${0}" "${MSG}"
	exit 1
	fi

done

DECOMPRESS_REQUESTED=N
STDOUT_REQUESTED=N

for myarg in "${@}"; do

	# Look for the 3 forms of xz's decompress argument when it is by itself.
	# TO-DO: collapse these into one grep command, improve the horrible regex.
	echo "${myarg}" | ${GREP} -Eq '^[-]d$' 
	retA=$?
	echo "${myarg}" | ${GREP} -Eq '^[-][-]decompress$'
	retB=$?
	echo "${myarg}" | ${GREP} -Eq '^[-][-]uncompress$'
	retC=$?

	if [ ${retA} -eq 0 ] || [ ${retB} -eq 0 ] || [ ${retC} -eq 0 ]; then
	DECOMPRESS_REQUESTED=Y
	fi

	# Look for the 3 forms of xz's stdout argument when it is by itself.
	# TO-DO: collapse these into one grep command, improve the horrible regex.
	echo "${myarg}" | ${GREP} -Eq '^[-]c$' 
	retA=$?
	echo "${myarg}" | ${GREP} -Eq '^[-][-]to[-]stdout$'
	retB=$?
	echo "${myarg}" | ${GREP} -Eq '^[-][-]stdout$'
	retC=$?

	if [ ${retA} -eq 0 ] || [ ${retB} -eq 0 ] || [ ${retC} -eq 0 ]; then
	STDOUT_REQUESTED=Y
	fi

	# and look for both together as -dc or -cd
	# TO-DO: collapse these into one grep command, improve the horrible regex.
	echo "${myarg}" | ${GREP} -Eq '^[-]dc$' 
	retA=$?
	echo "${myarg}" | ${GREP} -Eq '^[-]cd$' 
	retB=$?

	if [ ${retA} -eq 0 ] || [ ${retB} -eq 0 ]; then
	DECOMPRESS_REQUESTED=Y
	STDOUT_REQUESTED=Y
	fi

done

# This script only tries to decompress. No compress functionality at all
# at this stage in its development.
if [ "${DECOMPRESS_REQUESTED}" = "N" ]; then

	MSG="${LOG_PREFIX} Fatal Error: no (d|decompress|uncompress) option on the command line. Sorry, this wrapper script only supports decompression."
	#echo "$MSG"
	${LOGGER} -p syslog.err -t "${0}" "${MSG}"
	exit 1

fi

# DEBUG
#MSG="${LOG_PREFIX} (DEBUG) stdout requested? ${STDOUT_REQUESTED}"
#${LOGGER} -p syslog.info -t "${0}" "${MSG}"

WHO_CALLED=$(${WHOAMI})

# get the parent command, very useful for debugging
PARENT_CMD=$(${PS_CMD} -o args= ${PPID})

# DEBUG, avoid leaving enabled long term as potential for future security problem, 
# due to unescaped attacker controlled info being passed to logger
#MSG="${LOG_PREFIX} (DEBUG) U: ${WHO_CALLED}, PARENT: ${PARENT_CMD}, ARGS: ${@}"
#${LOGGER} -p syslog.info -t "${0}" "${MSG}"

f_passed_to_script=

# loop over args again to see if any file has been passed as an arg
# TO-DO there will be a better way of doing this.
for myarg in "${@}"; do

	# TO-DO, are there other possible extensions for xz. Also theoretically possible we
	# could be passed one with extension in caps.
	echo "${myarg}" | ${GREP} -Eq '[.]xz$' 
	r=$?

	if [ ${r} -eq 0 ]; then
	f_passed_to_script="${myarg}"
	break
	fi

done


function do_uncompress {

	# remember return numbers can only be btw 0 - 255

	# some sanity checks follow ...

	if [ -z "${1}" ]; then
	MSG_TO_SHOW="function requires 1 argument; a filename with or without leading path"
	return 184
	fi

	realf=$(${READLINK} -e "${1}" 2>/dev/null)

	if [ ! -f "${realf}" ]; then
	MSG_TO_SHOW="argument supplied either is not a file or, if it is a file, I cannot find it."
	return 194
	fi

	if ! test -s "${realf}"; then
	MSG_TO_SHOW="file supplied is empty!"
	return 204
	fi

	if [ ! -r "${realf}" ]; then
	MSG_TO_SHOW="file supplied cannot be read!"
	return 214
	fi

	# 7z does not like the file path being passed to it inside quotes. So lets just make 
	# sure not to pass it anything that contains any characters NOT in our sane list 
	# in our regex below (so nothing needs quoting as no shell metachars).
	# TO-DO: there will be a better way of dealing with this, prob by converting to an 
	# escaped string. But for now (2024) haven't come across any distfiles with weird chars 
	# in them thankfully.
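	# The bracket expression is negated so grep matches any character NOT in the sane list.
	# (The previous form, -v with a non-negated list, only matched paths with no sane characters at all.)
	echo "${realf}" | ${GREP} -Eq '[^A-Za-z0-9_:@%+/.-]'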
	r=$?

	if [ ${r} -eq 0 ]; then
	MSG_TO_SHOW="found unsupported characters in the (real) file path."
	return 224
	fi

	# Make sure we have been given an xz file.
	# the -e arguments exclude tests we're not interested in, hopefully some tiny perf gain
	# but more importantly reduce attack surface.
	# Also we have it output the mime type rather than a human readable string, more reliable
	${FILE_CMD} -e ascii -e cdf -e apptype -e csv -e elf -e json -e simh -e tar --mime "${realf}" 2>/dev/null | ${GREP} -q 'application/x-xz;'
	r=$?

	if [ ${r} -ne 0 ]; then
	MSG_TO_SHOW="file supplied does not appear to be an xz file, according to the file command"
	return 234
	fi

	# initialise this string for the 7za stdout option (-so). Empty (no stdout) by default.
	STDOUT_OPT_STR=''
	# and this string to, by default, redirect 7za output to /dev/null, as it is somewhat
	# chatty when decompressing.
	STDOUT_REDIR_STR='>/dev/null'

	# If the uncompressed data should be sent to stdout then the above vars need to be changed.
	if [ "${STDOUT_REQUESTED}" = "Y" ]; then
	STDOUT_OPT_STR='-so'
	STDOUT_REDIR_STR=''
	fi

	# we currently set stderr to redirect to /dev/null always.
	STDERR_REDIR_STR='2>/dev/null'

	SEVEN_ZA_FULL_CMD="${SEVEN_ZA} e ${realf} ${STDOUT_OPT_STR} -bd ${STDOUT_REDIR_STR} ${STDERR_REDIR_STR}"

	MSG="${LOG_PREFIX} About to run: ${SEVEN_ZA_FULL_CMD}"
	${LOGGER} -p syslog.info -t "${0}" "${MSG}"

	# In 99% of cases we will redirect 7za decompressed output to stdout (-so).
	# So make really sure nothing gets output to stdout by this script after this point!
	# Log messages all only to logger.
	eval "${SEVEN_ZA_FULL_CMD}"
	return $?

}

last_r=0
LAST_ERR_MSG=

# if no file was detected we assume we will get compressed data via stdin
if [ -z "${f_passed_to_script}" ]; then

	# We create a temp file to save the stdin compressed data into as the current p7zip provided 
	# 7za does not work properly if fed data via stdin, though according to docs it
	# should work, so probably a bug.
	# Also note within sandbox tmp space is not the system /tmp AFAICT.
	# Unfortunately this means we need to make sure we have enough space for the uncompressed
	# data both in /tmp as well as whichever filesystem we set portage to do builds on.

	mytf=$(${MKTEMP_CMD})
	sleep 0.3

	# sanity
	if [ -e "${mytf}" ]; then

		${CHMOD} 0600 "${mytf}"

	else

		MSG="${LOG_PREFIX} Fatal Error: something has gone very wrong, no temp file exists!"
		${LOGGER} -p syslog.err -t "${0}" "${MSG}"
		exit 1

	fi

	# save stdin to our tmpfile. quotes shld not be needed but, what the hell, might as well.
	${CAT_CMD} > "${mytf}"
	r=$?

	# sanity
	if [ ${r} -ne 0 ]; then

		MSG="${LOG_PREFIX} Fatal Error: cat returned non-zero code of ${r} when redirecting to ${mytf}!"
		${LOGGER} -p syslog.err -t "${0}" "${MSG}"
		exit 1

	else

		# Even if stdout was NOT requested using a command line argument (and thus STDOUT_REQUESTED will be set to N),
		# xz assumes that you *do* want the uncompressed stream to go to stdout if no file was given on the command line (naturally).
		# So we need to force this to Y here to make sure that happens.
		STDOUT_REQUESTED=Y

		# Having saved stdin to the tmp file above we can now have 7za decompress said tmp file.
		# Remember from here on we run 7za which will in most cases output binary data to stdout.
		# So make REALLY sure nothing else gets output after this!
		# Log messages all only to logger.
		do_uncompress "${mytf}"
		last_r=$?

		# error message numbers inside do_uncompress(), hopefully none clash with those used by 7za
		if [ ${last_r} = 184 ] || [ ${last_r} = 194 ] || [ ${last_r} = 204 ] || [ ${last_r} = 214 ] || [ ${last_r} = 224 ] || [ ${last_r} = 234 ]; then

			LAST_ERR_MSG="do_uncompress(): ${MSG_TO_SHOW}"

		elif [ ${last_r} -ne 0 ]; then

			LAST_ERR_MSG="7za returned non-zero code of ${last_r} when trying to decompress stdin!"

		fi

	fi
	
	# this is our created temp file rather than one supplied to the script so shld be deleted
	rm -f "${mytf}"

elif [ -e "${f_passed_to_script}" ]; then

	# Remember from here on we run 7za which will in most cases output binary data to stdout.
	# So make REALLY sure nothing else gets output after this!
	# Log messages all only to logger.
	do_uncompress "${f_passed_to_script}"
	last_r=$?

	# error message numbers inside do_uncompress(), hopefully none clash with those used by 7za
	if [ ${last_r} = 184 ] || [ ${last_r} = 194 ] || [ ${last_r} = 204 ] || [ ${last_r} = 214 ] || [ ${last_r} = 224 ] || [ ${last_r} = 234 ]; then

		LAST_ERR_MSG="do_uncompress(): ${MSG_TO_SHOW}"

	elif [ ${last_r} -ne 0 ]; then

		LAST_ERR_MSG="7za returned non-zero code of ${last_r} when trying to decompress the file!"

	fi

	# 7za does not delete the original file by default but xz does
	# If stdout was not requested then they will be expecting us to delete it so do that.
	# TO-DO: catch --keep argument and do not delete if it is passed
	if [ "${STDOUT_REQUESTED}" = "N" ]; then
	rm -f "${f_passed_to_script}"
	fi

else

	last_r=1
	LAST_ERR_MSG="no valid file was found in supplied arguments and stdin was not an xz stream!"

fi

if [ ${last_r} -ne 0 ]; then

	${LOGGER} -p syslog.err -t "${0}" "${LOG_PREFIX} Fatal Error: ${LAST_ERR_MSG}" >/dev/null 2>&1
	exit 1

else

	exit 0

fi

