public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] RFC: using .xz for doc/man/info compression
@ 2014-05-11 17:46 Michał Górny
From: Michał Górny @ 2014-05-11 17:46 UTC
  To: gentoo-dev

Hello, developers.

I'd like to raise the following item for discussion: making .xz
the default compressor used by portage for documentation, man pages
and info files. That is, the equivalent of:

  PORTAGE_COMPRESS=xz

in make.globals.

Rationale: xz-utils is quite widespread nowadays and is part of
the @system set. It can achieve a better compression ratio than bzip2,
with faster decompression at the same time.

I have confirmed that both sys-apps/man and sys-apps/man-db can
handle .xz compressed man pages, and sys-apps/texinfo can handle .xz
compressed info pages. Major text editors and pagers support .xz
just as they do .bz2 (i.e. they usually support both or neither :)).

The additional question is: which preset to use? To help discuss
this, I'd like to quote the tables from 'man xz':

     Preset   DictSize   CompCPU   CompMem   DecMem
       -0     256 KiB       0        3 MiB    1 MiB
       -1       1 MiB       1        9 MiB    2 MiB
       -2       2 MiB       2       17 MiB    3 MiB
       -3       4 MiB       3       32 MiB    5 MiB
       -4       4 MiB       4       48 MiB    5 MiB
       -5       8 MiB       5       94 MiB    9 MiB
       -6       8 MiB       6       94 MiB    9 MiB
       -7      16 MiB       6      186 MiB   17 MiB
       -8      32 MiB       6      370 MiB   33 MiB
       -9      64 MiB       6      674 MiB   65 MiB 

     Preset   DictSize   CompCPU   CompMem   DecMem
      -0e     256 KiB       8        4 MiB    1 MiB
      -1e       1 MiB       8       13 MiB    2 MiB
      -2e       2 MiB       8       25 MiB    3 MiB
      -3e       4 MiB       7       48 MiB    5 MiB
      -4e       4 MiB       8       48 MiB    5 MiB
      -5e       8 MiB       7       94 MiB    9 MiB
      -6e       8 MiB       8       94 MiB    9 MiB
      -7e      16 MiB       8      186 MiB   17 MiB
      -8e      32 MiB       8      370 MiB   33 MiB
      -9e      64 MiB       8      674 MiB   65 MiB

I'd like to note here that increasing the dictionary size beyond the
file size does not improve compression. However, the options reflected
in CompCPU may.
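
To illustrate the dictionary-size point, here is a rough Python sketch
using the standard lzma module. It is an experiment on synthetic data,
not on real man pages: the sample is a random 200 KiB block repeated
twice, so the second copy can only be matched if the dictionary reaches
back at least 200 KiB. The helper names (xz_size, make_sample) and the
chosen sizes are assumptions made for this example.

```python
import lzma
import random

KIB = 1024
MIB = 1024 * KIB


def xz_size(data, dict_size, preset=6):
    """Return the size of `data` compressed as .xz with the given LZMA2 dictionary size."""
    # Take every option from the preset, then override only the dictionary size.
    filters = [{"id": lzma.FILTER_LZMA2, "preset": preset, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


def make_sample(block_size=200 * KIB, seed=0):
    """Build a synthetic input: one incompressible block followed by an exact copy."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block + block


sample = make_sample()  # 400 KiB in total

# One dictionary smaller than the input, two larger than it.
sizes = {d: xz_size(sample, d) for d in (64 * KIB, 1 * MIB, 8 * MIB)}
for dict_size, size in sizes.items():
    print(f"dict {dict_size // KIB:>5} KiB -> {size} bytes")
```

The 64 KiB dictionary cannot see the first copy, so its output should
be roughly twice as large as the others, while the 1 MiB and 8 MiB
dictionaries (both larger than the input) should give nearly the same
size.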

Depending on how much complexity we are willing to accept, I'd go for
either:

1) -6e (or plain -6, the default, which has CompCPU 6 rather than 8) --
max CompCPU, reasonable use of memory, and a dictionary larger than most
(or all?) documents that are going to be compressed,

2) -Ne with the minimal 'N' for which CompCPU==8 and DictSize > filesize
-- still the max compression ratio while keeping memory requirements as
low as possible.
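
As a rough sketch of what option 2 could mean in practice (this is not
something portage does today), the preset choice can be written as a
table lookup in Python. The table is copied from the 'man xz' extreme
presets quoted above; pick_preset and pick_preset_for_path are
hypothetical helper names, and falling back to -9e for files of 64 MiB
or more is an assumption of this example.

```python
import os

KIB = 1024
MIB = 1024 * KIB

# (preset, dictionary size in bytes, CompCPU), from the 'man xz' extreme table.
EXTREME_PRESETS = [
    ("-0e", 256 * KIB, 8),
    ("-1e", 1 * MIB, 8),
    ("-2e", 2 * MIB, 8),
    ("-3e", 4 * MIB, 7),
    ("-4e", 4 * MIB, 8),
    ("-5e", 8 * MIB, 7),
    ("-6e", 8 * MIB, 8),
    ("-7e", 16 * MIB, 8),
    ("-8e", 32 * MIB, 8),
    ("-9e", 64 * MIB, 8),
]


def pick_preset(file_size):
    """Return the lowest -Ne preset with CompCPU == 8 and DictSize > file_size."""
    for name, dict_size, comp_cpu in EXTREME_PRESETS:
        if comp_cpu == 8 and dict_size > file_size:
            return name
    # Assumption: nothing larger exists, so use the biggest dictionary available.
    return "-9e"


def pick_preset_for_path(path):
    """Convenience wrapper: choose the preset from a file's size on disk."""
    return pick_preset(os.path.getsize(path))
```

With this table, a typical man page of a few tens of KiB would get -0e,
and -3e and -5e are never chosen because their CompCPU is 7.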

Your thoughts?

-- 
Best regards,
Michał Górny

