From: Rich Freeman
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] OT Best way to compress files with digits
Date: Fri, 31 Oct 2014 15:23:16 -0400
In-Reply-To: <20141031185545.GA536@grusum.endjinn.de>
References: <20141031153659.GA13217@solfire> <5453AE7D.8060505@ramses-pyramidenbau.de> <20141031155917.GB13217@solfire> <20141031185545.GA536@grusum.endjinn.de>

On Fri, Oct 31, 2014 at 2:55 PM, David Haller wrote:
>
> On Fri, 31 Oct 2014, Rich Freeman wrote:
>
>>I can't imagine that any tool will do much better than something like
>>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>>- your text files full of digits are encoding 3.3 bits of information
>>in an 8-bit ascii character and even if the order of digits in pi can
>>be treated as purely random just about any compression algorithm is
>>going to get pretty close to that 3.3 bits per digit figure.
>
> Good estimate:
>
> $ calc '101000/(8/3.3)'
> 41662.5
> and I get from (lzip)
> $ calc 44543*8/101000
> 3.528... (bits/digit)
> to zip:
> $ calc 49696*8/101000
> ~3.93 (bits/digit)

Actually, I'm surprised how far off from that figure the various
methods are. I was expecting SOME overhead, but not this much.

A fairly quick algorithm would be to encode each block of 96 digits
as a 40-byte code (that is just a straight decimal-to-binary
conversion). Then read one such "word" at a time and translate it.
Since 10^96 < 2^320, every 96-digit block fits in 40 bytes, which
works out to 320/96 = 3.333 bits per digit against the ideal
log2(10) = 3.322. So this only wastes about 0.011 bits per digit.

--
Rich
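
P.S. A minimal sketch of the 96-digit packing idea above, in Python.
The names pack_digits and unpack_digits are made up for illustration,
and the sketch assumes the input length is a multiple of 96; a real
tool would need some convention for the final short block, which is
not covered here.

```python
import math

DIGITS_PER_WORD = 96   # decimal digits per block
BYTES_PER_WORD = 40    # 320 bits, enough because 10**96 < 2**320


def pack_digits(digits: str) -> bytes:
    """Pack a string of ASCII digits into 40-byte big-endian words."""
    if len(digits) % DIGITS_PER_WORD != 0:
        # Assumption: no padding scheme is defined in this sketch.
        raise ValueError("length must be a multiple of 96 digits")
    out = bytearray()
    for i in range(0, len(digits), DIGITS_PER_WORD):
        word = digits[i:i + DIGITS_PER_WORD]
        if not (word.isascii() and word.isdigit()):
            raise ValueError("input must contain only the digits 0-9")
        # Straight decimal-to-binary conversion of one block.
        out += int(word).to_bytes(BYTES_PER_WORD, "big")
    return bytes(out)


def unpack_digits(packed: bytes) -> str:
    """Inverse of pack_digits: turn 40-byte words back into digits."""
    if len(packed) % BYTES_PER_WORD != 0:
        raise ValueError("length must be a multiple of 40 bytes")
    parts = []
    for i in range(0, len(packed), BYTES_PER_WORD):
        value = int.from_bytes(packed[i:i + BYTES_PER_WORD], "big")
        # zfill restores leading zeros that the integer form drops.
        parts.append(str(value).zfill(DIGITS_PER_WORD))
    return "".join(parts)


def wasted_bits_per_digit() -> float:
    """Overhead of this scheme relative to the ideal log2(10) bits."""
    return BYTES_PER_WORD * 8 / DIGITS_PER_WORD - math.log2(10)
```

Calling unpack_digits(pack_digits(s)) should give back s for any
digit string whose length is a multiple of 96, and
wasted_bits_per_digit() reproduces the roughly 0.011 figure.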