public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] OT Best way to compress files with digits
@ 2014-10-31 15:36 meino.cramer
  2014-10-31 15:45 ` Ralf
  2014-11-01 17:15 ` James
  0 siblings, 2 replies; 22+ messages in thread
From: meino.cramer @ 2014-10-31 15:36 UTC (permalink / raw
  To: Gentoo

 Hi,

 I have a lot of files with digits of PI. The digits
 are the characters 0-9. Currently they are ZIPped,
 which I think is not the best way to do that.

 I read that 7zip's PPMd compresses "natural text"
 quite well...but my files are not "natural text" (nor
 are they "binary data").

 Which practical compression method would compress
 the files (file by file) as much as possible?

 Thank you very much in advance for any help!

 Best regards,
 mcc





* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 15:36 [gentoo-user] OT Best way to compress files with digits meino.cramer
@ 2014-10-31 15:45 ` Ralf
  2014-10-31 15:59   ` meino.cramer
  2014-11-01 17:15 ` James
  1 sibling, 1 reply; 22+ messages in thread
From: Ralf @ 2014-10-31 15:45 UTC (permalink / raw
  To: gentoo-user

Well, you could just save the generating algorithm. *scnr*

I think compressing pi is hardly possible, as the numbers are
distributed pretty randomly.
But why do you want to compress? You can't work on compressed data.
And there are enough sites on the internet, where you can get your
digits again.

Pi is not supposed to change over the years :-)

Cheers
  Ralf

On 31.10.2014 17:36, meino.cramer@gmx.de wrote:
>  Hi,
>
>  I have a lot of files with digits of PI. The digits
>  are the characters of 0-9. Currently they are ZIPped,
>  which I think is not the best way to do that.
>
>  I read of 7zips PPMd which compresses "natural text"
>  quite well...but my files are not "natural text" (as
>  they are also no "binary data").
>
>  With what practical way of compression is it possible
>  to compress the files (file by file) as much as possible?
>
>  Thank you very much in advance for any help!
>
>  Best regards,
>  mcc
>
>
>




* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 15:45 ` Ralf
@ 2014-10-31 15:59   ` meino.cramer
  2014-10-31 16:52     ` Helmut Jarausch
  2014-10-31 17:56     ` Rich Freeman
  0 siblings, 2 replies; 22+ messages in thread
From: meino.cramer @ 2014-10-31 15:59 UTC (permalink / raw
  To: gentoo-user

Ralf <ralf+gentoo@ramses-pyramidenbau.de> [14-10-31 16:48]:
> Well, you could just save the generating algorithm. *scnr*
> 
> I think compressing pi is hardly possible, as the numbers are
> distributed pretty randomly.
> But why do you want to compress? You can't work on compressed data.
> And there are enough sites on the internet, where you can get your
> digits again.
> 
> Pi is not supposed to change over the years :-)
> 
> Cheers
>   Ralf
> 
> On 31.10.2014 17:36, meino.cramer@gmx.de wrote:
> >  Hi,
> >
> >  I have a lot of files with digits of PI. The digits
> >  are the characters of 0-9. Currently they are ZIPped,
> >  which I think is not the best way to do that.
> >
> >  I read of 7zips PPMd which compresses "natural text"
> >  quite well...but my files are not "natural text" (as
> >  they are also no "binary data").
> >
> >  With what practical way of compression is it possible
> >  to compress the files (file by file) as much as possible?
> >
> >  Thank you very much in advance for any help!
> >
> >  Best regards,
> >  mcc
> >
> >
> >
> 
> 
Hi Ralf,

I have a damn slow Internet connection, and searching through
millions of digits is not always offered online. Besides that: if I
want to do more with those digits, I have to download them again and
again. It's better to keep a copy of the 2014 version of PI for
later reference locally on my hd.

I am currently checking the compression tools I know of for the
best compression ratio. But I will definitely miss those I don't
know...
And sometimes one can do magic with options and switches of those
kinds of tools that I also don't know of.

If someone has suggestions....always appreciated! :)

Best regards,
mcc





* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 15:59   ` meino.cramer
@ 2014-10-31 16:52     ` Helmut Jarausch
  2014-10-31 17:56     ` Rich Freeman
  1 sibling, 0 replies; 22+ messages in thread
From: Helmut Jarausch @ 2014-10-31 16:52 UTC (permalink / raw
  To: gentoo-user

On 10/31/2014 04:59:17 PM, meino.cramer@gmx.de wrote:
> If someone has suggestions....always appreciated! :)

It's best to ask on the newsgroup comp.compression.
There are top international specialists there.

Helmut




* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 15:59   ` meino.cramer
  2014-10-31 16:52     ` Helmut Jarausch
@ 2014-10-31 17:56     ` Rich Freeman
  2014-10-31 18:55       ` David Haller
  1 sibling, 1 reply; 22+ messages in thread
From: Rich Freeman @ 2014-10-31 17:56 UTC (permalink / raw
  To: gentoo-user

On Fri, Oct 31, 2014 at 11:59 AM,  <meino.cramer@gmx.de> wrote:
> I am currently checking the compression tools I know of for the
> best compression ratio. But I will definitely miss those I don't
> know...
> And sometimes one can do magic with options and switches of those
> kinds of tools that I also don't know of.

I can't imagine that any tool will do much better than something like
lzo, gzip, xz, etc.  You'll definitely benefit from compression though
- your text files full of digits are encoding 3.3 bits of information
in an 8-bit ascii character and even if the order of digits in pi can
be treated as purely random just about any compression algorithm is
going to get pretty close to that 3.3 bits per digit figure.
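
For reference, the 3.3 figure is log2(10), the information content of one
uniformly distributed decimal digit. A minimal Python sketch of that
arithmetic follows; the helper name min_bytes is made up for illustration,
and the bound only holds if the digits really behave like random ones:

```python
import math

# Information content of one uniformly distributed decimal digit, in bits.
bits_per_digit = math.log2(10)


def min_bytes(num_digits: int) -> int:
    """Smallest whole number of bytes that can hold num_digits random decimal digits."""
    return math.ceil(num_digits * bits_per_digit / 8)


print(f"{bits_per_digit:.4f} bits per digit")
print(min_bytes(100_000), "bytes is the floor for 100,000 digits")
```

Any real compressor adds headers and model overhead on top of this floor.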

--
Rich



* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 17:56     ` Rich Freeman
@ 2014-10-31 18:55       ` David Haller
  2014-10-31 19:23         ` Rich Freeman
  0 siblings, 1 reply; 22+ messages in thread
From: David Haller @ 2014-10-31 18:55 UTC (permalink / raw
  To: gentoo-user

Hello,

On Fri, 31 Oct 2014, Rich Freeman wrote:
>On Fri, Oct 31, 2014 at 11:59 AM,  <meino.cramer@gmx.de> wrote:
>> I am currently checking the compression tools I know of for the
>> best compression ratio. But I will definitely miss those I don't
>> know...
>> And sometimes one can do magic with options and switches of those
>> kinds of tools that I also don't know of.

With 100k pseudo-random digits from bash's $RANDOM % 10 and a
linebreak every 100 digits (in t.lst) I get this (each with --best /
-9 / -m5 (rar) compression-level option):

$ du -b * | sort -rn
101000  t.lst
61544   t.lzop
50733   t.zoo
49696   t.zip
49609   t.lha
49554   t.gz
48907   t.Z
44942   t.rar
44661   t.rzip
44638   t.7z
44592   t.xz
44572   t.bz2
44546   t.lzma
44543   t.lzip

What I find remarkable is that both gzip and good old compress (.Z)
are rather good ;) And the above is probably a quite comprehensive list,
and except for .Z, .gz and .bz2 all are named after the binaries used to
create them.

I'd use bzip2/xz/lz as there are e.g. [blx]z(e)(grep|cat|less), but
not e.g. 7zgrep, and I guess they can ease access to those archives
quite a bit.

>I can't imagine that any tool will do much better than something like
>lzo, gzip, xz, etc.  You'll definitely benefit from compression though
>- your text files full of digits are encoding 3.3 bits of information
>in an 8-bit ascii character and even if the order of digits in pi can
>be treated as purely random just about any compression algorithm is
>going to get pretty close to that 3.3 bits per digit figure.

Good estimate:

$ calc '101000/(8/3.3)'
        41662.5
and I get from (lzip)
$ calc 44543*8/101000 
        3.528...        (bits/digit)
to zip:
$ calc 49696*8/101000
        ~3.93           (bits/digit)
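
A rough way to repeat this kind of experiment without the command-line
tools is sketched below in Python. It is only an approximation of the
test above: it uses Python's random module instead of bash's $RANDOM, and
the standard-library zlib, bz2 and lzma modules instead of the gzip, bzip2
and xz binaries, so the byte counts will not match the listing exactly.

```python
import bz2
import lzma
import math
import random
import zlib

random.seed(0)  # fixed seed so repeated runs give the same sizes

# 100,000 pseudo-random digits with a line break after every 100 digits
# (101,000 bytes in total, the same layout as t.lst).
digits = "".join(random.choice("0123456789") for _ in range(100_000))
lines = [digits[i:i + 100] for i in range(0, len(digits), 100)]
data = ("\n".join(lines) + "\n").encode("ascii")

# Compressed size in bytes at the highest level each module offers.
sizes = {
    "zlib (deflate, as in gzip)": len(zlib.compress(data, 9)),
    "bz2": len(bz2.compress(data, 9)),
    "lzma (xz container)": len(lzma.compress(data, preset=9)),
}

# Entropy floor for the digits alone, ignoring the line breaks.
bound = math.ceil(100_000 * math.log2(10) / 8)

print(f"{'original':<28}{len(data):>8}")
for name, size in sizes.items():
    print(f"{name:<28}{size:>8}  {size * 8 / len(data):.3f} bits per input byte")
print(f"{'entropy floor':<28}{bound:>8}")
```

The last column is computed the same way as the calc lines above
(compressed bytes * 8 / 101000).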

HTH,
-dnh

-- 
Q: Hobbies?
A: Hating music.            -- Marvin



* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 18:55       ` David Haller
@ 2014-10-31 19:23         ` Rich Freeman
  2014-10-31 20:25           ` [gentoo-user] " Grant Edwards
  0 siblings, 1 reply; 22+ messages in thread
From: Rich Freeman @ 2014-10-31 19:23 UTC (permalink / raw
  To: gentoo-user

On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gentoo@dhaller.de> wrote:
>
> On Fri, 31 Oct 2014, Rich Freeman wrote:
>
>>I can't imagine that any tool will do much better than something like
>>lzo, gzip, xz, etc.  You'll definitely benefit from compression though
>>- your text files full of digits are encoding 3.3 bits of information
>>in an 8-bit ascii character and even if the order of digits in pi can
>>be treated as purely random just about any compression algorithm is
>>going to get pretty close to that 3.3 bits per digit figure.
>
> Good estimate:
>
> $ calc '101000/(8/3.3)'
>         41662.5
> and I get from (lzip)
> $ calc 44543*8/101000
>         3.528...        (bits/digit)
> to zip:
> $ calc 49696*8/101000
>         ~3.93           (bits/digit)

Actually, I'm surprised how far off of this the various methods are.
I was expecting SOME overhead, but not this much.

A fairly quick algorithm would be to encode every possible set of 96
digits into a 40 byte code (that is just a straight decimal-binary
conversion).  Then read a "word" at a time and translate it.  This
will only waste 0.011 bits per digit.
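
A minimal Python sketch of that idea, assuming the input length is a
multiple of 96 digits (padding a short final word is left out). The
function names pack and unpack are made up for illustration:

```python
import math

BLOCK_DIGITS = 96  # decimal digits per "word"
BLOCK_BYTES = 40   # 320 bits; 10**96 < 2**320, so every word fits


def pack(digits: str) -> bytes:
    """Pack a digit string (length a multiple of 96) into 40-byte words."""
    if len(digits) % BLOCK_DIGITS:
        raise ValueError("length must be a multiple of 96 in this sketch")
    out = bytearray()
    for i in range(0, len(digits), BLOCK_DIGITS):
        word = int(digits[i:i + BLOCK_DIGITS])  # straight decimal-binary conversion
        out += word.to_bytes(BLOCK_BYTES, "big")
    return bytes(out)


def unpack(packed: bytes) -> str:
    """Inverse of pack(): turn 40-byte words back into 96-digit strings."""
    words = []
    for i in range(0, len(packed), BLOCK_BYTES):
        word = int.from_bytes(packed[i:i + BLOCK_BYTES], "big")
        words.append(f"{word:0{BLOCK_DIGITS}d}")  # restore leading zeros
    return "".join(words)


# Bits spent per digit minus the theoretical log2(10).
waste = BLOCK_BYTES * 8 / BLOCK_DIGITS - math.log2(10)
print(f"waste: {waste:.4f} bits per digit")
```

Unlike gzip and friends, this only works because both sides already know
the input consists of nothing but the ten digit characters.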

--
Rich



* [gentoo-user] Re: OT Best way to compress files with digits
  2014-10-31 19:23         ` Rich Freeman
@ 2014-10-31 20:25           ` Grant Edwards
  2014-10-31 22:22             ` Rich Freeman
  0 siblings, 1 reply; 22+ messages in thread
From: Grant Edwards @ 2014-10-31 20:25 UTC (permalink / raw
  To: gentoo-user

On 2014-10-31, Rich Freeman <rich0@gentoo.org> wrote:
> On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gentoo@dhaller.de> wrote:
>>
>> On Fri, 31 Oct 2014, Rich Freeman wrote:
>>
>>>I can't imagine that any tool will do much better than something like
>>>lzo, gzip, xz, etc.  You'll definitely benefit from compression though
>>>- your text files full of digits are encoding 3.3 bits of information
>>>in an 8-bit ascii character and even if the order of digits in pi can
>>>be treated as purely random just about any compression algorithm is
>>>going to get pretty close to that 3.3 bits per digit figure.
>>
>> Good estimate:
>>
>> $ calc '101000/(8/3.3)'
>>         41662.5
>> and I get from (lzip)
>> $ calc 44543*8/101000
>>         3.528...        (bits/digit)
>> to zip:
>> $ calc 49696*8/101000
>>         ~3.93           (bits/digit)
>
> Actually, I'm surprised how far off of this the various methods are.
> I was expecting SOME overhead, but not this much.
>
> A fairly quick algorithm would be to encode every possible set of 96
> digits into a 40 byte code (that is just a straight decimal-binary
> conversion).  Then read a "word" at a time and translate it.  This
> will only waste 0.011 bits per digit.

You're cheating.  The algorithm you tested will compress strings of
arbitrary 8-bit values.  The algorithm you proposed will only compress
strings of bytes where each byte can have only one of 10 values.

-- 
Grant Edwards               grant.b.edwards        Yow! I want another
                                  at               RE-WRITE on my CEASAR
                              gmail.com            SALAD!!




* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-10-31 20:25           ` [gentoo-user] " Grant Edwards
@ 2014-10-31 22:22             ` Rich Freeman
  0 siblings, 0 replies; 22+ messages in thread
From: Rich Freeman @ 2014-10-31 22:22 UTC (permalink / raw
  To: gentoo-user

On Fri, Oct 31, 2014 at 4:25 PM, Grant Edwards
<grant.b.edwards@gmail.com> wrote:
>
> You're cheating.  The algorithm you tested will compress strings of
> arbitrary 8-bit values.  The algorithm you proposed will only compress
> strings of bytes where each byte can have only one of 10 values.
>

Of course.  I wasn't expecting the general-purpose algorithm to do as
well.  In some sense, part of the information that is being encoded is
actually in the compression algorithm itself (the mapping), while in a
general-purpose compression algorithm that information has to be part
of the compressed data stream.

I was just expecting gzip/etc to get much closer to the theoretical
limit.  I figured that it might be a few percent higher, but I wasn't
expecting a 10+% difference.

--
Rich



* [gentoo-user] Re: OT Best way to compress files with digits
  2014-10-31 15:36 [gentoo-user] OT Best way to compress files with digits meino.cramer
  2014-10-31 15:45 ` Ralf
@ 2014-11-01 17:15 ` James
  2014-11-01 17:26   ` Alan McKinnon
  2014-11-01 17:59   ` meino.cramer
  1 sibling, 2 replies; 22+ messages in thread
From: James @ 2014-11-01 17:15 UTC (permalink / raw
  To: gentoo-user

 <meino.cramer <at> gmx.de> writes:


>  I have a lot of files with digits of PI. The digits
>  are the characters of 0-9. Currently they are ZIPped,
>  which I think is not the best way to do that.

Hello Meino,

It's a bit of effort, but the world's recognized authority
on algorithms is Don Knuth. [1] He's old now, but his
pioneering attempt at categorizing most algorithms:
"The Art of Computer Programming" and his MMIX algorithm
implementations (kinda like assembler) are certainly
part of many first-step research efforts on algorithms
and their implementations.

It's not a cookbook; more of a scholarly (high_brow) reference,
just to supplement all the good postings by your peers on gentoo user.

Alan may loan you his copy?
(ha ha ha)?



hth,
James

[1] http://www-cs-faculty.stanford.edu/~uno/







* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 17:15 ` James
@ 2014-11-01 17:26   ` Alan McKinnon
  2014-11-01 20:18     ` Matti Nykyri
  2014-11-01 17:59   ` meino.cramer
  1 sibling, 1 reply; 22+ messages in thread
From: Alan McKinnon @ 2014-11-01 17:26 UTC (permalink / raw
  To: gentoo-user

On 01/11/2014 19:15, James wrote:
>  <meino.cramer <at> gmx.de> writes:
> 
> 
>>  I have a lot of files with digits of PI. The digits
>>  are the characters of 0-9. Currently they are ZIPped,
>>  which I think is not the best way to do that.
> 
> Hello Meino,
> 
> It's a bit of effort, but the world's recognized authority
> on algorithms is Don Knuth. [1] He's old now, but his
> pioneering attempt at categorizing most algorithms:
> "The Art of Computer Programming" and his MMIX algorithm
> implementations (kinda like assembler) are certainly
> part of many first-step research efforts on algorithms
> and their implementations.
> 
> It's not a cookbook; more of a scholarly (high_brow) reference,
> just to supplement all the good postings by your peers on gentoo user.
> 
> Alan may loan you his copy?
> (ha ha ha)?
> 
> 
> 
> hth,
> James
> 
> [1] http://www-cs-faculty.stanford.edu/~uno/


ha ha, fat chance :-)

When Alan does eventually get his hands on his very own personal
copy[1], it will be lent to nobody. There are just some things a man
never lends out: his bike, his firearm, his wife. And Knuth :-)

Back on topic: You're 100% right - to learn about algorithms in general,
Knuth is the man. Essential reading for anyone taking CS seriously

-- 
Alan McKinnon
alan.mckinnon@gmail.com




* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 17:15 ` James
  2014-11-01 17:26   ` Alan McKinnon
@ 2014-11-01 17:59   ` meino.cramer
  2014-11-01 20:47     ` Alan McKinnon
  1 sibling, 1 reply; 22+ messages in thread
From: meino.cramer @ 2014-11-01 17:59 UTC (permalink / raw
  To: gentoo-user

James <wireless@tampabay.rr.com> [14-11-01 18:16]:
>  <meino.cramer <at> gmx.de> writes:
> 
> 
> >  I have a lot of files with digits of PI. The digits
> >  are the characters of 0-9. Currently they are ZIPped,
> >  which I think is not the best way to do that.
> 
> Hello Meino,
> 
> It's a bit of effort, but the world's recognized authority
> on algorithms is Don Knuth. [1] He's old now, but his
> pioneering attempt at categorizing most algorithms:
> "The Art of Computer Programming" and his MMIX algorithm
> implementations (kinda like assembler) are certainly
> part of many first-step research efforts on algorithms
> and their implementations.
> 
> It's not a cookbook; more of a scholarly (high_brow) reference,
> just to supplement all the good postings by your peers on gentoo user.
> 
> Alan may loan you his copy?
> (ha ha ha)?
> 
> 
> 
> hth,
> James
> 
> [1] http://www-cs-faculty.stanford.edu/~uno/
> 

Hello James,

Don Knuth ... oh YES! :)
For a long time I have been using and preferring TeX over anything else
(ok...for ASCII I use vim... ;).

And besides his computer wisdom I also like his kind of humor a lot...
for example this one:
https://www.youtube.com/watch?v=eKaI78K_rgA&list=PLUu0XRts4lK6Ri7-xaCNYqTHx7We95Rk8&index=10

But my initial question was aimed more at "practical computing" than
at groundbreaking and fundamental research topics.

More like "what tool to pick?"...

I did some compression tests myself and currently I have this:
From http://piworld.calico.jp/ (http://piworld.calico.jp/estart.html)
I got zipped packages of
1000 million places of PI each (~57MB for one ZIP).

I unpacked the first package and recompressed it with different
methods of 7zip, gzip and bzip2. For gzip and bzip2 I used the highest
compression mode (-9). When a file's name matches /.*ultra.*/, I used
the highest compression mode (-mx=9); otherwise I only set the compression
method and left the rest untouched (defaults).


 119888896 2014-10-31 16:44 pi-0001.txt
  57105419 2014-10-31 16:47 pi-0001.txt.gz
  52632832 2014-10-31 16:48 pi-0001.txt.bz2
  52045827 2014-10-31 16:54 pi-0001.txt.ppmd.7z
  57110291 2014-10-31 17:23 pi-0001.zip
  51766683 2014-10-31 17:26 pi-0001.txt.lzma.7z
  51668838 2014-10-31 17:34 pi-0001.txt.lzma.ultra.7z
  52862115 2014-10-31 17:36 pi-0001.txt.ppmd.ultra.7z
  51668838 2014-10-31 17:39 pi-0001.txt.ultra.7z

7zip's lzma wins here, which is also the default method of 7zip. I set
the ultra mode for this by hand.

From other sites which offer PI for download I know of methods which
store the ASCII digits in binary and then compress them. It would be
interesting to know whether this creates a more "handy" input from 7zip's
point of view...
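
One plausible reading of "store the digits in binary" is packed BCD: two
digits per byte, four bits each. The Python sketch below shows only that
packing step; whether those other sites actually do it this way is an
assumption, and the names pack_bcd and unpack_bcd are made up here.

```python
def pack_bcd(digits: str) -> bytes:
    """Store two decimal digits per byte; pad an odd-length input with nibble 0xF."""
    nibbles = [int(ch) for ch in digits]
    if len(nibbles) % 2:
        nibbles.append(0xF)  # 0xF is never a valid digit, so it marks padding
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_bcd(packed: bytes) -> str:
    """Inverse of pack_bcd(): expand each byte into its two digits."""
    out = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0xF):
            if nibble != 0xF:  # skip the padding nibble
                out.append(str(nibble))
    return "".join(out)


print(pack_bcd("31415").hex())
print(unpack_bcd(pack_bcd("31415")))
```

The packed file is exactly half the size of the digit text before any
compressor runs; whether 7zip then gets below the sizes in the listing
above is something to measure rather than assume.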

Ah! By the way...I was astonished to read that the digits of PI are
called random on the one hand, while on the other hand there is a formula [1]
to calculate a certain digit of PI without calculating the previous
digits...
Calculated random? Are natural constants the purest form of PRNGs ??? ;)
(Quantum physics is everywhere... ;;))

[1]: http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula
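
Note that the Bailey-Borwein-Plouffe formula yields hexadecimal (base 16)
digits of PI, not decimal ones. A minimal Python sketch of the digit
extraction, using ordinary floating point and therefore only trustworthy
for fairly small positions (the function names are made up for
illustration):

```python
def _series(j: int, n: int) -> float:
    """Fractional part of the sum over k of 16**(n-k) / (8k + j)."""
    total = 0.0
    # Terms with k <= n: modular exponentiation keeps the numbers small.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink by a factor of 16 each step; stop when negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digit(n: int) -> str:
    """Hexadecimal digit of pi at position n after the point (n = 0 is the first)."""
    x = (4 * _series(1, n) - 2 * _series(4, n) - _series(5, n) - _series(6, n)) % 1.0
    return "0123456789abcdef"[int(x * 16)]


# pi in hexadecimal starts 3.243f6a88...
print("".join(pi_hex_digit(n) for n in range(8)))
```

The work for position n still grows with n, so this does not make the
digits any less expensive to store than to recompute.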

Best regards,
Meino










* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 17:26   ` Alan McKinnon
@ 2014-11-01 20:18     ` Matti Nykyri
  0 siblings, 0 replies; 22+ messages in thread
From: Matti Nykyri @ 2014-11-01 20:18 UTC (permalink / raw
  To: gentoo-user@lists.gentoo.org

> On Nov 1, 2014, at 19:26, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> 
>> On 01/11/2014 19:15, James wrote:
>> <meino.cramer <at> gmx.de> writes:
>> 
>> 
>>> I have a lot of files with digits of PI. The digits
>>> are the characters of 0-9. Currently they are ZIPped,
>>> which I think is not the best way to do that.
>> 
>> Hello Meino,
>> 
>> It's a bit of effort, but the world's recognized authority
>> on algorithms is Don Knuth. [1] He's old now, but his
>> pioneering attempt at categorizing most algorithms:
>> "The Art of Computer Programming" and his MMIX algorithm
>> implementations (kinda like assembler) are certainly
>> part of many first-step research efforts on algorithms
>> and their implementations.
>> 
>> It's not a cookbook; more of a scholarly (high_brow) reference,
>> just to supplement all the good postings by your peers on gentoo user.
>> 
>> Alan may loan you his copy?
>> (ha ha ha)?
>> 
>> 
>> 
>> hth,
>> James
>> 
>> [1] http://www-cs-faculty.stanford.edu/~uno/
> 
> 
> ha ha, fat chance :-)
> 
> When Alan does eventually get his hands on his very own personal
> copy[1], it will be lent to nobody. There are just some things a man
> never lends out: his bike, his firearm, his wife. And Knuth :-)

Why not lend your wife? ;)

> Back on topic: You're 100% right - to learn about algorithms in general,
> Knuth is the man. Essential reading for anyone taking CS seriously
> 
> -- 
> Alan McKinnon
> alan.mckinnon@gmail.com
> 
> 



* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 17:59   ` meino.cramer
@ 2014-11-01 20:47     ` Alan McKinnon
  2014-11-01 21:56       ` David W Noon
  0 siblings, 1 reply; 22+ messages in thread
From: Alan McKinnon @ 2014-11-01 20:47 UTC (permalink / raw
  To: gentoo-user

On 01/11/2014 19:59, meino.cramer@gmx.de wrote:
> James <wireless@tampabay.rr.com> [14-11-01 18:16]:
>>  <meino.cramer <at> gmx.de> writes:
>>
>>
>>>  I have a lot of files with digits of PI. The digits
>>>  are the characters of 0-9. Currently they are ZIPped,
>>>  which I think is not the best way to do that.
>>
>> Hello Meino,
>>
>> It's a bit of effort, but the world's recognized authority
>> on algorithms is Don Knuth. [1] He's old now, but his
>> pioneering attempt at categorizing most algorithms:
>> "The Art of Computer Programming" and his MMIX algorithm
>> implementations (kinda like assembler) are certainly
>> part of many first-step research efforts on algorithms
>> and their implementations.
>>
>> It's not a cookbook; more of a scholarly (high_brow) reference,
>> just to supplement all the good postings by your peers on gentoo user.
>>
>> Alan may loan you his copy?
>> (ha ha ha)?
>>
>>
>>
>> hth,
>> James
>>
>> [1] http://www-cs-faculty.stanford.edu/~uno/
>>
> 
> Hello james,
> 
> Don Knuth ... oh YES! :)
> For a long time I am using and prefering TeX over anything else
> (ok...for ASCII I use vim... ;).
> 
> And beside his computer wisdom I also like his kind of humor a lot...
> for example this one:
> https://www.youtube.com/watch?v=eKaI78K_rgA&list=PLUu0XRts4lK6Ri7-xaCNYqTHx7We95Rk8&index=10
> 
> But my initial question was more targeted to "practical computing" as
> to groundshakeing and fundamental research topics.
> 
> More like "what tool to pick?"...
> 
> I did some compression tests myself and currently I have this:
> From http://piworld.calico.jp/ (http://piworld.calico.jp/estart.html)
> I got zipped package of
> 1000 million places of PI each (~57MB for one ZIP).
> 
> I unpacked the first package and recompressed it with different
> methods of 7zip, gzip and bzip2. For gzip and bzip2 I used the highest
> compression mode (-9). When a files name matches /.*ultra.*/, I used
> the highest compression mode (-mx=9), else I only set the compression
> method and leave the rest untouched (defaults).
> 
> 
>  119888896 2014-10-31 16:44 pi-0001.txt
>   57105419 2014-10-31 16:47 pi-0001.txt.gz
>   52632832 2014-10-31 16:48 pi-0001.txt.bz2
>   52045827 2014-10-31 16:54 pi-0001.txt.ppmd.7z
>   57110291 2014-10-31 17:23 pi-0001.zip
>   51766683 2014-10-31 17:26 pi-0001.txt.lzma.7z
>   51668838 2014-10-31 17:34 pi-0001.txt.lzma.ultra.7z
>   52862115 2014-10-31 17:36 pi-0001.txt.ppmd.ultra.7z
>   51668838 2014-10-31 17:39 pi-0001.txt.ultra.7z
> 
> 7zip's lzma wins here, which is also the default method of 7zip. I set
> the ultra mode for this by hand.
> 
> From other sites which offer PI for download I know of methods, which
> store the ASCII-digits in binary and compresses then. Would be
> interesting, whether this creates a more "handy" input from 7zips
> point of view...
> 
> Ah! By the way...I was astonished to read, that the digits of PI are
> called random on the one hand and on the other hand there is a formula [1] 
> to calculate a certain digit of PI without calculation of the previous
> digits...
> Calculated random? Are nature constants the purest form of PRNGs ??? ;)
> (Quantum physics is everywhere... ;;))
> 
> [1]: http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula


The sequence of digits that makes up pi is a random sequence - you can
analyze the order any way you want and you'll find no inherent pattern.
However, any given digit in the sequence is 100% predictable, as you
just showed :-)

Randomness has got to be the second most mind-boggling thing out there,
first being quantumness (that's not a word, I just made it up. You
should get the meaning OK from context ;-) )

-- 
Alan McKinnon
alan.mckinnon@gmail.com




* [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 20:47     ` Alan McKinnon
@ 2014-11-01 21:56       ` David W Noon
  2014-11-02 12:06         ` Matti Nykyri
  2014-11-02 19:55         ` Alan McKinnon
  0 siblings, 2 replies; 22+ messages in thread
From: David W Noon @ 2014-11-01 21:56 UTC (permalink / raw
  To: gentoo-user

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 01 Nov 2014 22:47:15 +0200, Alan Mckinnon
(alan.mckinnon@gmail.com) wrote about "Re: [gentoo-user] Re: OT Best
way to compress files with digits" (in <545546D3.3030005@gmail.com>):

> On 01/11/2014 19:59, meino.cramer@gmx.de wrote:
[snip]
>> Ah! By the way...I was astonished to read, that the digits of PI
>> are called random on the one hand and on the other hand there is
>> a formula [1] to calculate a certain digit of PI without
>> calculation of the previous digits... Calculated random? Are
>> nature constants the purest form of PRNGs ??? ;) (Quantum physics
>> is everywhere... ;;))
>> 
>> [1]:
>> http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula
>
>> 
> 
> The sequence of digits that make up pi are a random sequence - you
> can analyze the order any way you want and you'll find no inherent
> pattern.

Actually, the sequence of digits is most definitely *not* random.  If
the sequence of digits is written any other way then the value is not
Pi.  Hence the sequence is unique, not random.

I think what you are grasping for is that the frequency of distinct
digits tends to be uniform: 0's occur as often as 1's as often ... as
9's.  Note that the "as often as" operator is really approximate for
finite sub-sequences, but is asymptotically accurate.

Moreover, this is the same in any number base: the binary
representation has 0's occurring as often as 1's; the ternary
representation has 0's occurring as often as 1's and as often as 2's;
etc., etc.

Such numbers are called "normal".  It was a poor choice of name, but
we are stuck with it.  I would have called them "digit soup" numbers
- -- an oblique reference to alphabet soup.

> However, any given digit in the sequence is 100% predictable, as
> you just showed :-)
> 
> Randomness has got to be the second most mind-boggling thing out
> there, first being quantumness (that's not a word, I just made it
> up. You should get the meaning OK from context ;-) )

I would say that probability theory is more mind boggling, as it
underpins much of quantum theory.  But, as someone who majored in
probability theory, I might be biased. [Incidentally, there is a small
statistical joke in that last sentence.]

Getting back to Meino's original request, one of the optimum
compression algorithms for this would be custom Huffman encoding.  To
do this the algorithm requires that all the data (i.e. digits) be read
and a frequency table built.  The only problem is that to read all the
digits of Pi could take rather a long time. ... :-)
- -- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwnoon@ntlworld.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlRVVyQACgkQRQ2Fs59Psv/9qwCeKwuLz/7RGEV06X+RdDQryDe+
/xwAoK1qMgb9RZXkQByBUMqB8eqs20bG
=XUPB
-----END PGP SIGNATURE-----
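
The custom Huffman encoding suggested above can be sketched in Python as
follows. This is an illustration only: it builds the code lengths from a
frequency table of the text it is given (one file, not "all of Pi"), and
the function name huffman_code_lengths is made up.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict:
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:
        return {symbol: 1 for symbol in freq}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(weight, i, {symbol: 0}) for i, (symbol, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


sample = "0123456789" * 1000  # perfectly uniform digit frequencies
lengths = huffman_code_lengths(sample)
average = sum(lengths[ch] for ch in sample) / len(sample)
print(sorted(lengths.values()), average)
```

With ten equally frequent digits, six of them get 3-bit codes and four get
4-bit codes, for an average of 3.4 bits per digit. Huffman codes use whole
bits per symbol, so this stays above the 3.32-bit figure discussed earlier
in the thread.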



* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 21:56       ` David W Noon
@ 2014-11-02 12:06         ` Matti Nykyri
  2014-11-03 15:48           ` Grant Edwards
  2014-11-02 19:55         ` Alan McKinnon
  1 sibling, 1 reply; 22+ messages in thread
From: Matti Nykyri @ 2014-11-02 12:06 UTC (permalink / raw
  To: gentoo-user@lists.gentoo.org

> On Nov 1, 2014, at 23:56, David W Noon <dwnoon@ntlworld.com> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Sat, 01 Nov 2014 22:47:15 +0200, Alan Mckinnon
> (alan.mckinnon@gmail.com) wrote about "Re: [gentoo-user] Re: OT Best
> way to compress files with digits" (in <545546D3.3030005@gmail.com>):
> 
>> On 01/11/2014 19:59, meino.cramer@gmx.de wrote:
> [snip]
>>> Ah! By the way...I was astonished to read, that the digits of PI
>>> are called random on the one hand and on the other hand there is
>>> a formula [1] to calculate a certain digit of PI without
>>> calculation of the previous digits... Calculated random? Are
>>> nature constants the purest form of PRNGs ??? ;) (Quantum physics
>>> is everywhere... ;;))
>>> 
>>> [1]:
>>> http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula
>> 
>> 
>> The sequence of digits that make up pi are a random sequence - you
>> can analyze the order any way you want and you'll find no inherent
>> pattern.
> 
> Actually, the sequence of digits is most definitely *not* random.  If
> the sequence of digits is written any other way then the value is not
> Pi.  Hence the sequence is unique, not random.
> 
> I think what you are grasping for is that the frequency of distinct
> digits tends to be uniform: 0's occur as often as 1's as often ... as
> 9's.  Note that the "as often as" operator is really approximate for
> finite sub-sequences, but is asymptotically accurate.
> 
> Moreover, this is the same in any number base: the binary
> representation has 0's occurring as often as 1's; the ternary
> representation has 0's occurring as often as 1's and as often as 2's;
> etc., etc.
> 
> Such numbers are called "normal".  It was a poor choice of name, but
> we are stuck with it.  I would have called them "digit soup" numbers
> - -- an oblique reference to alphabet soup.

Well, all the digits of pi can be compressed to the following:

=pi();

If you have the infinite series that calculates the digits :)

>> However, any given digit in the sequence is 100% predictable, as
>> you just showed :-)
>> 
>> Randomness has got to be the second most mind-boggling thing out
>> there, first being quantumness (that's not a word, I just made it
>> up. You should get the meaning OK from context ;-) )
> 
> I would say that probability theory is more mind boggling, as it
> underpins much of quantum theory.  But, as someone who majored in
> probability theory, I might be biased. [Incidentally, there is a small
> statistical joke in that last sentence.]
> 
> Getting back to Meino's original request, one of the optimum
> compression algorithms for this would be custom Huffman encoding.  To
> do this the algorithm requires that all the data (i.e. digits) be read
> and a frequency table built.  The only problem is that to read all the
> digits of Pi could take rather a long time. ... :-)

That would take infinite time :)
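
To put a number on the Huffman suggestion quoted above: an actual file is
finite, so building the frequency table takes one pass over it. Assuming the
ten digits are about equally frequent (which is what the discussion above
expects for pi, though it is an assumption here), a minimal sketch of the
resulting code lengths follows. The helper name huffman_code_lengths is
hypothetical.

```python
import heapq
import math


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    # Heap entries: (weight, tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, group1 + group2))
        tie += 1
    return lengths


# Assumption: all ten digits occur equally often.
uniform = {str(d): 1 for d in range(10)}
lengths = huffman_code_lengths(uniform)
total = sum(uniform.values())
average_bits = sum(lengths[s] * w for s, w in uniform.items()) / total
entropy_bits = math.log2(10)

if __name__ == "__main__":
    print("Huffman average bits per digit:", average_bits)
    print("Entropy bound bits per digit:  ", round(entropy_bits, 4))
```

With ten equal weights the average comes out at 3.4 bits per digit, against
an entropy bound of log2(10), roughly 3.32 bits, and 8 bits per digit for
plain ASCII. So on such data Huffman lands at about 42.5% of the ASCII size,
and no lossless scheme that treats the digits as equally likely and
independent can do better than about 41.5%.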

> -- 
> Regards,
> 
> Dave  [RLU #314465]
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> dwnoon@ntlworld.com (David W Noon)
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> 



* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 21:56       ` David W Noon
  2014-11-02 12:06         ` Matti Nykyri
@ 2014-11-02 19:55         ` Alan McKinnon
  2014-11-02 22:03           ` Peter Humphrey
  1 sibling, 1 reply; 22+ messages in thread
From: Alan McKinnon @ 2014-11-02 19:55 UTC (permalink / raw
  To: gentoo-user

On 01/11/2014 23:56, David W Noon wrote:
>> The sequence of digits that make up pi are a random sequence - you
>> can analyze the order any way you want and you'll find no inherent
>> pattern.
> Actually, the sequence of digits is most definitely *not* random.  If
> the sequence of digits is written any other way then the value is not
> Pi.  Hence the sequence is unique, not random.
> 
> I think what you are grasping for is that the frequency of distinct
> digits tends to be uniform: 0's occur as often as 1's as often ... as
> 9's.  Note that the "as often as" operator is really approximate for
> finite sub-sequences, but is asymptotically accurate.
> 
> Moreover, this is the same in any number base: the binary
> representation has 0's occurring as often as 1's; the ternary
> representation has 0's occurring as often as 1's and as often as 2's;
> etc., etc.
> 
> Such numbers are called "normal".  It was a poor choice of name, but
> we are stuck with it.  I would have called them "digit soup" numbers
> -- an oblique reference to alphabet soup.
> 

You grasp correctly what I was saying :-)

I'm not formally trained in mathematics so I often get the terminology
wrong or just don't know the accepted words for a concept. Lucky for me
though, English is a heavily overloaded language and there's always more
than one way to communicate something.

-- 
Alan McKinnon
alan.mckinnon@gmail.com




* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-02 19:55         ` Alan McKinnon
@ 2014-11-02 22:03           ` Peter Humphrey
  2014-11-03 19:37             ` Mick
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Humphrey @ 2014-11-02 22:03 UTC (permalink / raw
  To: gentoo-user

On Sunday 02 November 2014 21:55:31 Alan McKinnon wrote:

> English is a heavily overloaded language and there's always more
> than one way to communicate something

Even the simplest cases usually have three words for the same thing: one from 
French, one from Latin and one from Anglo-Saxon. I won't even mention words 
that have come down from Old German and so on, but at least we don't have 
many words from Italian or Spanish. (Zucchini? What's that?)

-- 
Rgds
Peter




* [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-02 12:06         ` Matti Nykyri
@ 2014-11-03 15:48           ` Grant Edwards
  0 siblings, 0 replies; 22+ messages in thread
From: Grant Edwards @ 2014-11-03 15:48 UTC (permalink / raw
  To: gentoo-user

On 2014-11-02, Matti Nykyri <matti.nykyri@iki.fi> wrote:
>> On Nov 1, 2014, at 23:56, David W Noon <dwnoon@ntlworld.com> wrote:
>> 
>> On Sat, 01 Nov 2014 22:47:15 +0200, Alan Mckinnon
>> (alan.mckinnon@gmail.com) wrote about "Re: [gentoo-user] Re: OT Best
>> way to compress files with digits" (in <545546D3.3030005@gmail.com>):
>> 
>>> On 01/11/2014 19:59, meino.cramer@gmx.de wrote:
>> [snip]
>>>> Ah! By the way...I was astonished to read, that the digits of PI
>>>> are called random on the one hand and on the other hand there is
>>>> a formula [1] to calculate a certain digit of PI without
>>>> calculation of the previous digits... Calculated random? Are
>>>> nature constants the purest form of PRNGs ??? ;) (Quantum physics
>>>> is everywhere... ;;))
>>>> 
>>>> [1]:
>>>> http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula
>>> 
>>> 
>>> The sequence of digits that make up pi are a random sequence - you
>>> can analyze the order any way you want and you'll find no inherent
>>> pattern.
>> 
>> Actually, the sequence of digits is most definitely *not* random.  If
>> the sequence of digits is written any other way then the value is not
>> Pi.  Hence the sequence is unique, not random.
>> 
>> I think what you are grasping for is that the frequency of distinct
>> digits tends to be uniform: 0's occur as often as 1's as often ... as
>> 9's.  Note that the "as often as" operator is really approximate for
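
The Bailey-Borwein-Plouffe formula linked in the quote above extracts
hexadecimal (base 16) digits of pi, not decimal ones. A minimal
floating-point sketch might look like the following; it is only trustworthy
for modest positions because of double-precision rounding, and the function
names are hypothetical.

```python
def _bbp_series(j, n):
    """Fractional part of the sum over k of 16**(n - k) / (8*k + j)."""
    total = 0.0
    # Terms with k <= n: modular exponentiation keeps the numerators small.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink by a factor of 16 each; stop once negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digit(n):
    """Hex digit of pi at 0-based position n after the point."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return "0123456789abcdef"[int(x * 16)]


if __name__ == "__main__":
    print("3." + "".join(pi_hex_digit(n) for n in range(8)))
```

Each call works out one digit without producing the digits before it, which
is the property that prompted the "calculated random?" question.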

> Well, all the digits of pi can be compressed to the following:
>
>=pi();

Nah.  Just switch to base-Pi, and then it compresses to:

10

-- 
Grant Edwards               grant.b.edwards        Yow! Are we THERE yet?
                                  at               
                              gmail.com            




* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-02 22:03           ` Peter Humphrey
@ 2014-11-03 19:37             ` Mick
  2014-11-04  2:04               ` Peter Humphrey
  0 siblings, 1 reply; 22+ messages in thread
From: Mick @ 2014-11-03 19:37 UTC (permalink / raw
  To: gentoo-user

On Sunday 02 Nov 2014 22:03:13 Peter Humphrey wrote:
> On Sunday 02 November 2014 21:55:31 Alan McKinnon wrote:
> > English is a heavily overloaded language and there's always more
> > than one way to communicate something
> 
> Even the simplest cases usually have three words for the same thing: one
> from French, one from Latin and one from Anglo-Saxon. I won't even mention
> words that have come down from Old German and so on, but at least we don't
> have many words from Italian or Spanish. (Zucchini? What's that?)

That's clearly baloney!

-- 
Regards,
Mick


* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-03 19:37             ` Mick
@ 2014-11-04  2:04               ` Peter Humphrey
  2014-11-04  6:35                 ` Mick
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Humphrey @ 2014-11-04  2:04 UTC (permalink / raw
  To: gentoo-user

On Monday 03 November 2014 19:37:52 Mick wrote:

> > Even the simplest cases usually have three words for the same thing: one
> > from French, one from Latin and one from Anglo-Saxon. I won't even mention
> > words that have come down from Old German and so on, but at least we don't
> > have many words from Italian or Spanish. (Zucchini? What's that?)
> 
> That's clearly baloney!

Explain.

-- 
Rgds
Peter




* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-04  2:04               ` Peter Humphrey
@ 2014-11-04  6:35                 ` Mick
  0 siblings, 0 replies; 22+ messages in thread
From: Mick @ 2014-11-04  6:35 UTC (permalink / raw
  To: gentoo-user

On Tuesday 04 Nov 2014 02:04:45 Peter Humphrey wrote:
> On Monday 03 November 2014 19:37:52 Mick wrote:
> > > Even the simplest cases usually have three words for the same thing:
> > > one from French, one from Latin and one from Anglo-Saxon. I won't even
> > > mention words that have come down from Old German and so on, but at
> > > least we don't have many words from Italian or Spanish. (Zucchini?
> > > What's that?)
> > 
> > That's clearly baloney!
> 
> Explain.

http://en.wikipedia.org/wiki/Bologna_sausage

 :-)

-- 
Regards,
Mick


Thread overview: 22+ messages
2014-10-31 15:36 [gentoo-user] OT Best way to compress files with digits meino.cramer
2014-10-31 15:45 ` Ralf
2014-10-31 15:59   ` meino.cramer
2014-10-31 16:52     ` Helmut Jarausch
2014-10-31 17:56     ` Rich Freeman
2014-10-31 18:55       ` David Haller
2014-10-31 19:23         ` Rich Freeman
2014-10-31 20:25           ` [gentoo-user] " Grant Edwards
2014-10-31 22:22             ` Rich Freeman
2014-11-01 17:15 ` James
2014-11-01 17:26   ` Alan McKinnon
2014-11-01 20:18     ` Matti Nykyri
2014-11-01 17:59   ` meino.cramer
2014-11-01 20:47     ` Alan McKinnon
2014-11-01 21:56       ` David W Noon
2014-11-02 12:06         ` Matti Nykyri
2014-11-03 15:48           ` Grant Edwards
2014-11-02 19:55         ` Alan McKinnon
2014-11-02 22:03           ` Peter Humphrey
2014-11-03 19:37             ` Mick
2014-11-04  2:04               ` Peter Humphrey
2014-11-04  6:35                 ` Mick
