* [gentoo-user] OT Best way to compress files with digits
@ 2014-10-31 15:36 meino.cramer
  2014-10-31 15:45 ` Ralf
  2014-11-01 17:15 ` James
  0 siblings, 2 replies; 22+ messages in thread
From: meino.cramer @ 2014-10-31 15:36 UTC
To: Gentoo

Hi,

I have a lot of files with digits of pi. The digits
are the characters 0-9. Currently they are zipped,
which I think is not the best way to do that.

I read about 7zip's PPMd, which compresses "natural text"
quite well... but my files are not "natural text" (nor
are they "binary data").

Which practical compression method will compress the files
(file by file) as much as possible?

Thank you very much in advance for any help!

Best regards,
mcc

* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 15:36 [gentoo-user] OT Best way to compress files with digits meino.cramer
@ 2014-10-31 15:45 ` Ralf
  2014-10-31 15:59   ` meino.cramer
  2014-11-01 17:15 ` James
  1 sibling, 1 reply; 22+ messages in thread
From: Ralf @ 2014-10-31 15:45 UTC
To: gentoo-user

Well, you could just save the generating algorithm. *scnr*

I think compressing pi is hardly possible, as the numbers are
distributed pretty randomly.
But why do you want to compress? You can't work on compressed data.
And there are enough sites on the internet where you can get your
digits again.

Pi is not supposed to change over the years :-)

Cheers
  Ralf

* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 15:45 ` Ralf
@ 2014-10-31 15:59   ` meino.cramer
  2014-10-31 16:52     ` Helmut Jarausch
  2014-10-31 17:56     ` Rich Freeman
  0 siblings, 2 replies; 22+ messages in thread
From: meino.cramer @ 2014-10-31 15:59 UTC
To: gentoo-user

Ralf <ralf+gentoo@ramses-pyramidenbau.de> [14-10-31 16:48]:
> Well, you could just save the generating algorithm. *scnr*
>
> I think compressing pi is hardly possible, as the numbers are
> distributed pretty randomly.
> But why do you want to compress? You can't work on compressed data.
> And there are enough sites on the internet, where you can get your
> digits again.
>
> Pi is not supposed to change over the years :-)
>
> Cheers
>   Ralf

Hi Ralf,

I have a damn slow Internet connection, and a search through
millions of digits is not always offered online.
Besides that: if I want to do more with those digits, I have to
download them again and again.
It's better to keep a local copy of the 2014 version of pi on my
hard disk for later reference.

I am currently checking the compression tools I know of for the
best compression ratio. But I will definitely miss those I don't
know...
And sometimes one can do magic with options and switches of that
kind of tool that I also don't know of.

If someone has suggestions... always appreciated! :)

Best regards,
mcc

* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 15:59 ` meino.cramer
@ 2014-10-31 16:52   ` Helmut Jarausch
  2014-10-31 17:56   ` Rich Freeman
  1 sibling, 0 replies; 22+ messages in thread
From: Helmut Jarausch @ 2014-10-31 16:52 UTC
To: gentoo-user

On 10/31/2014 04:59:17 PM, meino.cramer@gmx.de wrote:
> If someone has suggestions....always appreciated! :)

It's best to ask on the newsgroup comp.compression.
There are top international specialists there.

Helmut

* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 15:59 ` meino.cramer
  2014-10-31 16:52   ` Helmut Jarausch
@ 2014-10-31 17:56   ` Rich Freeman
  2014-10-31 18:55     ` David Haller
  1 sibling, 1 reply; 22+ messages in thread
From: Rich Freeman @ 2014-10-31 17:56 UTC
To: gentoo-user

On Fri, Oct 31, 2014 at 11:59 AM, <meino.cramer@gmx.de> wrote:
> I am currently checking the compression tools I know of for the
> best compression ratio. But I will definitely miss those I don't
> know...
> And sometimes one can do magic with options and switches of that
> kind of tool that I also don't know of.

I can't imagine that any tool will do much better than something like
lzo, gzip, xz, etc. You'll definitely benefit from compression though
- your text files full of digits are encoding roughly 3.3 bits of
information in an 8-bit ASCII character, and even if the order of
digits in pi can be treated as purely random, just about any
compression algorithm is going to get pretty close to that 3.3 bits
per digit figure.

--
Rich

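The 3.3 bits per digit figure is log2(10), the information content of
one decimal digit when all ten digits are equally likely and
independent. A minimal sketch of that arithmetic follows; the function
name min_bytes_for_digits is made up for illustration, and the
uniform-digit model is an assumption, not something measured on the
actual pi files.

```python
import math

# Assumption: every digit 0-9 is equally likely and independent,
# so one digit carries log2(10) bits of information.
BITS_PER_DIGIT = math.log2(10)


def min_bytes_for_digits(n_digits: int) -> int:
    """Lower bound (in whole bytes) for storing n_digits random decimal digits."""
    return math.ceil(n_digits * BITS_PER_DIGIT / 8)


if __name__ == "__main__":
    print(f"bits per digit: {BITS_PER_DIGIT:.4f}")
    print(f"ASCII overhead factor: {8 / BITS_PER_DIGIT:.3f}")
    print(f"lower bound for 100000 digits: {min_bytes_for_digits(100_000)} bytes")
```

Under this model no lossless compressor can beat that bound on average,
so it is a useful yardstick for the tool comparisons in this thread.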
* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 17:56 ` Rich Freeman
@ 2014-10-31 18:55   ` David Haller
  2014-10-31 19:23     ` Rich Freeman
  0 siblings, 1 reply; 22+ messages in thread
From: David Haller @ 2014-10-31 18:55 UTC
To: gentoo-user

Hello,

On Fri, 31 Oct 2014, Rich Freeman wrote:
>On Fri, Oct 31, 2014 at 11:59 AM, <meino.cramer@gmx.de> wrote:
>> I am currently checking the compression tools I know of for the
>> best compression ratio. But I will definitely miss those I don't
>> know...
>> And sometimes one can do magic with options and switches of that
>> kind of tool that I also don't know of.

With 100k pseudo-random digits from bash's $RANDOM % 10 and a
linebreak every 100 digits (in t.lst) I get this (each with the
--best / -9 / -m5 (rar) compression-level option):

$ du -b * | sort -rn
101000  t.lst
61544   t.lzop
50733   t.zoo
49696   t.zip
49609   t.lha
49554   t.gz
48907   t.Z
44942   t.rar
44661   t.rzip
44638   t.7z
44592   t.xz
44572   t.bz2
44546   t.lzma
44543   t.lzip

What I find remarkable is that both gzip and good old compress (.Z)
are rather good ;) The above is probably quite a comprehensive
list, and except for .Z, .gz and .bz2, all files are named after the
binaries used to create them.

I'd use bzip2/xz/lz as there are e.g. [blx]z(e)(grep|cat|less), but
not e.g. 7zgrep, and I guess they can ease access to those archives
quite a bit.

>I can't imagine that any tool will do much better than something like
>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>- your text files full of digits are encoding 3.3 bits of information
>in an 8-bit ascii character and even if the order of digits in pi can
>be treated as purely random just about any compression algorithm is
>going to get pretty close to that 3.3 bits per digit figure.

Good estimate:

$ calc '101000/(8/3.3)'
        41662.5

and I get from (lzip)

$ calc 44543*8/101000
        3.528... (bits/digit)

to zip:

$ calc 49696*8/101000
        ~3.93 (bits/digit)

HTH,
-dnh

--
Q: Hobbies? A: Hating music.
        -- Marvin

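A comparable experiment can be run without the command-line tools. The
sketch below is only an approximation of the test above: it uses
Python's random module instead of bash's $RANDOM and only the three
compressors shipped in the standard library (zlib, bz2, lzma), so the
byte counts will not match the listing. The helper names make_digits
and bits_per_digit are made up for illustration.

```python
import bz2
import lzma
import random
import zlib


def make_digits(n_digits: int, line_len: int = 100, seed: int = 0) -> bytes:
    """Return n_digits pseudo-random decimal digits with a newline after every line_len digits."""
    rng = random.Random(seed)
    lines = [
        "".join(rng.choice("0123456789") for _ in range(line_len))
        for _ in range(n_digits // line_len)
    ]
    return ("\n".join(lines) + "\n").encode("ascii")


def bits_per_digit(compressed: bytes, n_digits: int) -> float:
    """Compressed size expressed in bits per original digit (newlines not counted as digits)."""
    return len(compressed) * 8 / n_digits


N_DIGITS = 100_000
data = make_digits(N_DIGITS)
results = {
    "zlib -9": zlib.compress(data, 9),
    "bz2 -9": bz2.compress(data, 9),
    "lzma -9": lzma.compress(data, preset=9),
}

if __name__ == "__main__":
    print(f"{'input':8s} {len(data):6d} bytes")
    for name, blob in sorted(results.items(), key=lambda kv: len(kv[1])):
        print(f"{name:8s} {len(blob):6d} bytes  {bits_per_digit(blob, N_DIGITS):.3f} bits/digit")
```

Note that this sketch divides by the 100000 digits, whereas the calc
lines above divide by the 101000-byte file size including the newlines,
so the per-digit figures are not directly comparable.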
* Re: [gentoo-user] OT Best way to compress files with digits
  2014-10-31 18:55 ` David Haller
@ 2014-10-31 19:23   ` Rich Freeman
  2014-10-31 20:25     ` [gentoo-user] " Grant Edwards
  0 siblings, 1 reply; 22+ messages in thread
From: Rich Freeman @ 2014-10-31 19:23 UTC
To: gentoo-user

On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gentoo@dhaller.de> wrote:
>
> Good estimate:
>
> $ calc '101000/(8/3.3)'
>         41662.5
> and I get from (lzip)
> $ calc 44543*8/101000
>         3.528... (bits/digit)
> to zip:
> $ calc 49696*8/101000
>         ~3.93 (bits/digit)

Actually, I'm surprised how far off of this the various methods are.
I was expecting SOME overhead, but not this much.

A fairly quick algorithm would be to encode every possible set of 96
digits into a 40 byte code (that is just a straight decimal-binary
conversion). Then read a "word" at a time and translate it. This
will only waste 0.011 bits per digit.

--
Rich

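The 96-digit scheme works because 10^96 < 2^320, so any block of 96
decimal digits fits into 40 bytes; 320/96 is about 3.333 bits per
digit, roughly 0.011 above log2(10). A minimal sketch of one way to do
the conversion is shown below. The names pack_digits and unpack_digits
are hypothetical, and the sketch assumes the input length is a multiple
of 96 (how to handle a shorter final block is left open here).

```python
BLOCK_DIGITS = 96  # decimal digits per block, as proposed above
BLOCK_BYTES = 40   # 10**96 < 2**320, so 40 bytes hold any 96-digit block


def pack_digits(digits: str) -> bytes:
    """Pack a string of ASCII digits into 40 bytes per 96 digits."""
    if not digits or len(digits) % BLOCK_DIGITS != 0:
        raise ValueError("length must be a non-zero multiple of 96 digits")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("input must contain only the characters 0-9")
    out = bytearray()
    for i in range(0, len(digits), BLOCK_DIGITS):
        # Straight decimal-to-binary conversion of one block.
        value = int(digits[i:i + BLOCK_DIGITS])
        out += value.to_bytes(BLOCK_BYTES, "big")
    return bytes(out)


def unpack_digits(blob: bytes) -> str:
    """Inverse of pack_digits; leading zeros of each block are restored."""
    if len(blob) % BLOCK_BYTES != 0:
        raise ValueError("length must be a multiple of 40 bytes")
    parts = []
    for i in range(0, len(blob), BLOCK_BYTES):
        value = int.from_bytes(blob[i:i + BLOCK_BYTES], "big")
        parts.append(str(value).zfill(BLOCK_DIGITS))
    return "".join(parts)


if __name__ == "__main__":
    sample = ("0123456789" * 10)[:BLOCK_DIGITS] * 2  # 192 digits, starts with a zero
    packed = pack_digits(sample)
    print(len(sample), "digits ->", len(packed), "bytes")
    print("round trip ok:", unpack_digits(packed) == sample)
```

The packing is fixed-ratio and needs no frequency statistics, which is
why it only makes sense for input known in advance to consist of the
ten digit characters.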
* [gentoo-user] Re: OT Best way to compress files with digits
  2014-10-31 19:23 ` Rich Freeman
@ 2014-10-31 20:25   ` Grant Edwards
  2014-10-31 22:22     ` Rich Freeman
  0 siblings, 1 reply; 22+ messages in thread
From: Grant Edwards @ 2014-10-31 20:25 UTC
To: gentoo-user

On 2014-10-31, Rich Freeman <rich0@gentoo.org> wrote:
> Actually, I'm surprised how far off of this the various methods are.
> I was expecting SOME overhead, but not this much.
>
> A fairly quick algorithm would be to encode every possible set of 96
> digits into a 40 byte code (that is just a straight decimal-binary
> conversion). Then read a "word" at a time and translate it. This
> will only waste 0.011 bits per digit.

You're cheating. The algorithm you tested will compress strings of
arbitrary 8-bit values. The algorithm you proposed will only compress
strings of bytes where each byte can have only one of 10 values.

--
Grant Edwards               grant.b.edwards        Yow! I want another
                                  at               RE-WRITE on my CEASAR
                              gmail.com            SALAD!!

* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-10-31 20:25 ` [gentoo-user] " Grant Edwards
@ 2014-10-31 22:22   ` Rich Freeman
  0 siblings, 0 replies; 22+ messages in thread
From: Rich Freeman @ 2014-10-31 22:22 UTC
To: gentoo-user

On Fri, Oct 31, 2014 at 4:25 PM, Grant Edwards
<grant.b.edwards@gmail.com> wrote:
>
> You're cheating. The algorithm you tested will compress strings of
> arbitrary 8-bit values. The algorithm you proposed will only compress
> strings of bytes where each byte can have only one of 10 values.
>

Of course. I wasn't expecting the general-purpose algorithm to do as
well. In some sense, part of the information that is being encoded is
actually in the compression algorithm itself (the mapping), while in a
general-purpose compression algorithm that information has to be part
of the compressed data stream.

I was just expecting gzip/etc to get much closer to the theoretical
limit. I figured that it might be a few percent higher, but I wasn't
expecting a 10+% difference.

--
Rich

* [gentoo-user] Re: OT Best way to compress files with digits
  2014-10-31 15:36 [gentoo-user] OT Best way to compress files with digits meino.cramer
  2014-10-31 15:45 ` Ralf
@ 2014-11-01 17:15 ` James
  2014-11-01 17:26   ` Alan McKinnon
  2014-11-01 17:59   ` meino.cramer
  1 sibling, 2 replies; 22+ messages in thread
From: James @ 2014-11-01 17:15 UTC
To: gentoo-user

<meino.cramer <at> gmx.de> writes:

> I have a lot of files with digits of PI. The digits
> are the characters of 0-9. Currently they are ZIPped,
> which I think is not the best way to do that.

Hello Meino,

It's a bit of effort, but the world's recognized authority
on algorithms is Don Knuth. [1] He's old now, but his
pioneering attempt at categorizing most algorithms,
"The Art of Computer Programming", and his MMIX algorithm
implementations (kinda like assembler) are certainly
part of many first-step research efforts on algorithms
and their implementations.

It's not a cookbook; more of a scholarly (high_brow) reference,
just to supplement all the good postings by your peers on gentoo user.

Alan may loan you his copy?
(ha ha ha)?

hth,
James

[1] http://www-cs-faculty.stanford.edu/~uno/

* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 17:15 ` James
@ 2014-11-01 17:26   ` Alan McKinnon
  2014-11-01 20:18     ` Matti Nykyri
  2014-11-01 17:59   ` meino.cramer
  1 sibling, 1 reply; 22+ messages in thread
From: Alan McKinnon @ 2014-11-01 17:26 UTC
To: gentoo-user

On 01/11/2014 19:15, James wrote:
> Hello Meino,
>
> It's a bit of effort, but the world's recognized authority
> on algorithms is Don Knuth. [1] He's old now, but his
> pioneering attempt at categorizing most algorithms:
> "The art of computer programming" and his MMIX alogrithm
> implementations (kinda like assembler) are certainly
> part of many first-step research efforts on algorithms
> and their implementations.
>
> It's not a cookbook; more of a scholarly (high_brow) reference,
> just to supplement all the good postings by your peers on gentoo user.
>
> Alan may loan you his copy?
> (ha ha ha)?
>
> hth,
> James
>
> [1] http://www-cs-faculty.stanford.edu/~uno/

ha ha, fat chance :-)

When Alan does eventually get his hands on his very own personal
copy[1], it will be lent to nobody. There are just some things a man
never lends out: his bike, his firearm, his wife. And Knuth :-)

Back on topic: You're 100% right - to learn about algorithms in general,
Knuth is the man. Essential reading for anyone taking CS seriously

--
Alan McKinnon
alan.mckinnon@gmail.com

* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 17:26 ` Alan McKinnon
@ 2014-11-01 20:18   ` Matti Nykyri
  0 siblings, 0 replies; 22+ messages in thread
From: Matti Nykyri @ 2014-11-01 20:18 UTC
To: gentoo-user@lists.gentoo.org

> On Nov 1, 2014, at 19:26, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>
> ha ha, fat chance :-)
>
> When Alan does eventually get his hands on his very own personal
> copy[1], it will be lent to nobody. There are just some things a man
> never lends out: his bike, his firearm, his wife. And Knuth :-)

Why not lend your wife? ;)

> Back on topic: You're 100% right - to learn about algorithms in general,
> Knuth is the man. Essential reading for anyone taking CS seriously
>
> --
> Alan McKinnon
> alan.mckinnon@gmail.com

* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 17:15 ` James
  2014-11-01 17:26   ` Alan McKinnon
@ 2014-11-01 17:59   ` meino.cramer
  2014-11-01 20:47     ` Alan McKinnon
  1 sibling, 1 reply; 22+ messages in thread
From: meino.cramer @ 2014-11-01 17:59 UTC
To: gentoo-user

James <wireless@tampabay.rr.com> [14-11-01 18:16]:
> Hello Meino,
>
> It's a bit of effort, but the world's recognized authority
> on algorithms is Don Knuth. [1] He's old now, but his
> pioneering attempt at categorizing most algorithms:
> "The art of computer programming" and his MMIX alogrithm
> implementations (kinda like assembler) are certainly
> part of many first-step research efforts on algorithms
> and their implementations.
>
> It's not a cookbook; more of a scholarly (high_brow) reference,
> just to supplement all the good postings by your peers on gentoo user.
>
> Alan may loan you his copy?
> (ha ha ha)?
>
> hth,
> James
>
> [1] http://www-cs-faculty.stanford.edu/~uno/

Hello James,

Don Knuth ... oh YES! :)
For a long time I have been using and preferring TeX over anything
else (ok... for ASCII I use vim... ;).

And besides his computer wisdom I also like his kind of humor a lot...
for example this one:
https://www.youtube.com/watch?v=eKaI78K_rgA&list=PLUu0XRts4lK6Ri7-xaCNYqTHx7We95Rk8&index=10

But my initial question was aimed more at "practical computing" than
at groundbreaking, fundamental research topics.

More like "what tool to pick?"...

I did some compression tests myself and currently I have this:
From http://piworld.calico.jp/ (http://piworld.calico.jp/estart.html)
I got zipped packages of
1000 million places of PI each (~57MB for one ZIP).

I unpacked the first package and recompressed it with different
methods of 7zip, gzip and bzip2. For gzip and bzip2 I used the highest
compression mode (-9). When a file's name matches /.*ultra.*/, I used
7zip's highest compression mode (-mx=9); otherwise I only set the
compression method and left the rest untouched (defaults).

119888896 2014-10-31 16:44 pi-0001.txt
 57105419 2014-10-31 16:47 pi-0001.txt.gz
 52632832 2014-10-31 16:48 pi-0001.txt.bz2
 52045827 2014-10-31 16:54 pi-0001.txt.ppmd.7z
 57110291 2014-10-31 17:23 pi-0001.zip
 51766683 2014-10-31 17:26 pi-0001.txt.lzma.7z
 51668838 2014-10-31 17:34 pi-0001.txt.lzma.ultra.7z
 52862115 2014-10-31 17:36 pi-0001.txt.ppmd.ultra.7z
 51668838 2014-10-31 17:39 pi-0001.txt.ultra.7z

7zip's lzma wins here, which is also the default method of 7zip. I set
the ultra mode for this by hand.

From other sites which offer PI for download I know of methods which
store the ASCII digits in binary and compress them afterwards. It
would be interesting to see whether this creates a more "handy" input
from 7zip's point of view...

Ah! By the way... I was astonished to read that the digits of PI are
called random on the one hand, and on the other hand there is a
formula [1] to calculate a certain digit of PI without calculating
the previous digits...
Calculated random? Are nature constants the purest form of PRNGs ??? ;)
(Quantum physics is everywhere... ;;))

[1]: http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula

Best regards,
Meino

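For reference, the Bailey-Borwein-Plouffe formula linked above
produces digits of pi in base 16 (and therefore base 2), not the
decimal digits stored in the files discussed in this thread. The
following is a minimal floating-point sketch of the usual digit
extraction scheme; the function names are made up for illustration,
and because it relies on double precision it should only be trusted
for fairly small positions.

```python
def _bbp_series(j: int, n: int) -> float:
    """Fractional part of sum over k >= 0 of 16**(n - k) / (8*k + j)."""
    s = 0.0
    # Terms with a non-negative exponent: use modular exponentiation so the
    # huge integer parts never have to be computed.
    for k in range(n + 1):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    # Terms with a negative exponent shrink quickly; stop when negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digit(n: int) -> str:
    """Hex digit of pi at position n after the point (n = 0 is the first one)."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return "0123456789abcdef"[int(x * 16)]


if __name__ == "__main__":
    # pi in hexadecimal starts 3.243f6a88...
    print("".join(pi_hex_digit(n) for n in range(8)))
```

Being able to compute a digit directly does not conflict with the
digits looking statistically random: the sequence is fully
deterministic, it just has no structure that a general-purpose
compressor is known to exploit.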
* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 17:59 ` meino.cramer
@ 2014-11-01 20:47   ` Alan McKinnon
  2014-11-01 21:56     ` David W Noon
  0 siblings, 1 reply; 22+ messages in thread
From: Alan McKinnon @ 2014-11-01 20:47 UTC
To: gentoo-user

On 01/11/2014 19:59, meino.cramer@gmx.de wrote:
> Ah! By the way...I was astonished to read, that the digits of PI are
> called random on the one hand and on the other hand there is a formula [1]
> to calculate a certain digit of PI without calculation of the previous
> digits...
> Calculated random? Are nature constants the purest form of PRNGs ??? ;)
> (Quantum physics is everywhere... ;;))
>
> [1]: http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula

The sequence of digits that makes up pi is a random sequence - you can
analyze the order any way you want and you'll find no inherent pattern.

However, any given digit in the sequence is 100% predictable, as you
just showed :-)

Randomness has got to be the second most mind-boggling thing out there,
first being quantumness (that's not a word, I just made it up. You
should get the meaning OK from context ;-) )

--
Alan McKinnon
alan.mckinnon@gmail.com

* [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 20:47 ` Alan McKinnon
@ 2014-11-01 21:56   ` David W Noon
  2014-11-02 12:06     ` Matti Nykyri
  2014-11-02 19:55     ` Alan McKinnon
  0 siblings, 2 replies; 22+ messages in thread
From: David W Noon @ 2014-11-01 21:56 UTC
To: gentoo-user

On Sat, 01 Nov 2014 22:47:15 +0200, Alan Mckinnon
(alan.mckinnon@gmail.com) wrote about "Re: [gentoo-user] Re: OT Best
way to compress files with digits" (in <545546D3.3030005@gmail.com>):

> On 01/11/2014 19:59, meino.cramer@gmx.de wrote:
[snip]
>> Ah! By the way...I was astonished to read, that the digits of PI
>> are called random on the one hand and on the other hand there is
>> a formula [1] to calculate a certain digit of PI without
>> calculation of the previous digits... Calculated random? Are
>> nature constants the purest form of PRNGs ??? ;) (Quantum physics
>> is everywhere... ;;))
>>
>> [1]:
>> http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula
>
> The sequence of digits that make up pi are a random sequence - you
> can analyze the order any way you want and you'll find no inherent
> pattern.

Actually, the sequence of digits is most definitely *not* random. If
the sequence of digits is written any other way then the value is not
Pi. Hence the sequence is unique, not random.

I think what you are grasping for is that the frequency of distinct
digits tends to be uniform: 0's occur as often as 1's as often ... as
9's. Note that the "as often as" operator is really approximate for
finite sub-sequences, but is asymptotically accurate.

Moreover, this is the same in any number base: the binary
representation has 0's occurring as often as 1's; the ternary
representation has 0's occurring as often as 1's and as often as 2's;
etc., etc.

Such numbers are called "normal". It was a poor choice of name, but
we are stuck with it. I would have called them "digit soup" numbers
-- an oblique reference to alphabet soup.

> However, any given digit in the sequence is 100% predictable, as
> you just showed :-)
>
> Randomness has got to be the second most mind-boggling thing out
> there, first being quantumness (that's not a word, I just made it
> up. You should get the meaning OK from context ;-) )

I would say that probability theory is more mind boggling, as it
underpins much of quantum theory. But, as someone who majored in
probability theory, I might be biased. [Incidentally, there is a small
statistical joke in that last sentence.]

Getting back to Meino's original request, one of the optimum
compression algorithms for this would be custom Huffman encoding. To
do this the algorithm requires that all the data (i.e. digits) be read
and a frequency table built. The only problem is that to read all the
digits of Pi could take rather a long time. ... :-)

--
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwnoon@ntlworld.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

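To get a feel for what a custom Huffman code would achieve here, the
sketch below builds Huffman code lengths for ten symbols and reports
the average code length. It assumes all ten digits are exactly equally
frequent, which is an idealisation of the "uniform frequency" property
described above, not a count taken from real pi files; the function
name huffman_code_lengths is made up for illustration.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs: dict) -> dict:
    """Return the Huffman code length in bits for each symbol in freqs."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), [sym]) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol below the merged node gets one bit deeper in the tree.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), syms1 + syms2))
    return lengths


# Assumption: each digit occurs equally often.
freqs = {digit: 1 for digit in "0123456789"}
lengths = huffman_code_lengths(freqs)
average_bits = sum(freqs[d] * lengths[d] for d in freqs) / sum(freqs.values())

if __name__ == "__main__":
    for digit in sorted(lengths):
        print(digit, lengths[digit], "bits")
    print("average:", average_bits, "bits/digit")
```

With equal frequencies, six digits get 3-bit codes and four get 4-bit
codes, for an average of 3.4 bits per digit. That is close to, but
above, log2(10) (about 3.32), because Huffman codes must use a whole
number of bits per symbol; the 96-digit block packing discussed
earlier in the thread gets closer to the bound for this kind of input.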
* Re: [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-01 21:56 ` David W Noon
@ 2014-11-02 12:06   ` Matti Nykyri
  2014-11-03 15:48     ` Grant Edwards
  2014-11-02 19:55   ` Alan McKinnon
  1 sibling, 1 reply; 22+ messages in thread
From: Matti Nykyri @ 2014-11-02 12:06 UTC
To: gentoo-user@lists.gentoo.org

> On Nov 1, 2014, at 23:56, David W Noon <dwnoon@ntlworld.com> wrote:
>
> Such numbers are called "normal". It was a poor choice of name, but
> we are stuck with it. I would have called them "digit soup" numbers
> -- an oblique reference to alphabet soup.

Well, all the digits of pi can be compressed to the following:

=pi();

If you have the infinite series that calculates the digits :)

> Getting back to Meino's original request, one of the optimum
> compression algorithms for this would be custom Huffman encoding. To
> do this the algorithm requires that all the data (i.e. digits) be read
> and a frequency table built. The only problem is that to read all the
> digits of Pi could take rather a long time. ... :-)

That would take infinite time :)

* [gentoo-user] Re: OT Best way to compress files with digits
  2014-11-02 12:06 ` Matti Nykyri
@ 2014-11-03 15:48   ` Grant Edwards
  0 siblings, 0 replies; 22+ messages in thread
From: Grant Edwards @ 2014-11-03 15:48 UTC
To: gentoo-user

On 2014-11-02, Matti Nykyri <matti.nykyri@iki.fi> wrote:
> Well all the digit of pi can be compressed to the following:
>
> =pi();

Nah. Just switch to base-Pi, and then it compresses to:

  1

--
Grant Edwards               grant.b.edwards        Yow! Are we THERE yet?
                                  at
                              gmail.com

* Re: [gentoo-user] Re: OT Best way to compress files with digits
From: Alan McKinnon @ 2014-11-02 19:55 UTC
To: gentoo-user

On 01/11/2014 23:56, David W Noon wrote:
>> The sequence of digits that make up pi are a random sequence - you
>> can analyze the order any way you want and you'll find no inherent
>> pattern.
> Actually, the sequence of digits is most definitely *not* random. If
> the sequence of digits is written any other way then the value is not
> Pi. Hence the sequence is unique, not random.
>
> I think what you are grasping for is that the frequency of distinct
> digits tends to be uniform: 0's occur as often as 1's as often ... as
> 9's. Note that the "as often as" operator is really approximate for
> finite sub-sequences, but is asymptotically accurate.
>
> Moreover, this is the same in any number base: the binary
> representation has 0's occurring as often as 1's; the ternary
> representation has 0's occurring as often as 1's and as often as 2's;
> etc., etc.
>
> Such numbers are called "normal". It was a poor choice of name, but
> we are stuck with it. I would have called them "digit soup" numbers
> -- an oblique reference to alphabet soup.
>

You grasp correctly what I was saying :-)

I'm not formally trained in mathematics so I often get the terminology
wrong or just don't know the accepted words for a concept.

Lucky for me though, English is a heavily overloaded language and
there's always more than one way to communicate something

--
Alan McKinnon
alan.mckinnon@gmail.com
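The uniform digit frequency described in the quoted text can be checked empirically on a finite prefix. Below is a small sketch, assuming Machin's formula in integer arithmetic as the digit source; the helper name `pi_decimals` and its `guard` parameter are inventions of this example. Note that normality of pi is a conjecture rather than a proven result, so the output is an observation about the first 1000 decimals and nothing more.

```python
import math
from collections import Counter


def pi_decimals(n, guard=10):
    """Return the first n decimal digits of pi (after the "3.") as a string."""
    scale = 10 ** (n + guard)

    def arctan_inv(x):
        # arctan(1/x) * scale via the alternating Taylor series, in integers.
        power = scale // x
        total = power
        k, sign = 3, -1
        while power:
            power //= x * x
            total += sign * (power // k)
            k += 2
            sign = -sign
        return total

    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239).
    pi_scaled = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    # Drop the guard digits, then the leading "3".
    return str(pi_scaled // 10 ** guard)[1 : n + 1]


digits = pi_decimals(1000)
counts = Counter(digits)
total = len(digits)
entropy = -sum(c / total * math.log2(c / total) for c in counts.values())

for d in "0123456789":
    print(d, counts[d])
print("empirical entropy (bits/digit):", round(entropy, 4))
```

Each digit turns up roughly 100 times, and the empirical entropy sits just under log2(10) ≈ 3.32 bits per digit. This is why a frequency-based coder gains almost nothing beyond packing the digits more tightly than one byte each.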
* Re: [gentoo-user] Re: OT Best way to compress files with digits
From: Peter Humphrey @ 2014-11-02 22:03 UTC
To: gentoo-user

On Sunday 02 November 2014 21:55:31 Alan McKinnon wrote:

> English is a heavily overloaded language and there's always more
> than one way to communicate something

Even the simplest cases usually have three words for the same thing: one
from French, one from Latin and one from Anglo-Saxon. I won't even mention
words that have come down from Old German and so on, but at least we don't
have many words from Italian or Spanish. (Zucchini? What's that?)

--
Rgds
Peter
* Re: [gentoo-user] Re: OT Best way to compress files with digits
From: Mick @ 2014-11-03 19:37 UTC
To: gentoo-user

On Sunday 02 Nov 2014 22:03:13 Peter Humphrey wrote:
> On Sunday 02 November 2014 21:55:31 Alan McKinnon wrote:
> > English is a heavily overloaded language and there's always more
> > than one way to communicate something
>
> Even the simplest cases usually have three words for the same thing: one
> from French, one from Latin and one from Anglo-Saxon. I won't even mention
> words that have come down from Old German and so on, but at least we don't
> have many words from Italian or Spanish. (Zucchini? What's that?)

That's clearly baloney!

--
Regards,
Mick
* Re: [gentoo-user] Re: OT Best way to compress files with digits
From: Peter Humphrey @ 2014-11-04 2:04 UTC
To: gentoo-user

On Monday 03 November 2014 19:37:52 Mick wrote:

> > Even the simplest cases usually have three words for the same thing: one
> > from French, one from Latin and one from Anglo-Saxon. I won't even mention
> > words that have come down from Old German and so on, but at least we don't
> > have many words from Italian or Spanish. (Zucchini? What's that?)
>
> That's clearly baloney!

Explain.

--
Rgds
Peter
* Re: [gentoo-user] Re: OT Best way to compress files with digits
From: Mick @ 2014-11-04 6:35 UTC
To: gentoo-user

On Tuesday 04 Nov 2014 02:04:45 Peter Humphrey wrote:
> On Monday 03 November 2014 19:37:52 Mick wrote:
> > > Even the simplest cases usually have three words for the same thing:
> > > one from French, one from Latin and one from Anglo-Saxon. I won't even
> > > mention words that have come down from Old German and so on, but at
> > > least we don't have many words from Italian or Spanish. (Zucchini?
> > > What's that?)
> >
> > That's clearly baloney!
>
> Explain.

http://en.wikipedia.org/wiki/Bologna_sausage

:-)

--
Regards,
Mick
end of thread (2014-11-04 6:36 UTC)

Thread overview: 22+ messages
2014-10-31 15:36 [gentoo-user] OT Best way to compress files with digits meino.cramer
2014-10-31 15:45 ` Ralf
2014-10-31 15:59 ` meino.cramer
2014-10-31 16:52 ` Helmut Jarausch
2014-10-31 17:56 ` Rich Freeman
2014-10-31 18:55 ` David Haller
2014-10-31 19:23 ` Rich Freeman
2014-10-31 20:25 ` [gentoo-user] " Grant Edwards
2014-10-31 22:22 ` Rich Freeman
2014-11-01 17:15 ` James
2014-11-01 17:26 ` Alan McKinnon
2014-11-01 20:18 ` Matti Nykyri
2014-11-01 17:59 ` meino.cramer
2014-11-01 20:47 ` Alan McKinnon
2014-11-01 21:56 ` David W Noon
2014-11-02 12:06 ` Matti Nykyri
2014-11-03 15:48 ` Grant Edwards
2014-11-02 19:55 ` Alan McKinnon
2014-11-02 22:03 ` Peter Humphrey
2014-11-03 19:37 ` Mick
2014-11-04 2:04 ` Peter Humphrey
2014-11-04 6:35 ` Mick