* [gentoo-user] OT: Mapping random numbers (PRNG) @ 2014-06-06 2:56 meino.cramer 2014-06-06 3:58 ` Canek Peláez Valdés 0 siblings, 1 reply; 26+ messages in thread From: meino.cramer @ 2014-06-06 2:56 UTC (permalink / raw To: Gentoo Hi, I am experimenting with the C code of the ISAAC pseudo random number generator (http://burtleburtle.net/bob/rand/isaacafa.html). Currently the implementation creates (on my embedded linux) 32 bit hexadecimal output. From this I want to create random numbers in the range of [a-Za-z0-9] *without violating randomness* and (if possible) without throwing away bits of the output. How can I do this mathemtically (in concern of the quality of output) correct? Thank you very much in advance for any help! Best regards, mcc ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] OT: Mapping random numbers (PRNG) 2014-06-06 2:56 [gentoo-user] OT: Mapping random numbers (PRNG) meino.cramer @ 2014-06-06 3:58 ` Canek Peláez Valdés 2014-06-06 18:39 ` meino.cramer 2014-06-06 21:03 ` Matti Nykyri 0 siblings, 2 replies; 26+ messages in thread From: Canek Peláez Valdés @ 2014-06-06 3:58 UTC (permalink / raw To: gentoo-user On Thu, Jun 5, 2014 at 9:56 PM, <meino.cramer@gmx.de> wrote: > Hi, > > I am experimenting with the C code of the ISAAC pseudo random number generator > (http://burtleburtle.net/bob/rand/isaacafa.html). > > Currently the implementation creates (on my embedded linux) 32 bit > hexadecimal output. So it's a 32 bit integer. > From this I want to create random numbers in the range of [a-Za-z0-9] > *without violating randomness* and (if possible) without throwing > away bits of the output. You mean *characters* int the range [A-Za-z0-9]? > How can I do this mathemtically (in concern of the quality of output) > correct? The easiest thing to do would be: ------------------------------------------------------------------------------- #include <time.h> #include <stdio.h> #include <stdlib.h> #define N (26+26+10) static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; int next_character() { // Use the correct call for ISAAC instead of rand() unsigned int idx = rand() % N; return S[idx]; } int main(int argc, char* argv[]) { // Use the correct call for initializing the ISAAC seed srand((unsigned int)time(NULL)); for (int i = 0; i < 20; i++) // --std=c99 printf("%c\n", next_character()); return 0; } ------------------------------------------------------------------------------- If the ISAAC RNG has a good distribution, then the next_character() function will give a good distribution among the set [A-Za-z0-9]. Unless I missunderstood what you meant with "create random numbers in the range of [a-Za-z0-9]". Regards. -- Canek Peláez Valdés Profesor de asignatura, Facultad de Ciencias Universidad Nacional Autónoma de México ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] OT: Mapping random numbers (PRNG) 2014-06-06 3:58 ` Canek Peláez Valdés @ 2014-06-06 18:39 ` meino.cramer 2014-06-06 19:37 ` null_ptr ` (2 more replies) 2014-06-06 21:03 ` Matti Nykyri 1 sibling, 3 replies; 26+ messages in thread From: meino.cramer @ 2014-06-06 18:39 UTC (permalink / raw To: gentoo-user Canek Peláez Valdés <caneko@gmail.com> [14-06-06 17:36]: > On Thu, Jun 5, 2014 at 9:56 PM, <meino.cramer@gmx.de> wrote: > > Hi, > > > > I am experimenting with the C code of the ISAAC pseudo random number generator > > (http://burtleburtle.net/bob/rand/isaacafa.html). > > > > Currently the implementation creates (on my embedded linux) 32 bit > > hexadecimal output. > > So it's a 32 bit integer. > > > From this I want to create random numbers in the range of [a-Za-z0-9] > > *without violating randomness* and (if possible) without throwing > > away bits of the output. > > You mean *characters* int the range [A-Za-z0-9]? > > > How can I do this mathemtically (in concern of the quality of output) > > correct? > > The easiest thing to do would be: > > ------------------------------------------------------------------------------- > #include <time.h> > #include <stdio.h> > #include <stdlib.h> > > #define N (26+26+10) > > static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', > 'K', 'L', 'M', > 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', > 'X', 'Y', 'Z', > 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', > 'k', 'l', 'm', > 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', > 'x', 'y', 'z', > '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; > > int > next_character() > { > // Use the correct call for ISAAC instead of rand() > unsigned int idx = rand() % N; > return S[idx]; > } > > int > main(int argc, char* argv[]) > { > // Use the correct call for initializing the ISAAC seed > srand((unsigned int)time(NULL)); > for (int i = 0; i < 20; i++) // --std=c99 > printf("%c\n", next_character()); > return 0; > } > ------------------------------------------------------------------------------- > > If the ISAAC RNG has a good distribution, then the next_character() > function will give a good distribution among the set [A-Za-z0-9]. > > Unless I missunderstood what you meant with "create random numbers in > the range of [a-Za-z0-9]". > > Regards. > -- > Canek Peláez Valdés > Profesor de asignatura, Facultad de Ciencias > Universidad Nacional Autónoma de México > Hi, Thank you very much for the input! :) I have a question about the algorithm: Suppose rand() has an equal distribution of numbers and furthermore one has a count of 2^32 random numbers listed in numerical sort order. In this list each number would appear (nearly) with the same count: 1 To get an better imagination of that...suppose the rand() would only return numbers in the range of 1...12 and the alphabet has only 8 characters (as 2^32 is not devideable by 62) rand(): 1 2 3 4 5 6 7 8 9 10 11 12 rand()%N : rand()%7 1 2 3 4 5 6 7 0 1 2 3 4 or in other words: An even distribution of numbers of rand() would result in a unevenly distributed sequence of characters...or? This would break the quality of ISAACs output. I am sure I did something wrong here...but where is the logic trap? Thank you very much for any help in advance! Best regards, mcc ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] OT: Mapping random numbers (PRNG) 2014-06-06 18:39 ` meino.cramer @ 2014-06-06 19:37 ` null_ptr 2014-06-06 19:55 ` Canek Peláez Valdés 2014-06-06 21:05 ` Matti Nykyri 2 siblings, 0 replies; 26+ messages in thread From: null_ptr @ 2014-06-06 19:37 UTC (permalink / raw To: gentoo-user On 06/06/14 20:39, meino.cramer@gmx.de wrote: >To get an better imagination of that...suppose the rand() would only >return numbers in the range of 1...12 and the alphabet has only 8 >characters (as 2^32 is not devideable by 62) > >rand(): >1 2 3 4 5 6 7 8 9 10 11 12 > >rand()%N : rand()%7 >1 2 3 4 5 6 7 0 1 2 3 4 > >or in other words: An even distribution of numbers of rand() >would result in a unevenly distributed sequence of characters...or? >This would break the quality of ISAACs output. > >I am sure I did something wrong here...but where is the logic trap? You thought on that is totally right. Let's say random numbers have the rande [0..RAND_MAX] Blindly calculating modulus creates a bias towards lower numbers if RAND_MAX+1 is not divisible by N. For most applications this bias is negligible. If you really want the same distribution you have to discard numbers greater than RAND_MAX - ((RAND_MAX+1)%N) (Beware the undefined behaviour in the above line if you try C.) Basically you want one less than the largest by N dividable number in your range. With numbers: rand(): 0 1 2 3 4 5 6 7 8 9 10 11 12 rand()%5 with discarding >12-((12+1)%5)=9: 0 1 2 3 4 0 1 2 3 4 discarded --- null_ptr Please don't follow me. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] OT: Mapping random numbers (PRNG) 2014-06-06 18:39 ` meino.cramer 2014-06-06 19:37 ` null_ptr @ 2014-06-06 19:55 ` Canek Peláez Valdés 2014-06-06 21:05 ` Matti Nykyri 2 siblings, 0 replies; 26+ messages in thread From: Canek Peláez Valdés @ 2014-06-06 19:55 UTC (permalink / raw To: gentoo-user On Fri, Jun 6, 2014 at 1:39 PM, <meino.cramer@gmx.de> wrote: > Canek Peláez Valdés <caneko@gmail.com> [14-06-06 17:36]: >> On Thu, Jun 5, 2014 at 9:56 PM, <meino.cramer@gmx.de> wrote: >> > Hi, >> > >> > I am experimenting with the C code of the ISAAC pseudo random number generator >> > (http://burtleburtle.net/bob/rand/isaacafa.html). >> > >> > Currently the implementation creates (on my embedded linux) 32 bit >> > hexadecimal output. >> >> So it's a 32 bit integer. >> >> > From this I want to create random numbers in the range of [a-Za-z0-9] >> > *without violating randomness* and (if possible) without throwing >> > away bits of the output. >> >> You mean *characters* int the range [A-Za-z0-9]? >> >> > How can I do this mathemtically (in concern of the quality of output) >> > correct? >> >> The easiest thing to do would be: >> >> ------------------------------------------------------------------------------- >> #include <time.h> >> #include <stdio.h> >> #include <stdlib.h> >> >> #define N (26+26+10) >> >> static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', >> 'K', 'L', 'M', >> 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', >> 'X', 'Y', 'Z', >> 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', >> 'k', 'l', 'm', >> 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', >> 'x', 'y', 'z', >> '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; >> >> int >> next_character() >> { >> // Use the correct call for ISAAC instead of rand() >> unsigned int idx = rand() % N; >> return S[idx]; >> } >> >> int >> main(int argc, char* argv[]) >> { >> // Use the correct call for initializing the ISAAC seed >> srand((unsigned int)time(NULL)); >> for (int i = 0; i < 20; i++) // --std=c99 >> printf("%c\n", next_character()); >> return 0; >> } >> ------------------------------------------------------------------------------- >> >> If the ISAAC RNG has a good distribution, then the next_character() >> function will give a good distribution among the set [A-Za-z0-9]. >> >> Unless I missunderstood what you meant with "create random numbers in >> the range of [a-Za-z0-9]". >> >> Regards. >> -- >> Canek Peláez Valdés >> Profesor de asignatura, Facultad de Ciencias >> Universidad Nacional Autónoma de México >> > > Hi, > > Thank you very much for the input! :) > > I have a question about the algorithm: > Suppose rand() has an equal distribution of numbers and furthermore > one has a count of 2^32 random numbers listed in numerical sort > order. > In this list each number would appear (nearly) with the same count: 1 > > To get an better imagination of that...suppose the rand() would only > return numbers in the range of 1...12 and the alphabet has only 8 > characters (as 2^32 is not devideable by 62) > > rand(): > 1 2 3 4 5 6 7 8 9 10 11 12 > > rand()%N : rand()%7 > 1 2 3 4 5 6 7 0 1 2 3 4 > > or in other words: An even distribution of numbers of rand() > would result in a unevenly distributed sequence of characters...or? > This would break the quality of ISAACs output. > > I am sure I did something wrong here...but where is the logic trap? In theory, it doesn't matter if the number of random characters you want is not multiple of the size of the set of characters. In practice, it doesn't matter if the number of random characters you want is big enough. Consider the following modification to my program (I changed the order of the array S so it's sorted and I can do binary search): ---------------------------------------------------------------------- #include <time.h> #include <stdio.h> #include <stdlib.h> #define N (10+26+26) static char S[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z' }; static int index_of_aux(char c, int a, int b) { if (a > b) return -1; int m = (a + b) / 2; if (S[m] == c) return m; if (S[m] > c) return index_of_aux(c, a, m-1); return index_of_aux(c, m+1, b); } static int index_of(char c) { return index_of_aux(c, 0, N-1); } int next_character() { // Use the correct call for isaac instead of rand() unsigned int idx = rand() % N; return S[idx]; } int main(int argc, char* argv[]) { srand((unsigned int)time(NULL)); int total = 1000000; // Size = 62 int count[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; for (int i = 0; i < total; i++) { char c = next_character(); count[index_of(c)]++; } int min = total, max = 0; for (int i = 0; i < N; i++) { printf("Count %d: %d\n", i, count[i]); min = (count[i] < min) ? count[i] : min; max = (count[i] > max) ? count[i] : max; } printf("Min: %d\n", min); printf("Max: %d\n", max); printf("Diff: %d\n", (max-min)); return 0; } ---------------------------------------------------------------------- The output is the following: Count 0: 16060 Count 1: 16102 Count 2: 15968 Count 3: 16075 ... Count 59: 16220 Count 60: 16143 Count 61: 16177 Min: 15819 Max: 16366 Diff: 547 As you can see, all the characters appear around 16,000 times, so it's pretty well distributed. If the RNG has a good distribution, it will *always* look something like this. The difference will *never* be 0, since no RNG is perfectly distributed. So, TL;DR: it doesn't matter if total is not multiple of N. Regards. -- Canek Peláez Valdés Profesor de asignatura, Facultad de Ciencias Universidad Nacional Autónoma de México ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] OT: Mapping random numbers (PRNG) 2014-06-06 18:39 ` meino.cramer 2014-06-06 19:37 ` null_ptr 2014-06-06 19:55 ` Canek Peláez Valdés @ 2014-06-06 21:05 ` Matti Nykyri 2 siblings, 0 replies; 26+ messages in thread From: Matti Nykyri @ 2014-06-06 21:05 UTC (permalink / raw To: gentoo-user On Fri, Jun 06, 2014 at 08:39:28PM +0200, meino.cramer@gmx.de wrote: > Canek Peláez Valdés <caneko@gmail.com> [14-06-06 17:36]: > > On Thu, Jun 5, 2014 at 9:56 PM, <meino.cramer@gmx.de> wrote: > > > Hi, > > > > > > I am experimenting with the C code of the ISAAC pseudo random number generator > > > (http://burtleburtle.net/bob/rand/isaacafa.html). > > > > > > Currently the implementation creates (on my embedded linux) 32 bit > > > hexadecimal output. > > > > So it's a 32 bit integer. > > > > > From this I want to create random numbers in the range of [a-Za-z0-9] > > > *without violating randomness* and (if possible) without throwing > > > away bits of the output. > > > > You mean *characters* int the range [A-Za-z0-9]? > > > > > How can I do this mathemtically (in concern of the quality of output) > > > correct? > > > > The easiest thing to do would be: > > > > ------------------------------------------------------------------------------- > > #include <time.h> > > #include <stdio.h> > > #include <stdlib.h> > > > > #define N (26+26+10) > > > > static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', > > 'K', 'L', 'M', > > 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', > > 'X', 'Y', 'Z', > > 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', > > 'k', 'l', 'm', > > 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', > > 'x', 'y', 'z', > > '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; > > > > int > > next_character() > > { > > // Use the correct call for ISAAC instead of rand() > > unsigned int idx = rand() % N; > > return S[idx]; > > } > > > > int > > main(int argc, char* argv[]) > > { > > // Use the correct call for initializing the ISAAC seed > > srand((unsigned int)time(NULL)); > > for (int i = 0; i < 20; i++) // --std=c99 > > printf("%c\n", next_character()); > > return 0; > > } > > ------------------------------------------------------------------------------- > > > > If the ISAAC RNG has a good distribution, then the next_character() > > function will give a good distribution among the set [A-Za-z0-9]. > > > > Unless I missunderstood what you meant with "create random numbers in > > the range of [a-Za-z0-9]". > > > > Regards. > > -- > > Canek Peláez Valdés > > Profesor de asignatura, Facultad de Ciencias > > Universidad Nacional Autónoma de México > > > > Hi, > > Thank you very much for the input! :) > > I have a question about the algorithm: > Suppose rand() has an equal distribution of numbers and furthermore > one has a count of 2^32 random numbers listed in numerical sort > order. > In this list each number would appear (nearly) with the same count: 1 > > To get an better imagination of that...suppose the rand() would only > return numbers in the range of 1...12 and the alphabet has only 8 > characters (as 2^32 is not devideable by 62) > > rand(): > 1 2 3 4 5 6 7 8 9 10 11 12 > > rand()%N : rand()%7 > 1 2 3 4 5 6 7 0 1 2 3 4 > > or in other words: An even distribution of numbers of rand() > would result in a unevenly distributed sequence of characters...or? > This would break the quality of ISAACs output. > > I am sure I did something wrong here...but where is the logic trap? > This is the thing I explained in my message. -- -Matti ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] OT: Mapping random numbers (PRNG) 2014-06-06 3:58 ` Canek Peláez Valdés 2014-06-06 18:39 ` meino.cramer @ 2014-06-06 21:03 ` Matti Nykyri 2014-06-07 9:19 ` Matti Nykyri 2014-06-26 21:00 ` [gentoo-user] " Kai Krakow 1 sibling, 2 replies; 26+ messages in thread From: Matti Nykyri @ 2014-06-06 21:03 UTC (permalink / raw To: gentoo-user On Thu, Jun 05, 2014 at 10:58:51PM -0500, Canek Peláez Valdés wrote: > On Thu, Jun 5, 2014 at 9:56 PM, <meino.cramer@gmx.de> wrote: > > Hi, > > > > I am experimenting with the C code of the ISAAC pseudo random number generator > > (http://burtleburtle.net/bob/rand/isaacafa.html). > > > > Currently the implementation creates (on my embedded linux) 32 bit > > hexadecimal output. > > So it's a 32 bit integer. > > > From this I want to create random numbers in the range of [a-Za-z0-9] > > *without violating randomness* and (if possible) without throwing > > away bits of the output. > > You mean *characters* int the range [A-Za-z0-9]? Well this isn't as simple problem as it sounds. A random 32 bit integer has 32 bits of randomness. If you take a divison reminder of 62 from this integer you will get only 5,95419631039 bits of randomness (log(62)/log(2)). So you are wasting 81,4% of your random data. Which is quite much and usually random data is quite expensive. You can save your precious random data by taking only 6 bit from your 32 bit integer and dividing it by 62. Then you will be wasting only 0,8% of random data. Another problem is alignment, but that is about mathematical correctness. > > How can I do this mathemtically (in concern of the quality of output) > > correct? > > The easiest thing to do would be: The easiest is not mathematically correct though. Random data will stay random only if you select and modify it so that randomness is preserved. If you take devison reminder of 62 from 32 bit integer there are 69 273 667 possibilities of the reminder to be 3 or less. For the reminder to 4 or more the number of possibilities is 69 273 666. In mathematically ideal case the probability for every index of the list should be same: 1/62 = 1,61290322581%. But the modulo 62 modifies this probability: for index 0-3 the probability is 69 273 667/2^32 = 1,61290324759%. And for indexes 4-61 the probability will be 69 273 666/2^32 = 1,6129032243%. If you wish not to waste those random bits the probabilities will get worse. With 6 bits of random the probability for index 0-1 will be 2/64 and for 2-63 it will be 1/64. This is a very significant change because first and second index will appear twice as much as the rest. If you add 2 characters to your list you will perfect alignment and you can take 6 bits of data without it modifying probabilities. If you are looking a mathematically perfect solution there is a simple one even if your list is not in the power of 2! Take 6 bits at a time of the random data. If the result is 62 or 63 you will discard the data and get the next 6 bits. This selectively modifies the random data but keeps the probabilities in correct balance. Now the probability for index of 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. > ------------------------------------------------------------------------------- > #include <time.h> > #include <stdio.h> > #include <stdlib.h> > > #define N (26+26+10) > > static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', > 'K', 'L', 'M', > 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', > 'X', 'Y', 'Z', > 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', > 'k', 'l', 'm', > 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', > 'x', 'y', 'z', > '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; > > int > next_character() > { > // Use the correct call for ISAAC instead of rand() > unsigned int idx = rand() % N; > return S[idx]; > } so modify the next_char function: char next_character() { static unsigned int rand = 0; //(sizeof(int) = 32) static char bit_avail = 0; char result = 0; char move_bits = 0; char bits_moved = 0; do { if (!bits_avail) { // Use the correct call for ISAAC instead of rand() rand = rand(); bit_avail = 32; } move_bits = bits_avail >= 6 ? 6 : bits_avail; result <<= move_bits; result = (result | rand & (0xFF >> (8 - move_bits))) & 0x3F; bits_avail -= move_bits; bits_moved += move_bits; rand >>= move_bits; } while (bits_moved != 6 && result > 61); return result; } This function will give perfect distribution of 1/62 probability for every index. It will waste 6 bits with the probability of 1/32 (2/64). > int > main(int argc, char* argv[]) > { > // Use the correct call for initializing the ISAAC seed > srand((unsigned int)time(NULL)); > for (int i = 0; i < 20; i++) // --std=c99 > printf("%c\n", next_character()); > return 0; > } > ------------------------------------------------------------------------------- > > If the ISAAC RNG has a good distribution, then the next_character() > function will give a good distribution among the set [A-Za-z0-9]. > > Unless I missunderstood what you meant with "create random numbers in > the range of [a-Za-z0-9]". -- -Matti ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] OT: Mapping random numbers (PRNG) 2014-06-06 21:03 ` Matti Nykyri @ 2014-06-07 9:19 ` Matti Nykyri 2014-06-07 12:13 ` Marc Joliet 2014-06-26 21:00 ` [gentoo-user] " Kai Krakow 1 sibling, 1 reply; 26+ messages in thread From: Matti Nykyri @ 2014-06-07 9:19 UTC (permalink / raw To: gentoo-user On Sat, Jun 07, 2014 at 12:03:29AM +0300, Matti Nykyri wrote: > On Thu, Jun 05, 2014 at 10:58:51PM -0500, Canek Peláez Valdés wrote: > > On Thu, Jun 5, 2014 at 9:56 PM, <meino.cramer@gmx.de> wrote: > > > Hi, > > > > > > I am experimenting with the C code of the ISAAC pseudo random number generator > > > (http://burtleburtle.net/bob/rand/isaacafa.html). > > > > > > Currently the implementation creates (on my embedded linux) 32 bit > > > hexadecimal output. > > > > So it's a 32 bit integer. > > > > > From this I want to create random numbers in the range of [a-Za-z0-9] > > > *without violating randomness* and (if possible) without throwing > > > away bits of the output. > > > > You mean *characters* int the range [A-Za-z0-9]? > > Well this isn't as simple problem as it sounds. A random 32 bit integer > has 32 bits of randomness. If you take a divison reminder of 62 from this > integer you will get only 5,95419631039 bits of randomness > (log(62)/log(2)). So you are wasting 81,4% of your random data. Which is > quite much and usually random data is quite expensive. You can save your > precious random data by taking only 6 bit from your 32 bit integer and > dividing it by 62. Then you will be wasting only 0,8% of random data. > Another problem is alignment, but that is about mathematical correctness. > > > > How can I do this mathemtically (in concern of the quality of output) > > > correct? > > > > The easiest thing to do would be: > > The easiest is not mathematically correct though. Random data will stay > random only if you select and modify it so that randomness is preserved. > If you take devison reminder of 62 from 32 bit integer there are 69 273 > 667 possibilities of the reminder to be 3 or less. For the reminder to 4 > or more the number of possibilities is 69 273 666. In mathematically > ideal case the probability for every index of the list should be same: > 1/62 = 1,61290322581%. But the modulo 62 modifies this probability: for > index 0-3 the probability is 69 273 667/2^32 = 1,61290324759%. And for > indexes 4-61 the probability will be 69 273 666/2^32 = 1,6129032243%. > > If you wish not to waste those random bits the probabilities will get > worse. With 6 bits of random the probability for index 0-1 will be 2/64 > and for 2-63 it will be 1/64. This is a very significant change because > first and second index will appear twice as much as the rest. If you add > 2 characters to your list you will perfect alignment and you can take 6 > bits of data without it modifying probabilities. > > If you are looking a mathematically perfect solution there is a simple > one even if your list is not in the power of 2! Take 6 bits at a time of > the random data. If the result is 62 or 63 you will discard the data and > get the next 6 bits. This selectively modifies the random data but keeps > the probabilities in correct balance. Now the probability for index of > 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. > > > ------------------------------------------------------------------------------- > > #include <time.h> > > #include <stdio.h> > > #include <stdlib.h> > > > > #define N (26+26+10) > > > > static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', > > 'K', 'L', 'M', > > 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', > > 'X', 'Y', 'Z', > > 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', > > 'k', 'l', 'm', > > 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', > > 'x', 'y', 'z', > > '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; > > > > int > > next_character() > > { > > // Use the correct call for ISAAC instead of rand() > > unsigned int idx = rand() % N; > > return S[idx]; > > } > > so modify the next_char function: > > char next_character() > { > static unsigned int rand = 0; //(sizeof(int) = 32) > static char bit_avail = 0; > char result = 0; > char move_bits = 0; > char bits_moved = 0; > > do { > if (!bits_avail) { > // Use the correct call for ISAAC instead of rand() > rand = rand(); > > bit_avail = 32; > } > > move_bits = bits_avail >= 6 ? 6 : bits_avail; > result <<= move_bits; > result = (result | rand & (0xFF >> (8 - move_bits))) & 0x3F; > bits_avail -= move_bits; > bits_moved += move_bits; > rand >>= move_bits; > > } while (bits_moved != 6 && result > 61); > > return result; > } Well actually it looks simpler if you break this like this: unsigned char get_6bits () { static unsigned int rand = 0; //(sizeof(int) = 32) static char bits_avail = 0; unsigned char result = 0; //get 2 bits 3 times: 32 is devidable by 2 for (int i = 0; i < 3; i++) { // --std=c99 //Fill buffer if it is empty! if (!bits_avail || bits_avail < 0 ) { //if bits_avail < 0 it is an error! // Use the correct call for ISAAC instead of rand() rand = rand(); bits_avail = 32; } result <<= 2; //move two bits to left. result = result | (rand & 0x3); //add two least signifigant bits to the result. rand >>= 2; //move two bits to right. bits_avail -= 2; } return result; //result has 6 bits of random data... } char next_character() { unsigned char idx = 0; do { idx = get_6bits(); } while (idx > 61); return S[idx]; } Very simple :) > This function will give perfect distribution of 1/62 probability for > every index. It will waste 6 bits with the probability of 1/32 (2/64). > > > int > > main(int argc, char* argv[]) > > { > > // Use the correct call for initializing the ISAAC seed > > srand((unsigned int)time(NULL)); > > for (int i = 0; i < 20; i++) // --std=c99 > > printf("%c\n", next_character()); > > return 0; > > } > > ------------------------------------------------------------------------------- > > > > If the ISAAC RNG has a good distribution, then the next_character() > > function will give a good distribution among the set [A-Za-z0-9]. > > > > Unless I missunderstood what you meant with "create random numbers in > > the range of [a-Za-z0-9]". -- -Matti ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] OT: Mapping random numbers (PRNG) 2014-06-07 9:19 ` Matti Nykyri @ 2014-06-07 12:13 ` Marc Joliet 0 siblings, 0 replies; 26+ messages in thread From: Marc Joliet @ 2014-06-07 12:13 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 1404 bytes --] Am Sat, 7 Jun 2014 12:19:11 +0300 schrieb Matti Nykyri <Matti.Nykyri@iki.fi>: > On Sat, Jun 07, 2014 at 12:03:29AM +0300, Matti Nykyri wrote: [...] > unsigned char get_6bits () > { > static unsigned int rand = 0; //(sizeof(int) = 32) Just an itty bitty nitpick: since you already require C99 as per the for loop below, you might as well use uint32_t (from stdint.h) instead of assuming that sizeof(int) == 32 :) . > static char bits_avail = 0; > unsigned char result = 0; > > //get 2 bits 3 times: 32 is devidable by 2 > for (int i = 0; i < 3; i++) { // --std=c99 > //Fill buffer if it is empty! > if (!bits_avail || bits_avail < 0 ) { //if bits_avail < 0 it is an error! > // Use the correct call for ISAAC instead of rand() > rand = rand(); > > bits_avail = 32; > } > > result <<= 2; //move two bits to left. > result = result | (rand & 0x3); //add two least signifigant bits to the result. > rand >>= 2; //move two bits to right. > bits_avail -= 2; > } > > return result; //result has 6 bits of random data... > } > > char next_character() > { > unsigned char idx = 0; > do { > idx = get_6bits(); > } while (idx > 61); > > return S[idx]; > } > > Very simple :) [...] -- Marc Joliet -- "People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* [gentoo-user] Re: OT: Mapping random numbers (PRNG) 2014-06-06 21:03 ` Matti Nykyri 2014-06-07 9:19 ` Matti Nykyri @ 2014-06-26 21:00 ` Kai Krakow 2014-06-26 22:07 ` Kai Krakow 2014-06-27 21:13 ` [gentoo-user] " Matti Nykyri 1 sibling, 2 replies; 26+ messages in thread From: Kai Krakow @ 2014-06-26 21:00 UTC (permalink / raw To: gentoo-user Matti Nykyri <Matti.Nykyri@iki.fi> schrieb: > If you are looking a mathematically perfect solution there is a simple > one even if your list is not in the power of 2! Take 6 bits at a time of > the random data. If the result is 62 or 63 you will discard the data and > get the next 6 bits. This selectively modifies the random data but keeps > the probabilities in correct balance. Now the probability for index of > 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. Why not do just something like this? index = 0; while (true) { index = (index + get_6bit_random()) % 62; output << char_array[index]; } Done, no bits wasted. Should have perfect distribution also. We also don't have to throw away random data just to stay within unaligned boundaries. The unalignment is being taken over into the next loop so the "error" corrects itself over time (it becomes distributed over the whole set). -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 26+ messages in thread
* [gentoo-user] Re: OT: Mapping random numbers (PRNG) 2014-06-26 21:00 ` [gentoo-user] " Kai Krakow @ 2014-06-26 22:07 ` Kai Krakow 2014-06-27 8:55 ` thegeezer 2014-06-27 21:13 ` [gentoo-user] " Matti Nykyri 1 sibling, 1 reply; 26+ messages in thread From: Kai Krakow @ 2014-06-26 22:07 UTC (permalink / raw To: gentoo-user Kai Krakow <hurikhan77@gmail.com> schrieb: > Matti Nykyri <Matti.Nykyri@iki.fi> schrieb: > >> If you are looking a mathematically perfect solution there is a simple >> one even if your list is not in the power of 2! Take 6 bits at a time of >> the random data. If the result is 62 or 63 you will discard the data and >> get the next 6 bits. This selectively modifies the random data but keeps >> the probabilities in correct balance. Now the probability for index of >> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. > > Why not do just something like this? > > index = 0; > while (true) { > index = (index + get_6bit_random()) % 62; > output << char_array[index]; > } > > Done, no bits wasted. Should have perfect distribution also. We also don't > have to throw away random data just to stay within unaligned boundaries. > The unalignment is being taken over into the next loop so the "error" > corrects itself over time (it becomes distributed over the whole set). It is worth noting that my approach has the tendency of generating random characters in sequence. I think there are several possibilities to fix this, one being swapping two elements in the char array on each access. That fixed this particular problem in my tests. Here's are quick ruby snippet I played with: $ cat random.rb # danger: ugly code ahead dist = [0] * 62 seq = [0, 0] chars = [*('A'..'Z'), *('a'..'z'), *('0'..'9')] orig_chars = chars.dup index = 0 old = nil (62 * 10).times do # generate random wrapped index index = (index + rand(64)) % 62 # swap two indexed positions and output the char chars[0], chars[index] = chars[index], chars[0] $stdout.write chars[0] # count the generated indexes dist[index] = dist[index] + 1 # count if indexes were generated in sequence by looking # them up in the original sorted chars array unless old.nil? seq[0] = seq[0] + 1 if old < orig_chars.index(chars[0]) seq[1] = seq[1] + 1 if old > orig_chars.index(chars[0]) end old = orig_chars.index(chars[0]) end puts # check the average, it should always equal the loop count # multiplicator avg = dist.inject(:+) / 62.0 # calculate if the algorithm preferred sequences seq = [*seq, seq[0] - seq[1]] # output some stats puts avg puts dist.inspect puts dist.map {|v| avg - v }.inspect puts seq.inspect -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG) 2014-06-26 22:07 ` Kai Krakow @ 2014-06-27 8:55 ` thegeezer 2014-06-27 17:49 ` Matti Nykyri 2014-06-27 17:50 ` [gentoo-user] " Kai Krakow 0 siblings, 2 replies; 26+ messages in thread From: thegeezer @ 2014-06-27 8:55 UTC (permalink / raw To: gentoo-user On 06/26/2014 11:07 PM, Kai Krakow wrote: > > It is worth noting that my approach has the tendency of generating random > characters in sequence. sorry but had to share this http://dilbert.com/strips/comic/2001-10-25/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG) 2014-06-27 8:55 ` thegeezer @ 2014-06-27 17:49 ` Matti Nykyri 2014-06-27 17:50 ` [gentoo-user] " Kai Krakow 1 sibling, 0 replies; 26+ messages in thread From: Matti Nykyri @ 2014-06-27 17:49 UTC (permalink / raw To: gentoo-user@lists.gentoo.org > On Jun 27, 2014, at 11:55, thegeezer <thegeezer@thegeezer.net> wrote: > >> On 06/26/2014 11:07 PM, Kai Krakow wrote: >> >> It is worth noting that my approach has the tendency of generating random >> characters in sequence. > > sorry but had to share this http://dilbert.com/strips/comic/2001-10-25/ > This is a good one :) have really been thinking this same comic previosly when writing to this thread... ^ permalink raw reply [flat|nested] 26+ messages in thread
* [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) 2014-06-27 8:55 ` thegeezer 2014-06-27 17:49 ` Matti Nykyri @ 2014-06-27 17:50 ` Kai Krakow 2014-06-27 20:59 ` Neil Bothwick 1 sibling, 1 reply; 26+ messages in thread From: Kai Krakow @ 2014-06-27 17:50 UTC (permalink / raw To: gentoo-user thegeezer <thegeezer@thegeezer.net> schrieb: > On 06/26/2014 11:07 PM, Kai Krakow wrote: >> >> It is worth noting that my approach has the tendency of generating random >> characters in sequence. > > sorry but had to share this http://dilbert.com/strips/comic/2001-10-25/ :-) I'm no mathematician, but well, I think the swapping approach fixes it. What this makes clear, however, is that randomness on its own does not completely ensure unpredictable items if combined with other functions. One has to carefully think about it. I, for myself, would always stay away from using modulo to clip the random numbers. It will always create bias. My first idea introduced predictable followers (you always knew that the next char had a specific probability related to the tail length of the list). You can actually learn from Dilbert comics. ;-) -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) 2014-06-27 17:50 ` [gentoo-user] " Kai Krakow @ 2014-06-27 20:59 ` Neil Bothwick 0 siblings, 0 replies; 26+ messages in thread From: Neil Bothwick @ 2014-06-27 20:59 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 296 bytes --] On Fri, 27 Jun 2014 19:50:15 +0200, Kai Krakow wrote: > You can actually learn from Dilbert comics. ;-) Unless you're a PHB, they never learn. -- Neil Bothwick "You know how dumb the average person is? Well, statistically, half of them are even dumber than that" - Lewton, P.I. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG) 2014-06-26 21:00 ` [gentoo-user] " Kai Krakow 2014-06-26 22:07 ` Kai Krakow @ 2014-06-27 21:13 ` Matti Nykyri 2014-06-28 13:08 ` Matti Nykyri 2014-06-28 21:28 ` [gentoo-user] " Kai Krakow 1 sibling, 2 replies; 26+ messages in thread From: Matti Nykyri @ 2014-06-27 21:13 UTC (permalink / raw To: gentoo-user@lists.gentoo.org > On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote: > > Matti Nykyri <Matti.Nykyri@iki.fi> schrieb: > >> If you are looking a mathematically perfect solution there is a simple >> one even if your list is not in the power of 2! Take 6 bits at a time of >> the random data. If the result is 62 or 63 you will discard the data and >> get the next 6 bits. This selectively modifies the random data but keeps >> the probabilities in correct balance. Now the probability for index of >> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. > > Why not do just something like this? > > index = 0; > while (true) { > index = (index + get_6bit_random()) % 62; > output << char_array[index]; > } > > Done, no bits wasted. Should have perfect distribution also. We also don't > have to throw away random data just to stay within unaligned boundaries. The > unalignment is being taken over into the next loop so the "error" corrects > itself over time (it becomes distributed over the whole set). Distribution will not be perfect. The same original problem persists. Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64. Now the addition changes this so that index 0 to 1 reflects to previous character and not the original index. The distribution of like 10GB of data should be quite even but not on a small scale. The next char will depend on previous char. It is 100% more likely that the next char is the same or one index above the previous char then any of the other ones in the series. So it is likely that you will have long sets of same character. Random means that for next char the probability is always even, 1/62. And like mentioned in Dilbert it is impossible to say that something is random but possible to say that it isn't. If wasting 6bit of data seems large, do this: index = get_6bit_random(); while (index > 61) { index <<= 1; index |= get_1bit_random(); index &= 0x3F; } return index; It will waste 1 bit at a time until result is less than 62. This will slightly change probabilities though :/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG) 2014-06-27 21:13 ` [gentoo-user] " Matti Nykyri @ 2014-06-28 13:08 ` Matti Nykyri 2014-06-28 21:28 ` [gentoo-user] " Kai Krakow 1 sibling, 0 replies; 26+ messages in thread From: Matti Nykyri @ 2014-06-28 13:08 UTC (permalink / raw To: gentoo-user@lists.gentoo.org On Jun 28, 2014, at 0:13, Matti Nykyri <matti.nykyri@iki.fi> wrote: >> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote: >> >> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb: >> >>> If you are looking a mathematically perfect solution there is a simple >>> one even if your list is not in the power of 2! Take 6 bits at a time of >>> the random data. If the result is 62 or 63 you will discard the data and >>> get the next 6 bits. This selectively modifies the random data but keeps >>> the probabilities in correct balance. Now the probability for index of >>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. >> >> Why not do just something like this? >> >> index = 0; >> while (true) { >> index = (index + get_6bit_random()) % 62; >> output << char_array[index]; >> } >> >> Done, no bits wasted. Should have perfect distribution also. We also don't >> have to throw away random data just to stay within unaligned boundaries. The >> unalignment is being taken over into the next loop so the "error" corrects >> itself over time (it becomes distributed over the whole set). > > Distribution will not be perfect. The same original problem persists. Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64. Now the addition changes this so that index 0 to 1 reflects to previous character and not the original index. > > The distribution of like 10GB of data should be quite even but not on a small scale. The next char will depend on previous char. It is 100% more likely that the next char is the same or one index above the previous char then any of the other ones in the series. So it is likely that you will have long sets of same character. > > Random means that for next char the probability is always even, 1/62. And like mentioned in Dilbert it is impossible to say that something is random but possible to say that it isn't. > > If wasting 6bit of data seems large, do this: > > index = get_6bit_random(); > while (index > 61) { > index <<= 1; > index |= get_1bit_random(); > index &= 0x3F; > } > return index; > > It will waste 1 bit at a time until result is less than 62. This will slightly change probabilities though :/ Sorry this example is really flawed :( If next6bit is over 61 there are only two possible values for it: 62 or 63 -> that is 0x3E and 0x3F. So you see that only one bit changes. But that bit is random! So least significant bit is random and does not need to be discarded :) index = get_6bit_random(); while (index > 61) { index <<= 5; index |= get_5bit_random(); index &= 0x3F; } return index; ^ permalink raw reply [flat|nested] 26+ messages in thread
* [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) 2014-06-27 21:13 ` [gentoo-user] " Matti Nykyri 2014-06-28 13:08 ` Matti Nykyri @ 2014-06-28 21:28 ` Kai Krakow 2014-06-28 22:06 ` Canek Peláez Valdés 2014-06-29 6:24 ` [gentoo-user] " Matti Nykyri 1 sibling, 2 replies; 26+ messages in thread From: Kai Krakow @ 2014-06-28 21:28 UTC (permalink / raw To: gentoo-user Matti Nykyri <matti.nykyri@iki.fi> schrieb: >> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote: >> >> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb: >> >>> If you are looking a mathematically perfect solution there is a simple >>> one even if your list is not in the power of 2! Take 6 bits at a time of >>> the random data. If the result is 62 or 63 you will discard the data and >>> get the next 6 bits. This selectively modifies the random data but keeps >>> the probabilities in correct balance. Now the probability for index of >>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. >> >> Why not do just something like this? >> >> index = 0; >> while (true) { >> index = (index + get_6bit_random()) % 62; >> output << char_array[index]; >> } >> >> Done, no bits wasted. Should have perfect distribution also. We also >> don't have to throw away random data just to stay within unaligned >> boundaries. The unalignment is being taken over into the next loop so the >> "error" corrects itself over time (it becomes distributed over the whole >> set). > > Distribution will not be perfect. The same original problem persists. > Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64. > Now the addition changes this so that index 0 to 1 reflects to previous > character and not the original index. > > The distribution of like 10GB of data should be quite even but not on a > small scale. The next char will depend on previous char. It is 100% more > likely that the next char is the same or one index above the previous char > then any of the other ones in the series. So it is likely that you will > have long sets of same character. I cannot follow your reasoning here - but I'd like to learn. Actually, I ran this multiple times and never saw long sets of the same character, even no short sets of the same character. The 0 or 1 is always rolled over into the next random addition. I would only get sets of the same character if rand() returned zero multiple times after each other - which wouldn't be really random. ;-) Keep in mind: The last index will be reused whenever you'd enter the function - it won't reset to zero. But still that primitive implementation had a flaw: It will tend to select characters beyond the current offset, if it is >= 1/2 into the complete set, otherwise it will prefer selecting characters before the offset. In my tests I counted how ofter new_index > index and new_index < index, and it had a clear bias for the first. So I added swapping of the selected index with offset=0 in the set. Now the characters will be swapped and start to distribute that flaw. The distribution, however, didn't change. Of course I'm no mathematician, I don't know how I'd calculate the probabilities for my implementation because it is sort of a recursive function (for get_rand()) when looking at it over time: int get_rand() { static int index = 0; return (index = (index + get_6bit_rand()) % 62); } char get_char() { int index = get_rand(); char tmp = chars[index]; chars[index] = chars[0]; return (chars[0] = tmp); } However, get_char() should return evenly distributes results. What this shows, is, that while distribution is even among the result set, the implementation may still be flawed because results could be predictable for a subset of results. Or in other words: Simply looking at the distribution of results is not an indicator for randomness. I could change get_rand() in the following way: int get_rand() { static int index = 0; return (index = (index + 1) % 62); } Results would be distributed even, but clearly it is not random. -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) 2014-06-28 21:28 ` [gentoo-user] " Kai Krakow @ 2014-06-28 22:06 ` Canek Peláez Valdés 2014-06-29 0:37 ` gottlieb 2014-06-29 6:24 ` [gentoo-user] " Matti Nykyri 1 sibling, 1 reply; 26+ messages in thread From: Canek Peláez Valdés @ 2014-06-28 22:06 UTC (permalink / raw To: gentoo-user On Sat, Jun 28, 2014 at 4:28 PM, Kai Krakow <hurikhan77@gmail.com> wrote: [ ... ] > I cannot follow your reasoning here - but I'd like to learn. Actually, I ran > this multiple times and never saw long sets of the same character, even no > short sets of the same character. The 0 or 1 is always rolled over into the > next random addition. That doesn't matter. Take a non-negative integer N; if you flip a coin an infinite number of times, then the probability of the coin landing on the same face N times in a row is 1. This means that it is *guaranteed* to happen, and it *will* happen for any N you want: 1,000,000, a thousand billions, a gazillion. That is a mathematical fact. This of course is a consequence of "infinite" being really really large, but it means that technically you cannot rule a RNG as broken only because you saw that it produced the same result N times, which is the crux of the Dilbert joke. In practice, of course, it's a big sign that something is wrong. But there is a non-zero probability that it's actually correct. Because with randomness, you can never be sure. Regards. -- Canek Peláez Valdés Profesor de asignatura, Facultad de Ciencias Universidad Nacional Autónoma de México ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) 2014-06-28 22:06 ` Canek Peláez Valdés @ 2014-06-29 0:37 ` gottlieb 2014-06-29 0:53 ` Canek Peláez Valdés 0 siblings, 1 reply; 26+ messages in thread From: gottlieb @ 2014-06-29 0:37 UTC (permalink / raw To: gentoo-user On Sat, Jun 28 2014, Canek Peláez Valdés wrote: > That doesn't matter. Take a non-negative integer N; if you flip a coin > an infinite number of times, then the probability of the coin landing > on the same face N times in a row is 1. This is certainly true. > This means that it is *guaranteed* to happen That is not as clear. Prob = 1 does not always mean certain (when there are infinite possibilities). For example the probability is zero that a random rational number chosen between 0 and 1 is exactly 1/2. So the probability is 1 that the number is not 1/2. However it is not certain that the random choice will not be 1/2. allan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) 2014-06-29 0:37 ` gottlieb @ 2014-06-29 0:53 ` Canek Peláez Valdés 2014-06-29 1:46 ` [gentoo-user] " »Q« 0 siblings, 1 reply; 26+ messages in thread From: Canek Peláez Valdés @ 2014-06-29 0:53 UTC (permalink / raw To: gentoo-user On Sat, Jun 28, 2014 at 7:37 PM, <gottlieb@nyu.edu> wrote: > On Sat, Jun 28 2014, Canek Peláez Valdés wrote: > >> That doesn't matter. Take a non-negative integer N; if you flip a coin >> an infinite number of times, then the probability of the coin landing >> on the same face N times in a row is 1. > > This is certainly true. > >> This means that it is *guaranteed* to happen > > That is not as clear. Let me be more precise (and please correct me if I'm wrong): It is guaranteed to happen at some point in the infinite sequence of random flip coins, but we cannot know when it will happen, only that it will happen. That's the way I got it when I took my probability courses, admittedly many years ago. In any way, even if I'm wrong and it is not guaranteed, the main point remains true: the probability of getting a large sequence of the same number from a RNG is 1 for every true random RNG, and therefore seeing a large sequence of the same number form a RNG doesn't (technically) means that it is broken. Regards. -- Canek Peláez Valdés Profesor de asignatura, Facultad de Ciencias Universidad Nacional Autónoma de México ^ permalink raw reply [flat|nested] 26+ messages in thread
* [gentoo-user] Re: OT: Mapping random numbers (PRNG) 2014-06-29 0:53 ` Canek Peláez Valdés @ 2014-06-29 1:46 ` »Q« 2014-06-29 2:57 ` Canek Peláez Valdés 0 siblings, 1 reply; 26+ messages in thread From: »Q« @ 2014-06-29 1:46 UTC (permalink / raw To: gentoo-user On Sat, 28 Jun 2014 19:53:08 -0500 Canek Peláez Valdés <caneko@gmail.com> wrote: > On Sat, Jun 28, 2014 at 7:37 PM, <gottlieb@nyu.edu> wrote: > > On Sat, Jun 28 2014, Canek Peláez Valdés wrote: > > > >> That doesn't matter. Take a non-negative integer N; if you flip a > >> coin an infinite number of times, then the probability of the coin > >> landing on the same face N times in a row is 1. > > > > This is certainly true. > > > >> This means that it is *guaranteed* to happen > > > > That is not as clear. > > Let me be more precise (and please correct me if I'm wrong): It is > guaranteed to happen at some point in the infinite sequence of random > flip coins, but we cannot know when it will happen, only that it will > happen. > > That's the way I got it when I took my probability courses, admittedly > many years ago. The probability is 1 in the sense that the as the number of flips M increases, so does the probability of getting N heads (or tails) in a row also increases, and the upper bound for the sequence of probabilities is 1. It's not a probability about something which actually happens; no one so far has been able to flip a coin an infinite number of times, not even a computer. > In any way, even if I'm wrong and it is not guaranteed, the main point > remains true: the probability of getting a large sequence of the same > number from a RNG is 1 for every true random RNG, and therefore seeing > a large sequence of the same number form a RNG doesn't (technically) > means that it is broken. It's true that that wouldn't *prove* the generator is broken. But it might be a good reason to take another look at the algorithm. > > Regards. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG) 2014-06-29 1:46 ` [gentoo-user] " »Q« @ 2014-06-29 2:57 ` Canek Peláez Valdés 0 siblings, 0 replies; 26+ messages in thread From: Canek Peláez Valdés @ 2014-06-29 2:57 UTC (permalink / raw To: gentoo-user On Sat, Jun 28, 2014 at 8:46 PM, »Q« <boxcars@gmx.net> wrote: > On Sat, 28 Jun 2014 19:53:08 -0500 > Canek Peláez Valdés <caneko@gmail.com> wrote: > >> On Sat, Jun 28, 2014 at 7:37 PM, <gottlieb@nyu.edu> wrote: >> > On Sat, Jun 28 2014, Canek Peláez Valdés wrote: >> > >> >> That doesn't matter. Take a non-negative integer N; if you flip a >> >> coin an infinite number of times, then the probability of the coin >> >> landing on the same face N times in a row is 1. >> > >> > This is certainly true. >> > >> >> This means that it is *guaranteed* to happen >> > >> > That is not as clear. >> >> Let me be more precise (and please correct me if I'm wrong): It is >> guaranteed to happen at some point in the infinite sequence of random >> flip coins, but we cannot know when it will happen, only that it will >> happen. >> >> That's the way I got it when I took my probability courses, admittedly >> many years ago. > > The probability is 1 in the sense that the as the number of flips M > increases, so does the probability of getting N heads (or tails) in a > row also increases, and the upper bound for the sequence of > probabilities is 1. It's not a probability about something which > actually happens; no one so far has been able to flip a coin an > infinite number of times, not even a computer. And no one will. Ever. >> In any way, even if I'm wrong and it is not guaranteed, the main point >> remains true: the probability of getting a large sequence of the same >> number from a RNG is 1 for every true random RNG, and therefore seeing >> a large sequence of the same number form a RNG doesn't (technically) >> means that it is broken. > > It's true that that wouldn't *prove* the generator is broken. But it > might be a good reason to take another look at the algorithm. Agreed. Regards. -- Canek Peláez Valdés Profesor de asignatura, Facultad de Ciencias Universidad Nacional Autónoma de México ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) 2014-06-28 21:28 ` [gentoo-user] " Kai Krakow 2014-06-28 22:06 ` Canek Peláez Valdés @ 2014-06-29 6:24 ` Matti Nykyri 2014-06-29 12:38 ` [gentoo-user] " Kai Krakow 1 sibling, 1 reply; 26+ messages in thread From: Matti Nykyri @ 2014-06-29 6:24 UTC (permalink / raw To: gentoo-user@lists.gentoo.org On Jun 29, 2014, at 0:28, Kai Krakow <hurikhan77@gmail.com> wrote: > > Matti Nykyri <matti.nykyri@iki.fi> schrieb: > >>> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote: >>> >>> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb: >>> >>>> If you are looking a mathematically perfect solution there is a simple >>>> one even if your list is not in the power of 2! Take 6 bits at a time of >>>> the random data. If the result is 62 or 63 you will discard the data and >>>> get the next 6 bits. This selectively modifies the random data but keeps >>>> the probabilities in correct balance. Now the probability for index of >>>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. >>> >>> Why not do just something like this? >>> >>> index = 0; >>> while (true) { >>> index = (index + get_6bit_random()) % 62; >>> output << char_array[index]; >>> } >>> >>> Done, no bits wasted. Should have perfect distribution also. We also >>> don't have to throw away random data just to stay within unaligned >>> boundaries. The unalignment is being taken over into the next loop so the >>> "error" corrects itself over time (it becomes distributed over the whole >>> set). >> >> Distribution will not be perfect. The same original problem persists. >> Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64. >> Now the addition changes this so that index 0 to 1 reflects to previous >> character and not the original index. >> >> The distribution of like 10GB of data should be quite even but not on a >> small scale. The next char will depend on previous char. It is 100% more >> likely that the next char is the same or one index above the previous char >> then any of the other ones in the series. So it is likely that you will >> have long sets of same character. > > I cannot follow your reasoning here - but I'd like to learn. Actually, I ran > this multiple times and never saw long sets of the same character, even no > short sets of the same character. The 0 or 1 is always rolled over into the > next random addition. I would only get sets of the same character if rand() > returned zero multiple times after each other - which wouldn't be really > random. ;-) In your example that isn't true. You will get the same character if 6bit random number is 0 or if it is 62! This is what makes the flaw! You will also get the next character if random number is 1 or 63. That is why the possibility for 0 and 1 (after modulo 62) is twice as large compared to all other values (2-61). By definition random means that the probability for every value should be the same. So if you have 62 options and even distribution of probability the probability for each of them is 1/62. > Keep in mind: The last index will be reused whenever you'd enter the > function - it won't reset to zero. But still that primitive implementation > had a flaw: It will tend to select characters beyond the current offset, if > it is >= 1/2 into the complete set, otherwise it will prefer selecting > characters before the offset. If you modify the sequence so that if looks random it is pseudo random. > In my tests I counted how ofter new_index > index and new_index < index, and > it had a clear bias for the first. So I added swapping of the selected index > with offset=0 in the set. Now the characters will be swapped and start to > distribute that flaw. The distribution, however, didn't change. Try counting how of often new_index = index and new_index = (index + 1) % 62 and new_index = (index + 2) % 62. With your algorithm the last one should be significantly less then the first two in large sample. > Of course I'm no mathematician, I don't know how I'd calculate the > probabilities for my implementation because it is sort of a recursive > function (for get_rand()) when looking at it over time: > > int get_rand() { > static int index = 0; > return (index = (index + get_6bit_rand()) % 62); > } > > char get_char() { > int index = get_rand(); > char tmp = chars[index]; > chars[index] = chars[0]; > return (chars[0] = tmp); > } > > However, get_char() should return evenly distributes results. > > What this shows, is, that while distribution is even among the result set, > the implementation may still be flawed because results could be predictable > for a subset of results. Or in other words: Simply looking at the > distribution of results is not an indicator for randomness. I could change > get_rand() in the following way: > > int get_rand() { > static int index = 0; > return (index = (index + 1) % 62); > } > > Results would be distributed even, but clearly it is not random. > > -- > Replies to list only preferred. > > ^ permalink raw reply [flat|nested] 26+ messages in thread
* [gentoo-user] Re: Re: Re: OT: Mapping random numbers (PRNG) 2014-06-29 6:24 ` [gentoo-user] " Matti Nykyri @ 2014-06-29 12:38 ` Kai Krakow 2014-06-30 23:47 ` Matti Nykyri 0 siblings, 1 reply; 26+ messages in thread From: Kai Krakow @ 2014-06-29 12:38 UTC (permalink / raw To: gentoo-user Matti Nykyri <matti.nykyri@iki.fi> schrieb: > On Jun 29, 2014, at 0:28, Kai Krakow <hurikhan77@gmail.com> wrote: >> >> Matti Nykyri <matti.nykyri@iki.fi> schrieb: >> >>>> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote: >>>> >>>> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb: >>>> >>>>> If you are looking a mathematically perfect solution there is a simple >>>>> one even if your list is not in the power of 2! Take 6 bits at a time >>>>> of the random data. If the result is 62 or 63 you will discard the >>>>> data and get the next 6 bits. This selectively modifies the random >>>>> data but keeps the probabilities in correct balance. Now the >>>>> probability for index of 0-61 is 1/62 because the probability to get >>>>> 62-63 out of 64 if 0. >>>> >>>> Why not do just something like this? >>>> >>>> index = 0; >>>> while (true) { >>>> index = (index + get_6bit_random()) % 62; >>>> output << char_array[index]; >>>> } >>>> >>>> Done, no bits wasted. Should have perfect distribution also. We also >>>> don't have to throw away random data just to stay within unaligned >>>> boundaries. The unalignment is being taken over into the next loop so >>>> the "error" corrects itself over time (it becomes distributed over the >>>> whole set). >>> >>> Distribution will not be perfect. The same original problem persists. >>> Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be >>> 1/64. Now the addition changes this so that index 0 to 1 reflects to >>> previous character and not the original index. >>> >>> The distribution of like 10GB of data should be quite even but not on a >>> small scale. The next char will depend on previous char. It is 100% more >>> likely that the next char is the same or one index above the previous >>> char then any of the other ones in the series. So it is likely that you >>> will have long sets of same character. >> >> I cannot follow your reasoning here - but I'd like to learn. Actually, I >> ran this multiple times and never saw long sets of the same character, >> even no short sets of the same character. The 0 or 1 is always rolled >> over into the next random addition. I would only get sets of the same >> character if rand() returned zero multiple times after each other - which >> wouldn't be really random. ;-) > > In your example that isn't true. You will get the same character if 6bit > random number is 0 or if it is 62! This is what makes the flaw! > > You will also get the next character if random number is 1 or 63. > > That is why the possibility for 0 and 1 (after modulo 62) is twice as > large compared to all other values (2-61). Ah, now I get it. > By definition random means that the probability for every value should be > the same. So if you have 62 options and even distribution of probability > the probability for each of them is 1/62. Still, the increased probability for single elements should hit different elements each time. So for large sets it will distribute - however, I now get why it's not completely random by definition. >> In my tests I counted how ofter new_index > index and new_index < index, >> and it had a clear bias for the first. So I added swapping of the >> selected index with offset=0 in the set. Now the characters will be >> swapped and start to distribute that flaw. The distribution, however, >> didn't change. > > Try counting how of often new_index = index and new_index = (index + 1) % > 62 and new_index = (index + 2) % 62. With your algorithm the last one > should be significantly less then the first two in large sample. I will try that. It looks like a good approach. -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [gentoo-user] Re: Re: Re: OT: Mapping random numbers (PRNG) 2014-06-29 12:38 ` [gentoo-user] " Kai Krakow @ 2014-06-30 23:47 ` Matti Nykyri 0 siblings, 0 replies; 26+ messages in thread From: Matti Nykyri @ 2014-06-30 23:47 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 1805 bytes --] On Sun, Jun 29, 2014 at 02:38:51PM +0200, Kai Krakow wrote: > Matti Nykyri <matti.nykyri@iki.fi> schrieb: > > > That is why the possibility for 0 and 1 (after modulo 62) is twice as > > large compared to all other values (2-61). > > Ah, now I get it. > > > By definition random means that the probability for every value should be > > the same. So if you have 62 options and even distribution of probability > > the probability for each of them is 1/62. > > Still, the increased probability for single elements should hit different > elements each time. So for large sets it will distribute - however, I now > get why it's not completely random by definition. Usually when you need random data the quality needs to be good! Key, passwords etc. For example if an attacker knows that your random number generator same or the next index with double probability, he will most likely crack each character with half the tries. So for each character in your password the time is split in half. Again 8 character password becomes 2^8 times easier to break compared to truely random data. This is just an example though. > > Try counting how of often new_index = index and new_index = (index + 1) % > > 62 and new_index = (index + 2) % 62. With your algorithm the last one > > should be significantly less then the first two in large sample. > > I will try that. It looks like a good approach. Ok. I wrote a little library that takes random data and mathematically accurately splits it into wanted data. It is attached to the mail. You only need to specify the random source and the maximum number you wish to see in your set. So with 5 you get everything from 0 to 5 (in total of 6 elements). The library takes care of buffering. And most importantly keeps probabilities equal :) -- -Matti [-- Attachment #2: Makefile --] [-- Type: text/plain, Size: 840 bytes --] VERSION=v0.1 prefix=/usr/local CC=$(CROSS_COMPILE)g++ LD=$(CROSS_COMPILE)ld SYS=posix DEF=-DRNG_VERSION=\"$(VERSION)\" OPT=-O2 XCFLAGS=-fPIC -DPIC -march=nocona #XCFLAGS=-fPIC -DPIC -DDEBUG -march=nocona XLDFLAGS=$(XCFLAGS) -Wl,--as-needed -Wl,-O1 -Wl,-soname=librng.so CPPFLAGS=-Wall -std=gnu++98 $(XCFLAGS) $(INC) $(DEF) $(OPT) LDFLAGS=-Wall -shared $(XLDFLAGS) TESTLDFLAGS=-Wall #TESTLDFLAGS=-Wall -lrng bindir=$(prefix)/bin libdir=$(prefix)/lib BINDIR=$(DESTDIR)$(bindir) LIBDIR=$(DESTDIR)$(libdir) SLIBS=$(LIBS) EXT=$(EXT_$(SYS)) LIBS=librng.so all: $(LIBS) rng install: $(LIBS) -mkdir -p $(BINDIR) $(LIBDIR) cp rng$(EXT) $(BINDIR) clean: rm -f *.o *.so rng$(EXT) rng: rng.o $(CC) $(TESTLDFLAGS) -o $@$(EXT) $@.o librng.o rng.o: rng.cpp librng.so: librng.o $(CC) $(LDFLAGS) -o $@$(EXT) librng.o librng.o: librng.cpp [-- Attachment #3: librng.h --] [-- Type: text/x-chdr, Size: 1344 bytes --] //#define BUFFER_SIZE 4096 //64 bits is 8 bytes: number of uint64_t in buffer //#define NUM_SETS (4096 / 8) //#define NUM_BITS 64 #include <inttypes.h> struct BinaryData { uint64_t data; int8_t bits; }; class BitContainer { public: BitContainer(); ~BitContainer(); bool has(int8_t bits); uint64_t get(int8_t bits); int8_t set(uint64_t data, int8_t bits); void fill(uint64_t *data); static void cpy(struct BinaryData *dest, struct BinaryData *src, int8_t bits); private: void xfer(); static void added(int8_t &stored, int8_t bits); struct BinaryData pri; struct BinaryData sec; }; class Rng { public: Rng(char* device, uint64_t max); ~Rng(); const uint64_t setMax(const uint64_t max); uint64_t getMax(); int setDevice(const char* device); uint64_t getRnd(); static uint64_t getMask(int8_t bits); static int8_t calculateBits(uint64_t level); private: void fillBuffer(); void readBuffer(); void getBits(uint64_t *data, int8_t *avail, uint64_t *out); void saveBits(uint64_t save); void processBits(uint64_t max, uint64_t level, uint64_t data); void error(const char* str); int iRndFD; size_t lCursor; size_t lBuffer; uint64_t* pStart; uint64_t* pNext; uint64_t* pEnd; BitContainer sRnd; uint64_t lMax; uint64_t lOutMask; int8_t cOutBits; }; [-- Attachment #4: librng.cpp --] [-- Type: text/x-c++src, Size: 7171 bytes --] #include <fcntl.h> #include <unistd.h> #include <sys/mman.h> #include "librng.h" #ifdef DEBUG #include <stdio.h> #include <stdlib.h> long* results = 0; long* results2 = 0; unsigned long dMax = 0; int pushed[64]; long readData = 0; long readBuff = 0; long readBits = 0; long validBits = 0; long bitsPushed = 0; long readExtra = 0; int bits = 0; unsigned long totalBits = 0; unsigned long used = 0; unsigned long wasted = 0; unsigned long power(int exp) { unsigned long x = 1; for (int i = 0; i < exp; i++) x *= 2; return x; } void dump_results() { fprintf(stderr, "Rounds for each number:\n"); for (unsigned long i = 0; i < dMax; i++) fprintf(stderr, "%li = %li\t", i, results[i]); fprintf(stderr, "\n"); fprintf(stderr, "Rounds for each initial number:\n"); for (unsigned long i = 0; i < power(bits); i++) fprintf(stderr, "%li = %li\t", i, results2[i]); fprintf(stderr, "\n"); fprintf(stderr, "Rounds for extra bits: total pushed: \t%li\n", bitsPushed); for (int i = 0; i < 64; i++) fprintf(stderr, "%i = %i\t", i, pushed[i]); fprintf(stderr, "\n"); fprintf(stderr, "Device reads = %li\n", readData); fprintf(stderr, "Buffer reads = %li\n", readBuff); fprintf(stderr, "64bit reads = %li\n", readBits); fprintf(stderr, "Valid results = %li\n", validBits); fprintf(stderr, "Extra bit reads = %li\n", readExtra); fprintf(stderr, "Num bits per sample = %i\n", bits); fprintf(stderr, "Pagesize = %li\n", sysconf(_SC_PAGE_SIZE)); fprintf(stderr, "Maximum value = %lu\n", dMax); fprintf(stderr, "Total bits read = %lu\n", totalBits); fprintf(stderr, "Bits output = %li\n", used); fprintf(stderr, "Bits wasted = %li\n", wasted); //fprintf(stderr, " = %li\n"); } #endif Rng::Rng(char* device, uint64_t max) { iRndFD = -1; lCursor = 0; lBuffer = 0; pStart = 0; pNext = 0; pEnd = 0; long lPagesize = sysconf(_SC_PAGE_SIZE); if (lPagesize % sizeof(uint64_t) != 0) error("Pagesize not multiple of uint64_t"); pStart = (uint64_t*)mmap(NULL, lPagesize, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS | MAP_NORESERVE, -1 ,0); if (pStart == MAP_FAILED) error("mmap"); lBuffer = lPagesize; lCursor = 0; pNext = pEnd = pStart; setMax(max); setDevice(device); } Rng::~Rng() { if (iRndFD != -1) close(iRndFD); iRndFD = -1; munmap(pStart, pEnd - pStart); pStart = pEnd = pNext = 0; #ifdef DEBUG dump_results(); if (results) free(results); if (results2) free(results2); #endif } uint64_t Rng::getMax() { return lMax; } const uint64_t Rng::setMax(const uint64_t max) { lMax = max; lOutMask = 0; cOutBits = 0; while (lMax > lOutMask) { lOutMask = (lOutMask << 1) + 1; cOutBits++; } #ifdef DEBUG if (results) free(results); if (results2) free(results2); results = (long *)malloc((lMax + 1) * sizeof(long)); results2 = (long *)malloc((power(cOutBits)) * sizeof(long)); dMax = lMax + 1; bits = cOutBits; #endif return lMax; } int Rng::setDevice(const char* device) { if (iRndFD != -1) close(iRndFD); iRndFD = open(device, O_RDONLY); if (iRndFD == -1) error("Open failed"); return iRndFD; } uint64_t Rng::getRnd() { uint64_t lOut = 0; bool valid = 0; if (iRndFD == -1 || pStart == 0) { error("MMAP or Open failed"); return lOut; } if (lMax == 0) { error("Maximun set to zero"); return lOut; } do { if (sRnd.has(cOutBits)) { lOut = sRnd.get(cOutBits); #ifdef DEBUG wasted += cOutBits; readBits++; #endif if (lOut <= lMax) valid = 1; else saveBits(lOut); } else readBuffer(); } while (!valid); #ifdef DEBUG results[lOut]++; validBits++; used += cOutBits; wasted -= cOutBits; #endif return lOut; } void Rng::saveBits(uint64_t save) { uint64_t level = 1; uint64_t max = lOutMask; max -= lMax; save -= lMax; save --; do { level <<= 1; } while (level <= max); level >>= 1; processBits(max, level, save); } inline void Rng::processBits(uint64_t max, uint64_t level, uint64_t data) { if (data < level) { sRnd.set(data, calculateBits(level)); #ifdef DEBUG pushed[calculateBits(level)]++; bitsPushed++; wasted -= calculateBits(level); #endif } else { max -= level; data -= level; do { level >>= 1; } while (level > max); if (level > 1) processBits(max, level, data); #ifdef DEBUG else { pushed[0]++; } #endif } } void Rng::readBuffer() { if (pNext == pEnd) fillBuffer(); sRnd.fill(pNext++); #ifdef DEBUG readBuff++; #endif } void Rng::fillBuffer() { size_t count = 0; ssize_t bytes = 0; uint8_t* start = (uint8_t *)pStart; if (lCursor == lBuffer) { pNext = pEnd = pStart; lCursor = 0; } do { bytes = read(iRndFD, start + lCursor, lBuffer - lCursor); if (bytes == -1) error("Read interrupted"); else { lCursor += bytes; count += bytes; } } while (count < sizeof(uint64_t) && lBuffer != lCursor); pEnd = pStart + (lCursor / sizeof(uint64_t)); #ifdef DEBUG totalBits += count * 8; readData++; #endif } inline void Rng::error (const char* str) { #ifdef DEBUG fprintf(stderr, "RNG error: %s\n", str); #endif } BitContainer::BitContainer() { pri = (struct BinaryData) {0, 0}; sec = pri; } BitContainer::~BitContainer() { } bool BitContainer::has(int8_t bits) { if (pri.bits >= bits) return true; if (sec.bits) xfer(); if (pri.bits >= bits) return true; return false; } uint64_t Rng::getMask(int8_t bits) { uint64_t mask = 0; mask = ~mask; //64bit shift at once is undefined: mask <<= (bits - 1); mask <<= 1; mask = ~mask; return mask; } int8_t Rng::calculateBits(uint64_t level) { int8_t bits = 0; while (level > 1) { level >>= 1; bits++; } return bits; } uint64_t BitContainer::get(int8_t bits) { uint64_t data = 0; uint64_t mask = Rng::getMask(bits); data = pri.data & mask; pri.data >>= bits; pri.bits -= bits; return data; } void BitContainer::xfer() { int8_t bits = 0; int8_t free = (sizeof(uint64_t) * 8) - pri.bits; bits = free < sec.bits ? free : sec.bits; cpy(&pri, &sec, bits); } void BitContainer::cpy(struct BinaryData *dest, struct BinaryData *src, int8_t bits) { dest->data <<= bits; dest->data |= (src->data & Rng::getMask(bits)); src->data >>= bits; src->bits -= bits; added(dest->bits, bits); } void BitContainer::added(int8_t &stored, int8_t bits) { stored += bits; if (stored > ((int8_t)sizeof(uint64_t) * 8)) stored = sizeof(uint64_t) * 8; } int8_t BitContainer::set(uint64_t data, int8_t bits) { struct BinaryData src = (struct BinaryData) {data, bits}; int8_t free = (sizeof(uint64_t) * 8) - sec.bits; free = free < bits ? free : bits; cpy(&sec, &src, free); return free; } void BitContainer::fill(uint64_t *data) { sec = pri; pri.data = *data; pri.bits = sizeof(uint64_t) * 8; } [-- Attachment #5: rng.cpp --] [-- Type: text/x-c++src, Size: 433 bytes --] #include <stdlib.h> #include <stdio.h> #include "librng.h" int main(const int ARGC, char* const ARGV[], char* const ENV[]) { int rounds = 0; unsigned long max = 0; unsigned long result = 0; rounds = atoi(ARGV[2]); max = strtoul(ARGV[3], NULL, 0); Rng rand(ARGV[1], max); for (int i = 0; i < rounds; i++) { result = rand.getRnd(); (void) result; printf("rnd = %lu\n", result); } return 0; } ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2014-06-30 23:48 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-06-06 2:56 [gentoo-user] OT: Mapping random numbers (PRNG) meino.cramer 2014-06-06 3:58 ` Canek Peláez Valdés 2014-06-06 18:39 ` meino.cramer 2014-06-06 19:37 ` null_ptr 2014-06-06 19:55 ` Canek Peláez Valdés 2014-06-06 21:05 ` Matti Nykyri 2014-06-06 21:03 ` Matti Nykyri 2014-06-07 9:19 ` Matti Nykyri 2014-06-07 12:13 ` Marc Joliet 2014-06-26 21:00 ` [gentoo-user] " Kai Krakow 2014-06-26 22:07 ` Kai Krakow 2014-06-27 8:55 ` thegeezer 2014-06-27 17:49 ` Matti Nykyri 2014-06-27 17:50 ` [gentoo-user] " Kai Krakow 2014-06-27 20:59 ` Neil Bothwick 2014-06-27 21:13 ` [gentoo-user] " Matti Nykyri 2014-06-28 13:08 ` Matti Nykyri 2014-06-28 21:28 ` [gentoo-user] " Kai Krakow 2014-06-28 22:06 ` Canek Peláez Valdés 2014-06-29 0:37 ` gottlieb 2014-06-29 0:53 ` Canek Peláez Valdés 2014-06-29 1:46 ` [gentoo-user] " »Q« 2014-06-29 2:57 ` Canek Peláez Valdés 2014-06-29 6:24 ` [gentoo-user] " Matti Nykyri 2014-06-29 12:38 ` [gentoo-user] " Kai Krakow 2014-06-30 23:47 ` Matti Nykyri
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox