public inbox for gentoo-user@lists.gentoo.org
 help / color / mirror / Atom feed
* [gentoo-user] OT: Mapping random numbers (PRNG)
@ 2014-06-06  2:56 meino.cramer
  2014-06-06  3:58 ` Canek Peláez Valdés
  0 siblings, 1 reply; 26+ messages in thread
From: meino.cramer @ 2014-06-06  2:56 UTC (permalink / raw
  To: Gentoo

Hi,

I am experimenting with the C code of the ISAAC pseudo random number generator
(http://burtleburtle.net/bob/rand/isaacafa.html).

Currently the implementation creates (on my embedded linux) 32 bit
hexadecimal output.

From this I want to create random numbers in the range of [a-Za-z0-9]
*without violating randomness* and (if possible) without throwing 
away bits of the output.

How can I do this mathemtically (in concern of the quality of output) 
correct?

Thank you very much in advance for any help!
Best regards,
mcc








^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] OT: Mapping random numbers (PRNG)
  2014-06-06  2:56 [gentoo-user] OT: Mapping random numbers (PRNG) meino.cramer
@ 2014-06-06  3:58 ` Canek Peláez Valdés
  2014-06-06 18:39   ` meino.cramer
  2014-06-06 21:03   ` Matti Nykyri
  0 siblings, 2 replies; 26+ messages in thread
From: Canek Peláez Valdés @ 2014-06-06  3:58 UTC (permalink / raw
  To: gentoo-user

On Thu, Jun 5, 2014 at 9:56 PM,  <meino.cramer@gmx.de> wrote:
> Hi,
>
> I am experimenting with the C code of the ISAAC pseudo random number generator
> (http://burtleburtle.net/bob/rand/isaacafa.html).
>
> Currently the implementation creates (on my embedded linux) 32 bit
> hexadecimal output.

So it's a 32 bit integer.

> From this I want to create random numbers in the range of [a-Za-z0-9]
> *without violating randomness* and (if possible) without throwing
> away bits of the output.

You mean *characters* int the range [A-Za-z0-9]?

> How can I do this mathemtically (in concern of the quality of output)
> correct?

The easiest thing to do would be:

-------------------------------------------------------------------------------
#include <time.h>
#include <stdio.h>
#include <stdlib.h>

#define N (26+26+10)

static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M',
                    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
'X', 'Y', 'Z',
                    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm',
                    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
'x', 'y', 'z',
                    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };

int
next_character()
{
        // Use the correct call for ISAAC instead of rand()
        unsigned int idx = rand() % N;
        return S[idx];
}

int
main(int argc, char* argv[])
{
        // Use the correct call for initializing the ISAAC seed
        srand((unsigned int)time(NULL));
        for (int i = 0; i < 20; i++) // --std=c99
                printf("%c\n", next_character());
        return 0;
}
-------------------------------------------------------------------------------

If the ISAAC RNG has a good distribution, then the next_character()
function will give a good distribution among the set [A-Za-z0-9].

Unless I missunderstood what you meant with "create random numbers in
the range of [a-Za-z0-9]".

Regards.
-- 
Canek Peláez Valdés
Profesor de asignatura, Facultad de Ciencias
Universidad Nacional Autónoma de México


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] OT: Mapping random numbers (PRNG)
  2014-06-06  3:58 ` Canek Peláez Valdés
@ 2014-06-06 18:39   ` meino.cramer
  2014-06-06 19:37     ` null_ptr
                       ` (2 more replies)
  2014-06-06 21:03   ` Matti Nykyri
  1 sibling, 3 replies; 26+ messages in thread
From: meino.cramer @ 2014-06-06 18:39 UTC (permalink / raw
  To: gentoo-user

Canek Peláez Valdés <caneko@gmail.com> [14-06-06 17:36]:
> On Thu, Jun 5, 2014 at 9:56 PM,  <meino.cramer@gmx.de> wrote:
> > Hi,
> >
> > I am experimenting with the C code of the ISAAC pseudo random number generator
> > (http://burtleburtle.net/bob/rand/isaacafa.html).
> >
> > Currently the implementation creates (on my embedded linux) 32 bit
> > hexadecimal output.
> 
> So it's a 32 bit integer.
> 
> > From this I want to create random numbers in the range of [a-Za-z0-9]
> > *without violating randomness* and (if possible) without throwing
> > away bits of the output.
> 
> You mean *characters* int the range [A-Za-z0-9]?
> 
> > How can I do this mathemtically (in concern of the quality of output)
> > correct?
> 
> The easiest thing to do would be:
> 
> -------------------------------------------------------------------------------
> #include <time.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> #define N (26+26+10)
> 
> static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
> 'K', 'L', 'M',
>                     'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
> 'X', 'Y', 'Z',
>                     'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
> 'k', 'l', 'm',
>                     'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
> 'x', 'y', 'z',
>                     '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
> 
> int
> next_character()
> {
>         // Use the correct call for ISAAC instead of rand()
>         unsigned int idx = rand() % N;
>         return S[idx];
> }
> 
> int
> main(int argc, char* argv[])
> {
>         // Use the correct call for initializing the ISAAC seed
>         srand((unsigned int)time(NULL));
>         for (int i = 0; i < 20; i++) // --std=c99
>                 printf("%c\n", next_character());
>         return 0;
> }
> -------------------------------------------------------------------------------
> 
> If the ISAAC RNG has a good distribution, then the next_character()
> function will give a good distribution among the set [A-Za-z0-9].
> 
> Unless I missunderstood what you meant with "create random numbers in
> the range of [a-Za-z0-9]".
> 
> Regards.
> -- 
> Canek Peláez Valdés
> Profesor de asignatura, Facultad de Ciencias
> Universidad Nacional Autónoma de México
> 

Hi,

Thank you very much for the input! :)

I have a question about the algorithm:
Suppose rand() has an equal distribution of numbers and furthermore
one has a count of 2^32 random numbers listed in numerical sort
order.
In this list each number would appear (nearly) with the same count: 1

To get an better imagination of that...suppose the rand() would only 
return numbers in the range of 1...12 and the alphabet has only 8
characters (as 2^32 is not devideable by 62)

rand():
1 2 3 4 5 6 7 8 9 10 11 12

rand()%N : rand()%7
1 2 3 4 5 6 7 0 1  2  3  4  

or in other words: An even distribution of numbers of rand() 
would result in a unevenly distributed sequence of characters...or?
This would break the quality of ISAACs output.

I am sure I did something wrong here...but where is the logic trap?


Thank you very much for any help in advance!
Best regards,
mcc









^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] OT: Mapping random numbers (PRNG)
  2014-06-06 18:39   ` meino.cramer
@ 2014-06-06 19:37     ` null_ptr
  2014-06-06 19:55     ` Canek Peláez Valdés
  2014-06-06 21:05     ` Matti Nykyri
  2 siblings, 0 replies; 26+ messages in thread
From: null_ptr @ 2014-06-06 19:37 UTC (permalink / raw
  To: gentoo-user

On 06/06/14 20:39, meino.cramer@gmx.de wrote:
>To get an better imagination of that...suppose the rand() would only
>return numbers in the range of 1...12 and the alphabet has only 8
>characters (as 2^32 is not devideable by 62)
>
>rand():
>1 2 3 4 5 6 7 8 9 10 11 12
>
>rand()%N : rand()%7
>1 2 3 4 5 6 7 0 1  2  3  4
>
>or in other words: An even distribution of numbers of rand()
>would result in a unevenly distributed sequence of characters...or?
>This would break the quality of ISAACs output.
>
>I am sure I did something wrong here...but where is the logic trap?

You thought on that is totally right.
Let's say random numbers have the rande [0..RAND_MAX]
Blindly calculating modulus
creates a bias towards lower numbers if RAND_MAX+1 is not divisible
by N. For most applications this bias is negligible. If you really
want the same distribution you have to discard numbers greater than
RAND_MAX - ((RAND_MAX+1)%N)
(Beware the undefined behaviour in the above line if you try C.)
Basically you want one less than the largest by N dividable number
in your range.

With numbers:
rand():
0 1 2 3 4 5 6 7 8 9 10 11 12
rand()%5 with discarding >12-((12+1)%5)=9:
0 1 2 3 4 0 1 2 3 4 discarded

---
null_ptr
Please don't follow me.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] OT: Mapping random numbers (PRNG)
  2014-06-06 18:39   ` meino.cramer
  2014-06-06 19:37     ` null_ptr
@ 2014-06-06 19:55     ` Canek Peláez Valdés
  2014-06-06 21:05     ` Matti Nykyri
  2 siblings, 0 replies; 26+ messages in thread
From: Canek Peláez Valdés @ 2014-06-06 19:55 UTC (permalink / raw
  To: gentoo-user

On Fri, Jun 6, 2014 at 1:39 PM,  <meino.cramer@gmx.de> wrote:
> Canek Peláez Valdés <caneko@gmail.com> [14-06-06 17:36]:
>> On Thu, Jun 5, 2014 at 9:56 PM,  <meino.cramer@gmx.de> wrote:
>> > Hi,
>> >
>> > I am experimenting with the C code of the ISAAC pseudo random number generator
>> > (http://burtleburtle.net/bob/rand/isaacafa.html).
>> >
>> > Currently the implementation creates (on my embedded linux) 32 bit
>> > hexadecimal output.
>>
>> So it's a 32 bit integer.
>>
>> > From this I want to create random numbers in the range of [a-Za-z0-9]
>> > *without violating randomness* and (if possible) without throwing
>> > away bits of the output.
>>
>> You mean *characters* int the range [A-Za-z0-9]?
>>
>> > How can I do this mathemtically (in concern of the quality of output)
>> > correct?
>>
>> The easiest thing to do would be:
>>
>> -------------------------------------------------------------------------------
>> #include <time.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> #define N (26+26+10)
>>
>> static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
>> 'K', 'L', 'M',
>>                     'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
>> 'X', 'Y', 'Z',
>>                     'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
>> 'k', 'l', 'm',
>>                     'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
>> 'x', 'y', 'z',
>>                     '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
>>
>> int
>> next_character()
>> {
>>         // Use the correct call for ISAAC instead of rand()
>>         unsigned int idx = rand() % N;
>>         return S[idx];
>> }
>>
>> int
>> main(int argc, char* argv[])
>> {
>>         // Use the correct call for initializing the ISAAC seed
>>         srand((unsigned int)time(NULL));
>>         for (int i = 0; i < 20; i++) // --std=c99
>>                 printf("%c\n", next_character());
>>         return 0;
>> }
>> -------------------------------------------------------------------------------
>>
>> If the ISAAC RNG has a good distribution, then the next_character()
>> function will give a good distribution among the set [A-Za-z0-9].
>>
>> Unless I missunderstood what you meant with "create random numbers in
>> the range of [a-Za-z0-9]".
>>
>> Regards.
>> --
>> Canek Peláez Valdés
>> Profesor de asignatura, Facultad de Ciencias
>> Universidad Nacional Autónoma de México
>>
>
> Hi,
>
> Thank you very much for the input! :)
>
> I have a question about the algorithm:
> Suppose rand() has an equal distribution of numbers and furthermore
> one has a count of 2^32 random numbers listed in numerical sort
> order.
> In this list each number would appear (nearly) with the same count: 1
>
> To get an better imagination of that...suppose the rand() would only
> return numbers in the range of 1...12 and the alphabet has only 8
> characters (as 2^32 is not devideable by 62)
>
> rand():
> 1 2 3 4 5 6 7 8 9 10 11 12
>
> rand()%N : rand()%7
> 1 2 3 4 5 6 7 0 1  2  3  4
>
> or in other words: An even distribution of numbers of rand()
> would result in a unevenly distributed sequence of characters...or?
> This would break the quality of ISAACs output.
>
> I am sure I did something wrong here...but where is the logic trap?

In theory, it doesn't matter if the number of random characters you
want is not multiple of the size of the set of characters. In
practice, it doesn't matter if the number of random characters you
want is big enough.

Consider the following modification to my program (I changed the order
of the array S so it's sorted and I can do binary search):

----------------------------------------------------------------------
#include <time.h>
#include <stdio.h>
#include <stdlib.h>

#define N (10+26+26)

static char S[] = {
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
};

static int
index_of_aux(char c, int a, int b)
{
        if (a > b)
                return -1;
        int m = (a + b) / 2;
        if (S[m] == c)
                return m;
        if (S[m] > c)
                return index_of_aux(c, a, m-1);
        return index_of_aux(c, m+1, b);
}

static int
index_of(char c)
{
        return index_of_aux(c, 0, N-1);
}

int
next_character()
{
        // Use the correct call for isaac instead of rand()
        unsigned int idx = rand() % N;
        return S[idx];
}

int
main(int argc, char* argv[])
{
        srand((unsigned int)time(NULL));
        int total = 1000000;

        // Size = 62
        int count[] = {
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
        };

        for (int i = 0; i < total; i++) {
                char c = next_character();
                count[index_of(c)]++;
        }

        int min = total, max = 0;
        for (int i = 0; i < N; i++) {
                printf("Count %d: %d\n", i, count[i]);
                min = (count[i] < min) ? count[i] : min;
                max = (count[i] > max) ? count[i] : max;
        }
        printf("Min: %d\n", min);
        printf("Max: %d\n", max);
        printf("Diff: %d\n", (max-min));
        return 0;
}
----------------------------------------------------------------------

The output is the following:

Count 0: 16060
Count 1: 16102
Count 2: 15968
Count 3: 16075
...
Count 59: 16220
Count 60: 16143
Count 61: 16177
Min: 15819
Max: 16366
Diff: 547

As you can see, all the characters appear  around 16,000 times, so
it's pretty well distributed. If the RNG has a good distribution, it
will *always* look something like this. The difference will *never* be
0, since no RNG is perfectly distributed.

So, TL;DR: it doesn't matter if total is not multiple of N.

Regards.
-- 
Canek Peláez Valdés
Profesor de asignatura, Facultad de Ciencias
Universidad Nacional Autónoma de México


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] OT: Mapping random numbers (PRNG)
  2014-06-06  3:58 ` Canek Peláez Valdés
  2014-06-06 18:39   ` meino.cramer
@ 2014-06-06 21:03   ` Matti Nykyri
  2014-06-07  9:19     ` Matti Nykyri
  2014-06-26 21:00     ` [gentoo-user] " Kai Krakow
  1 sibling, 2 replies; 26+ messages in thread
From: Matti Nykyri @ 2014-06-06 21:03 UTC (permalink / raw
  To: gentoo-user

On Thu, Jun 05, 2014 at 10:58:51PM -0500, Canek Peláez Valdés wrote:
> On Thu, Jun 5, 2014 at 9:56 PM,  <meino.cramer@gmx.de> wrote:
> > Hi,
> >
> > I am experimenting with the C code of the ISAAC pseudo random number generator
> > (http://burtleburtle.net/bob/rand/isaacafa.html).
> >
> > Currently the implementation creates (on my embedded linux) 32 bit
> > hexadecimal output.
> 
> So it's a 32 bit integer.
> 
> > From this I want to create random numbers in the range of [a-Za-z0-9]
> > *without violating randomness* and (if possible) without throwing
> > away bits of the output.
> 
> You mean *characters* int the range [A-Za-z0-9]?

Well this isn't as simple problem as it sounds. A random 32 bit integer 
has 32 bits of randomness. If you take a divison reminder of 62 from this 
integer you will get only 5,95419631039 bits of randomness 
(log(62)/log(2)). So you are wasting 81,4% of your random data. Which is 
quite much and usually random data is quite expensive. You can save your 
precious random data by taking only 6 bit from your 32 bit integer and 
dividing it by 62. Then you will be wasting only 0,8% of random data. 
Another problem is alignment, but that is about mathematical correctness.

> > How can I do this mathemtically (in concern of the quality of output)
> > correct?
> 
> The easiest thing to do would be:

The easiest is not mathematically correct though. Random data will stay 
random only if you select and modify it so that randomness is preserved. 
If you take devison reminder of 62 from 32 bit integer there are 69 273 
667 possibilities of the reminder to be 3 or less. For the reminder to 4 
or more the number of possibilities is 69 273 666. In mathematically 
ideal case the probability for every index of the list should be same: 
1/62 = 1,61290322581%. But the modulo 62 modifies this probability: for 
index 0-3 the probability is 69 273 667/2^32 = 1,61290324759%. And for 
indexes 4-61 the probability will be 69 273 666/2^32 = 1,6129032243%.

If you wish not to waste those random bits the probabilities will get 
worse. With 6 bits of random the probability for index 0-1 will be 2/64 
and for 2-63 it will be 1/64. This is a very significant change because 
first and second index will appear twice as much as the rest. If you add 
2 characters to your list you will perfect alignment and you can take 6 
bits of data without it modifying probabilities.

If you are looking a mathematically perfect solution there is a simple 
one even if your list is not in the power of 2! Take 6 bits at a time of 
the random data. If the result is 62 or 63 you will discard the data and 
get the next 6 bits. This selectively modifies the random data but keeps 
the probabilities in correct balance. Now the probability for index of 
0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.

> -------------------------------------------------------------------------------
> #include <time.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> #define N (26+26+10)
> 
> static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
> 'K', 'L', 'M',
>                     'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
> 'X', 'Y', 'Z',
>                     'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
> 'k', 'l', 'm',
>                     'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
> 'x', 'y', 'z',
>                     '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
> 
> int
> next_character()
> {
>         // Use the correct call for ISAAC instead of rand()
>         unsigned int idx = rand() % N;
>         return S[idx];
> }

so modify the next_char function:

char next_character()
{
	static unsigned int rand = 0; //(sizeof(int) = 32)
	static char bit_avail = 0;
	char result = 0;
	char move_bits = 0;
	char bits_moved = 0;

	do {
		if (!bits_avail) {
        		// Use the correct call for ISAAC instead of rand()
			rand = rand();
			
			bit_avail = 32;
		}

		move_bits = bits_avail >= 6 ? 6 : bits_avail;
		result <<= move_bits;
		result = (result | rand & (0xFF >> (8 - move_bits))) & 0x3F;
		bits_avail -= move_bits;
		bits_moved += move_bits;
		rand >>= move_bits;

	} while (bits_moved != 6 && result > 61);

	return result;
}

This function will give perfect distribution of 1/62 probability for 
every index. It will waste 6 bits with the probability of 1/32 (2/64).

> int
> main(int argc, char* argv[])
> {
>         // Use the correct call for initializing the ISAAC seed
>         srand((unsigned int)time(NULL));
>         for (int i = 0; i < 20; i++) // --std=c99
>                 printf("%c\n", next_character());
>         return 0;
> }
> -------------------------------------------------------------------------------
> 
> If the ISAAC RNG has a good distribution, then the next_character()
> function will give a good distribution among the set [A-Za-z0-9].
> 
> Unless I missunderstood what you meant with "create random numbers in
> the range of [a-Za-z0-9]".

-- 
-Matti


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] OT: Mapping random numbers (PRNG)
  2014-06-06 18:39   ` meino.cramer
  2014-06-06 19:37     ` null_ptr
  2014-06-06 19:55     ` Canek Peláez Valdés
@ 2014-06-06 21:05     ` Matti Nykyri
  2 siblings, 0 replies; 26+ messages in thread
From: Matti Nykyri @ 2014-06-06 21:05 UTC (permalink / raw
  To: gentoo-user

On Fri, Jun 06, 2014 at 08:39:28PM +0200, meino.cramer@gmx.de wrote:
> Canek Peláez Valdés <caneko@gmail.com> [14-06-06 17:36]:
> > On Thu, Jun 5, 2014 at 9:56 PM,  <meino.cramer@gmx.de> wrote:
> > > Hi,
> > >
> > > I am experimenting with the C code of the ISAAC pseudo random number generator
> > > (http://burtleburtle.net/bob/rand/isaacafa.html).
> > >
> > > Currently the implementation creates (on my embedded linux) 32 bit
> > > hexadecimal output.
> > 
> > So it's a 32 bit integer.
> > 
> > > From this I want to create random numbers in the range of [a-Za-z0-9]
> > > *without violating randomness* and (if possible) without throwing
> > > away bits of the output.
> > 
> > You mean *characters* int the range [A-Za-z0-9]?
> > 
> > > How can I do this mathemtically (in concern of the quality of output)
> > > correct?
> > 
> > The easiest thing to do would be:
> > 
> > -------------------------------------------------------------------------------
> > #include <time.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > 
> > #define N (26+26+10)
> > 
> > static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
> > 'K', 'L', 'M',
> >                     'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
> > 'X', 'Y', 'Z',
> >                     'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
> > 'k', 'l', 'm',
> >                     'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
> > 'x', 'y', 'z',
> >                     '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
> > 
> > int
> > next_character()
> > {
> >         // Use the correct call for ISAAC instead of rand()
> >         unsigned int idx = rand() % N;
> >         return S[idx];
> > }
> > 
> > int
> > main(int argc, char* argv[])
> > {
> >         // Use the correct call for initializing the ISAAC seed
> >         srand((unsigned int)time(NULL));
> >         for (int i = 0; i < 20; i++) // --std=c99
> >                 printf("%c\n", next_character());
> >         return 0;
> > }
> > -------------------------------------------------------------------------------
> > 
> > If the ISAAC RNG has a good distribution, then the next_character()
> > function will give a good distribution among the set [A-Za-z0-9].
> > 
> > Unless I missunderstood what you meant with "create random numbers in
> > the range of [a-Za-z0-9]".
> > 
> > Regards.
> > -- 
> > Canek Peláez Valdés
> > Profesor de asignatura, Facultad de Ciencias
> > Universidad Nacional Autónoma de México
> > 
> 
> Hi,
> 
> Thank you very much for the input! :)
> 
> I have a question about the algorithm:
> Suppose rand() has an equal distribution of numbers and furthermore
> one has a count of 2^32 random numbers listed in numerical sort
> order.
> In this list each number would appear (nearly) with the same count: 1
> 
> To get an better imagination of that...suppose the rand() would only 
> return numbers in the range of 1...12 and the alphabet has only 8
> characters (as 2^32 is not devideable by 62)
> 
> rand():
> 1 2 3 4 5 6 7 8 9 10 11 12
> 
> rand()%N : rand()%7
> 1 2 3 4 5 6 7 0 1  2  3  4  
> 
> or in other words: An even distribution of numbers of rand() 
> would result in a unevenly distributed sequence of characters...or?
> This would break the quality of ISAACs output.
> 
> I am sure I did something wrong here...but where is the logic trap?
> 

This is the thing I explained in my message.

-- 
-Matti


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] OT: Mapping random numbers (PRNG)
  2014-06-06 21:03   ` Matti Nykyri
@ 2014-06-07  9:19     ` Matti Nykyri
  2014-06-07 12:13       ` Marc Joliet
  2014-06-26 21:00     ` [gentoo-user] " Kai Krakow
  1 sibling, 1 reply; 26+ messages in thread
From: Matti Nykyri @ 2014-06-07  9:19 UTC (permalink / raw
  To: gentoo-user

On Sat, Jun 07, 2014 at 12:03:29AM +0300, Matti Nykyri wrote:
> On Thu, Jun 05, 2014 at 10:58:51PM -0500, Canek Peláez Valdés wrote:
> > On Thu, Jun 5, 2014 at 9:56 PM,  <meino.cramer@gmx.de> wrote:
> > > Hi,
> > >
> > > I am experimenting with the C code of the ISAAC pseudo random number generator
> > > (http://burtleburtle.net/bob/rand/isaacafa.html).
> > >
> > > Currently the implementation creates (on my embedded linux) 32 bit
> > > hexadecimal output.
> > 
> > So it's a 32 bit integer.
> > 
> > > From this I want to create random numbers in the range of [a-Za-z0-9]
> > > *without violating randomness* and (if possible) without throwing
> > > away bits of the output.
> > 
> > You mean *characters* int the range [A-Za-z0-9]?
> 
> Well this isn't as simple problem as it sounds. A random 32 bit integer 
> has 32 bits of randomness. If you take a divison reminder of 62 from this 
> integer you will get only 5,95419631039 bits of randomness 
> (log(62)/log(2)). So you are wasting 81,4% of your random data. Which is 
> quite much and usually random data is quite expensive. You can save your 
> precious random data by taking only 6 bit from your 32 bit integer and 
> dividing it by 62. Then you will be wasting only 0,8% of random data. 
> Another problem is alignment, but that is about mathematical correctness.
> 
> > > How can I do this mathemtically (in concern of the quality of output)
> > > correct?
> > 
> > The easiest thing to do would be:
> 
> The easiest is not mathematically correct though. Random data will stay 
> random only if you select and modify it so that randomness is preserved. 
> If you take devison reminder of 62 from 32 bit integer there are 69 273 
> 667 possibilities of the reminder to be 3 or less. For the reminder to 4 
> or more the number of possibilities is 69 273 666. In mathematically 
> ideal case the probability for every index of the list should be same: 
> 1/62 = 1,61290322581%. But the modulo 62 modifies this probability: for 
> index 0-3 the probability is 69 273 667/2^32 = 1,61290324759%. And for 
> indexes 4-61 the probability will be 69 273 666/2^32 = 1,6129032243%.
> 
> If you wish not to waste those random bits the probabilities will get 
> worse. With 6 bits of random the probability for index 0-1 will be 2/64 
> and for 2-63 it will be 1/64. This is a very significant change because 
> first and second index will appear twice as much as the rest. If you add 
> 2 characters to your list you will perfect alignment and you can take 6 
> bits of data without it modifying probabilities.
> 
> If you are looking a mathematically perfect solution there is a simple 
> one even if your list is not in the power of 2! Take 6 bits at a time of 
> the random data. If the result is 62 or 63 you will discard the data and 
> get the next 6 bits. This selectively modifies the random data but keeps 
> the probabilities in correct balance. Now the probability for index of 
> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
> 
> > -------------------------------------------------------------------------------
> > #include <time.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > 
> > #define N (26+26+10)
> > 
> > static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
> > 'K', 'L', 'M',
> >                     'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
> > 'X', 'Y', 'Z',
> >                     'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
> > 'k', 'l', 'm',
> >                     'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
> > 'x', 'y', 'z',
> >                     '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
> > 
> > int
> > next_character()
> > {
> >         // Use the correct call for ISAAC instead of rand()
> >         unsigned int idx = rand() % N;
> >         return S[idx];
> > }
> 
> so modify the next_char function:
> 
> char next_character()
> {
> 	static unsigned int rand = 0; //(sizeof(int) = 32)
> 	static char bit_avail = 0;
> 	char result = 0;
> 	char move_bits = 0;
> 	char bits_moved = 0;
> 
> 	do {
> 		if (!bits_avail) {
>         		// Use the correct call for ISAAC instead of rand()
> 			rand = rand();
> 			
> 			bit_avail = 32;
> 		}
> 
> 		move_bits = bits_avail >= 6 ? 6 : bits_avail;
> 		result <<= move_bits;
> 		result = (result | rand & (0xFF >> (8 - move_bits))) & 0x3F;
> 		bits_avail -= move_bits;
> 		bits_moved += move_bits;
> 		rand >>= move_bits;
> 
> 	} while (bits_moved != 6 && result > 61);
> 
> 	return result;
> }

Well actually it looks simpler if you break this like this:

unsigned char get_6bits () 
{
	static unsigned int rand = 0; //(sizeof(int) = 32)
	static char bits_avail = 0;
	unsigned char result = 0;

	//get 2 bits 3 times: 32 is devidable by 2
	for (int i = 0; i < 3; i++) { // --std=c99
		//Fill buffer if it is empty!
		if (!bits_avail || bits_avail < 0 ) { //if bits_avail < 0 it is an error!
        		// Use the correct call for ISAAC instead of rand()
			rand = rand();
			
			bits_avail = 32;
		}

		result <<= 2; //move two bits to left.
		result = result | (rand & 0x3); //add two least signifigant bits to the result.
		rand >>= 2; //move two bits to right.
		bits_avail -= 2;
	}

	return result; //result has 6 bits of random data...
}

char next_character()
{
	unsigned char idx = 0;
	do {
		idx = get_6bits();
	} while (idx > 61);

	return S[idx];
}

Very simple :)

> This function will give perfect distribution of 1/62 probability for 
> every index. It will waste 6 bits with the probability of 1/32 (2/64).
> 
> > int
> > main(int argc, char* argv[])
> > {
> >         // Use the correct call for initializing the ISAAC seed
> >         srand((unsigned int)time(NULL));
> >         for (int i = 0; i < 20; i++) // --std=c99
> >                 printf("%c\n", next_character());
> >         return 0;
> > }
> > -------------------------------------------------------------------------------
> > 
> > If the ISAAC RNG has a good distribution, then the next_character()
> > function will give a good distribution among the set [A-Za-z0-9].
> > 
> > Unless I missunderstood what you meant with "create random numbers in
> > the range of [a-Za-z0-9]".

-- 
-Matti


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] OT: Mapping random numbers (PRNG)
  2014-06-07  9:19     ` Matti Nykyri
@ 2014-06-07 12:13       ` Marc Joliet
  0 siblings, 0 replies; 26+ messages in thread
From: Marc Joliet @ 2014-06-07 12:13 UTC (permalink / raw
  To: gentoo-user

[-- Attachment #1: Type: text/plain, Size: 1404 bytes --]

Am Sat, 7 Jun 2014 12:19:11 +0300
schrieb Matti Nykyri <Matti.Nykyri@iki.fi>:

> On Sat, Jun 07, 2014 at 12:03:29AM +0300, Matti Nykyri wrote:
[...]
> unsigned char get_6bits () 
> {
> 	static unsigned int rand = 0; //(sizeof(int) = 32)

Just an itty bitty nitpick: since you already require C99 as per the for loop
below, you might as well use uint32_t (from stdint.h) instead of assuming that
sizeof(int) == 32 :) .

> 	static char bits_avail = 0;
> 	unsigned char result = 0;
> 
> 	//get 2 bits 3 times: 32 is devidable by 2
> 	for (int i = 0; i < 3; i++) { // --std=c99
> 		//Fill buffer if it is empty!
> 		if (!bits_avail || bits_avail < 0 ) { //if bits_avail < 0 it is an error!
>         		// Use the correct call for ISAAC instead of rand()
> 			rand = rand();
> 			
> 			bits_avail = 32;
> 		}
> 
> 		result <<= 2; //move two bits to left.
> 		result = result | (rand & 0x3); //add two least signifigant bits to the result.
> 		rand >>= 2; //move two bits to right.
> 		bits_avail -= 2;
> 	}
> 
> 	return result; //result has 6 bits of random data...
> }
> 
> char next_character()
> {
> 	unsigned char idx = 0;
> 	do {
> 		idx = get_6bits();
> 	} while (idx > 61);
> 
> 	return S[idx];
> }
> 
> Very simple :)
[...]

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [gentoo-user] Re: OT: Mapping random numbers (PRNG)
  2014-06-06 21:03   ` Matti Nykyri
  2014-06-07  9:19     ` Matti Nykyri
@ 2014-06-26 21:00     ` Kai Krakow
  2014-06-26 22:07       ` Kai Krakow
  2014-06-27 21:13       ` [gentoo-user] " Matti Nykyri
  1 sibling, 2 replies; 26+ messages in thread
From: Kai Krakow @ 2014-06-26 21:00 UTC (permalink / raw
  To: gentoo-user

Matti Nykyri <Matti.Nykyri@iki.fi> schrieb:

> If you are looking a mathematically perfect solution there is a simple
> one even if your list is not in the power of 2! Take 6 bits at a time of
> the random data. If the result is 62 or 63 you will discard the data and
> get the next 6 bits. This selectively modifies the random data but keeps
> the probabilities in correct balance. Now the probability for index of
> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.

Why not do just something like this?

index = 0;
while (true) {
  index = (index + get_6bit_random()) % 62;
  output << char_array[index];
}

Done, no bits wasted. Should have perfect distribution also. We also don't 
have to throw away random data just to stay within unaligned boundaries. The 
unalignment is being taken over into the next loop so the "error" corrects 
itself over time (it becomes distributed over the whole set).

-- 
Replies to list only preferred.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [gentoo-user] Re: OT: Mapping random numbers (PRNG)
  2014-06-26 21:00     ` [gentoo-user] " Kai Krakow
@ 2014-06-26 22:07       ` Kai Krakow
  2014-06-27  8:55         ` thegeezer
  2014-06-27 21:13       ` [gentoo-user] " Matti Nykyri
  1 sibling, 1 reply; 26+ messages in thread
From: Kai Krakow @ 2014-06-26 22:07 UTC (permalink / raw
  To: gentoo-user

Kai Krakow <hurikhan77@gmail.com> schrieb:

> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb:
> 
>> If you are looking a mathematically perfect solution there is a simple
>> one even if your list is not in the power of 2! Take 6 bits at a time of
>> the random data. If the result is 62 or 63 you will discard the data and
>> get the next 6 bits. This selectively modifies the random data but keeps
>> the probabilities in correct balance. Now the probability for index of
>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
> 
> Why not do just something like this?
> 
> index = 0;
> while (true) {
>   index = (index + get_6bit_random()) % 62;
>   output << char_array[index];
> }
> 
> Done, no bits wasted. Should have perfect distribution also. We also don't
> have to throw away random data just to stay within unaligned boundaries.
> The unalignment is being taken over into the next loop so the "error"
> corrects itself over time (it becomes distributed over the whole set).

It is worth noting that my approach has the tendency of generating random 
characters in sequence. I think there are several possibilities to fix this, 
one being swapping two elements in the char array on each access. That fixed 
this particular problem in my tests. Here's are quick ruby snippet I played 
with:

$ cat random.rb
# danger: ugly code ahead 
dist = [0] * 62
seq = [0, 0]
chars = [*('A'..'Z'), *('a'..'z'), *('0'..'9')]
orig_chars = chars.dup

index = 0
old = nil
(62 * 10).times do
  # generate random wrapped index
  index = (index + rand(64)) % 62

  # swap two indexed positions and output the char
  chars[0], chars[index] = chars[index], chars[0]
  $stdout.write chars[0]

  # count the generated indexes
  dist[index] = dist[index] + 1

  # count if indexes were generated in sequence by looking
  # them up in the original sorted chars array
  unless old.nil?
    seq[0] = seq[0] + 1 if old < orig_chars.index(chars[0])
    seq[1] = seq[1] + 1 if old > orig_chars.index(chars[0])
  end
  old = orig_chars.index(chars[0])
end
puts

# check the average, it should always equal the loop count
# multiplicator
avg = dist.inject(:+) / 62.0

# calculate if the algorithm preferred sequences
seq = [*seq, seq[0] - seq[1]]

# output some stats
puts avg
puts dist.inspect
puts dist.map {|v| avg - v }.inspect
puts seq.inspect


-- 
Replies to list only preferred.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)
  2014-06-26 22:07       ` Kai Krakow
@ 2014-06-27  8:55         ` thegeezer
  2014-06-27 17:49           ` Matti Nykyri
  2014-06-27 17:50           ` [gentoo-user] " Kai Krakow
  0 siblings, 2 replies; 26+ messages in thread
From: thegeezer @ 2014-06-27  8:55 UTC (permalink / raw
  To: gentoo-user

On 06/26/2014 11:07 PM, Kai Krakow wrote:
>
> It is worth noting that my approach has the tendency of generating random 
> characters in sequence.

sorry but had to share this http://dilbert.com/strips/comic/2001-10-25/


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)
  2014-06-27  8:55         ` thegeezer
@ 2014-06-27 17:49           ` Matti Nykyri
  2014-06-27 17:50           ` [gentoo-user] " Kai Krakow
  1 sibling, 0 replies; 26+ messages in thread
From: Matti Nykyri @ 2014-06-27 17:49 UTC (permalink / raw
  To: gentoo-user@lists.gentoo.org

> On Jun 27, 2014, at 11:55, thegeezer <thegeezer@thegeezer.net> wrote:
> 
>> On 06/26/2014 11:07 PM, Kai Krakow wrote:
>> 
>> It is worth noting that my approach has the tendency of generating random 
>> characters in sequence.
> 
> sorry but had to share this http://dilbert.com/strips/comic/2001-10-25/
> 

This is a good one :) have really been thinking this same comic previosly when writing to this thread...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG)
  2014-06-27  8:55         ` thegeezer
  2014-06-27 17:49           ` Matti Nykyri
@ 2014-06-27 17:50           ` Kai Krakow
  2014-06-27 20:59             ` Neil Bothwick
  1 sibling, 1 reply; 26+ messages in thread
From: Kai Krakow @ 2014-06-27 17:50 UTC (permalink / raw
  To: gentoo-user

thegeezer <thegeezer@thegeezer.net> schrieb:

> On 06/26/2014 11:07 PM, Kai Krakow wrote:
>>
>> It is worth noting that my approach has the tendency of generating random
>> characters in sequence.
> 
> sorry but had to share this http://dilbert.com/strips/comic/2001-10-25/

:-)

I'm no mathematician, but well, I think the swapping approach fixes it. What 
this makes clear, however, is that randomness on its own does not completely 
ensure unpredictable items if combined with other functions. One has to 
carefully think about it. I, for myself, would always stay away from using 
modulo to clip the random numbers. It will always create bias. My first idea 
introduced predictable followers (you always knew that the next char had a 
specific probability related to the tail length of the list).

You can actually learn from Dilbert comics. ;-)

-- 
Replies to list only preferred.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG)
  2014-06-27 17:50           ` [gentoo-user] " Kai Krakow
@ 2014-06-27 20:59             ` Neil Bothwick
  0 siblings, 0 replies; 26+ messages in thread
From: Neil Bothwick @ 2014-06-27 20:59 UTC (permalink / raw
  To: gentoo-user

[-- Attachment #1: Type: text/plain, Size: 296 bytes --]

On Fri, 27 Jun 2014 19:50:15 +0200, Kai Krakow wrote:

> You can actually learn from Dilbert comics. ;-)

Unless you're a PHB, they never learn.


-- 
Neil Bothwick

"You know how dumb the average person is? Well, statistically, half of
them are even dumber than that" - Lewton, P.I.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)
  2014-06-26 21:00     ` [gentoo-user] " Kai Krakow
  2014-06-26 22:07       ` Kai Krakow
@ 2014-06-27 21:13       ` Matti Nykyri
  2014-06-28 13:08         ` Matti Nykyri
  2014-06-28 21:28         ` [gentoo-user] " Kai Krakow
  1 sibling, 2 replies; 26+ messages in thread
From: Matti Nykyri @ 2014-06-27 21:13 UTC (permalink / raw
  To: gentoo-user@lists.gentoo.org

> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote:
> 
> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb:
> 
>> If you are looking a mathematically perfect solution there is a simple
>> one even if your list is not in the power of 2! Take 6 bits at a time of
>> the random data. If the result is 62 or 63 you will discard the data and
>> get the next 6 bits. This selectively modifies the random data but keeps
>> the probabilities in correct balance. Now the probability for index of
>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
> 
> Why not do just something like this?
> 
> index = 0;
> while (true) {
>  index = (index + get_6bit_random()) % 62;
>  output << char_array[index];
> }
> 
> Done, no bits wasted. Should have perfect distribution also. We also don't 
> have to throw away random data just to stay within unaligned boundaries. The 
> unalignment is being taken over into the next loop so the "error" corrects 
> itself over time (it becomes distributed over the whole set).

Distribution will not be perfect. The same original problem persists. Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64. Now the addition changes this so that index 0 to 1 reflects to previous character and not the original index.

The distribution of like 10GB of data should be quite even but not on a small scale. The next char will depend on previous char. It is 100% more likely that the next char is the same or one index above the previous char then any of the other ones in the series. So it is likely that you will have long sets of same character.

Random means that for next char the probability is always even, 1/62. And like mentioned in Dilbert it is impossible to say that something is random but possible to say that it isn't.

If wasting 6bit of data seems large, do this:

index = get_6bit_random();
while (index > 61) {
 index <<= 1;
 index |= get_1bit_random();
 index &= 0x3F;
}
return index;

It will waste 1 bit at a time until result is less than 62. This will slightly change probabilities though :/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)
  2014-06-27 21:13       ` [gentoo-user] " Matti Nykyri
@ 2014-06-28 13:08         ` Matti Nykyri
  2014-06-28 21:28         ` [gentoo-user] " Kai Krakow
  1 sibling, 0 replies; 26+ messages in thread
From: Matti Nykyri @ 2014-06-28 13:08 UTC (permalink / raw
  To: gentoo-user@lists.gentoo.org

On Jun 28, 2014, at 0:13, Matti Nykyri <matti.nykyri@iki.fi> wrote:

>> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote:
>> 
>> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb:
>> 
>>> If you are looking a mathematically perfect solution there is a simple
>>> one even if your list is not in the power of 2! Take 6 bits at a time of
>>> the random data. If the result is 62 or 63 you will discard the data and
>>> get the next 6 bits. This selectively modifies the random data but keeps
>>> the probabilities in correct balance. Now the probability for index of
>>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
>> 
>> Why not do just something like this?
>> 
>> index = 0;
>> while (true) {
>> index = (index + get_6bit_random()) % 62;
>> output << char_array[index];
>> }
>> 
>> Done, no bits wasted. Should have perfect distribution also. We also don't 
>> have to throw away random data just to stay within unaligned boundaries. The 
>> unalignment is being taken over into the next loop so the "error" corrects 
>> itself over time (it becomes distributed over the whole set).
> 
> Distribution will not be perfect. The same original problem persists. Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64. Now the addition changes this so that index 0 to 1 reflects to previous character and not the original index.
> 
> The distribution of like 10GB of data should be quite even but not on a small scale. The next char will depend on previous char. It is 100% more likely that the next char is the same or one index above the previous char then any of the other ones in the series. So it is likely that you will have long sets of same character.
> 
> Random means that for next char the probability is always even, 1/62. And like mentioned in Dilbert it is impossible to say that something is random but possible to say that it isn't.
> 
> If wasting 6bit of data seems large, do this:
> 
> index = get_6bit_random();
> while (index > 61) {
> index <<= 1;
> index |= get_1bit_random();
> index &= 0x3F;
> }
> return index;
> 
> It will waste 1 bit at a time until result is less than 62. This will slightly change probabilities though :/

Sorry this example is really flawed :( If next6bit is over 61 there are only two possible values for it: 62 or 63 -> that is 0x3E and 0x3F. So you see that only one bit changes. But that bit is random! So least significant bit is random and does not need to be discarded :)

index = get_6bit_random();
while (index > 61) {
    index <<= 5;
    index |= get_5bit_random();
    index &= 0x3F;
}
return index;



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG)
  2014-06-27 21:13       ` [gentoo-user] " Matti Nykyri
  2014-06-28 13:08         ` Matti Nykyri
@ 2014-06-28 21:28         ` Kai Krakow
  2014-06-28 22:06           ` Canek Peláez Valdés
  2014-06-29  6:24           ` [gentoo-user] " Matti Nykyri
  1 sibling, 2 replies; 26+ messages in thread
From: Kai Krakow @ 2014-06-28 21:28 UTC (permalink / raw
  To: gentoo-user

Matti Nykyri <matti.nykyri@iki.fi> schrieb:

>> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote:
>> 
>> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb:
>> 
>>> If you are looking a mathematically perfect solution there is a simple
>>> one even if your list is not in the power of 2! Take 6 bits at a time of
>>> the random data. If the result is 62 or 63 you will discard the data and
>>> get the next 6 bits. This selectively modifies the random data but keeps
>>> the probabilities in correct balance. Now the probability for index of
>>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
>> 
>> Why not do just something like this?
>> 
>> index = 0;
>> while (true) {
>>  index = (index + get_6bit_random()) % 62;
>>  output << char_array[index];
>> }
>> 
>> Done, no bits wasted. Should have perfect distribution also. We also
>> don't have to throw away random data just to stay within unaligned
>> boundaries. The unalignment is being taken over into the next loop so the
>> "error" corrects itself over time (it becomes distributed over the whole
>> set).
> 
> Distribution will not be perfect. The same original problem persists.
> Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64.
> Now the addition changes this so that index 0 to 1 reflects to previous
> character and not the original index.
> 
> The distribution of like 10GB of data should be quite even but not on a
> small scale. The next char will depend on previous char. It is 100% more
> likely that the next char is the same or one index above the previous char
> then any of the other ones in the series. So it is likely that you will
> have long sets of same character.

I cannot follow your reasoning here - but I'd like to learn. Actually, I ran 
this multiple times and never saw long sets of the same character, even no 
short sets of the same character. The 0 or 1 is always rolled over into the 
next random addition. I would only get sets of the same character if rand() 
returned zero multiple times after each other - which wouldn't be really 
random. ;-)

Keep in mind: The last index will be reused whenever you'd enter the 
function - it won't reset to zero. But still that primitive implementation 
had a flaw: It will tend to select characters beyond the current offset, if 
it is >= 1/2 into the complete set, otherwise it will prefer selecting 
characters before the offset.

In my tests I counted how ofter new_index > index and new_index < index, and 
it had a clear bias for the first. So I added swapping of the selected index 
with offset=0 in the set. Now the characters will be swapped and start to 
distribute that flaw. The distribution, however, didn't change.

Of course I'm no mathematician, I don't know how I'd calculate the 
probabilities for my implementation because it is sort of a recursive 
function (for get_rand()) when looking at it over time:

int get_rand() {
  static int index = 0;
  return (index = (index + get_6bit_rand()) % 62);
}

char get_char() {
  int index = get_rand();
  char tmp = chars[index];
  chars[index] = chars[0];
  return (chars[0] = tmp);
}

However, get_char() should return evenly distributes results.

What this shows, is, that while distribution is even among the result set, 
the implementation may still be flawed because results could be predictable 
for a subset of results. Or in other words: Simply looking at the 
distribution of results is not an indicator for randomness. I could change 
get_rand() in the following way:

int get_rand() {
  static int index = 0;
  return (index = (index + 1) % 62);
}

Results would be distributed even, but clearly it is not random.

-- 
Replies to list only preferred.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG)
  2014-06-28 21:28         ` [gentoo-user] " Kai Krakow
@ 2014-06-28 22:06           ` Canek Peláez Valdés
  2014-06-29  0:37             ` gottlieb
  2014-06-29  6:24           ` [gentoo-user] " Matti Nykyri
  1 sibling, 1 reply; 26+ messages in thread
From: Canek Peláez Valdés @ 2014-06-28 22:06 UTC (permalink / raw
  To: gentoo-user

On Sat, Jun 28, 2014 at 4:28 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
[ ... ]
> I cannot follow your reasoning here - but I'd like to learn. Actually, I ran
> this multiple times and never saw long sets of the same character, even no
> short sets of the same character. The 0 or 1 is always rolled over into the
> next random addition.

That doesn't matter. Take a non-negative integer N; if you flip a coin
an infinite number of times, then the probability of the coin landing
on the same face N times in a row is 1. This means that it is
*guaranteed* to happen, and it *will* happen for any N you want:
1,000,000, a thousand billions, a gazillion. That is a mathematical
fact.

This of course is a consequence of "infinite" being really really
large, but it means that technically you cannot rule a RNG as broken
only because you saw that it produced the same result N times, which
is the crux of the Dilbert joke.

In practice, of course, it's a big sign that something is wrong. But
there is a non-zero probability that it's actually correct.

Because with randomness, you can never be sure.

Regards.
-- 
Canek Peláez Valdés
Profesor de asignatura, Facultad de Ciencias
Universidad Nacional Autónoma de México


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG)
  2014-06-28 22:06           ` Canek Peláez Valdés
@ 2014-06-29  0:37             ` gottlieb
  2014-06-29  0:53               ` Canek Peláez Valdés
  0 siblings, 1 reply; 26+ messages in thread
From: gottlieb @ 2014-06-29  0:37 UTC (permalink / raw
  To: gentoo-user

On Sat, Jun 28 2014, Canek Peláez Valdés wrote:

> That doesn't matter. Take a non-negative integer N; if you flip a coin
> an infinite number of times, then the probability of the coin landing
> on the same face N times in a row is 1.

This is certainly true.

> This means that it is *guaranteed* to happen

That is not as clear.  Prob = 1 does not always mean certain (when there
are infinite possibilities).  For example the probability is zero that a
random rational number chosen between 0 and 1 is exactly 1/2.  So the
probability is 1 that the number is not 1/2.  However it is not certain
that the random choice will not be 1/2.

allan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG)
  2014-06-29  0:37             ` gottlieb
@ 2014-06-29  0:53               ` Canek Peláez Valdés
  2014-06-29  1:46                 ` [gentoo-user] " »Q«
  0 siblings, 1 reply; 26+ messages in thread
From: Canek Peláez Valdés @ 2014-06-29  0:53 UTC (permalink / raw
  To: gentoo-user

On Sat, Jun 28, 2014 at 7:37 PM,  <gottlieb@nyu.edu> wrote:
> On Sat, Jun 28 2014, Canek Peláez Valdés wrote:
>
>> That doesn't matter. Take a non-negative integer N; if you flip a coin
>> an infinite number of times, then the probability of the coin landing
>> on the same face N times in a row is 1.
>
> This is certainly true.
>
>> This means that it is *guaranteed* to happen
>
> That is not as clear.

Let me be more precise (and please correct me if I'm wrong): It is
guaranteed to happen at some point in the infinite sequence of random
flip coins, but we cannot know when it will happen, only that it will
happen.

That's the way I got it when I took my probability courses, admittedly
many years ago.

In any way, even if I'm wrong and it is not guaranteed, the main point
remains true: the probability of getting a large sequence of the same
number from a RNG is 1 for every true random RNG, and therefore seeing
a large sequence of the same number form a RNG doesn't (technically)
means that it is broken.

Regards.
-- 
Canek Peláez Valdés
Profesor de asignatura, Facultad de Ciencias
Universidad Nacional Autónoma de México


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [gentoo-user] Re: OT: Mapping random numbers (PRNG)
  2014-06-29  0:53               ` Canek Peláez Valdés
@ 2014-06-29  1:46                 ` »Q«
  2014-06-29  2:57                   ` Canek Peláez Valdés
  0 siblings, 1 reply; 26+ messages in thread
From: »Q« @ 2014-06-29  1:46 UTC (permalink / raw
  To: gentoo-user

On Sat, 28 Jun 2014 19:53:08 -0500
Canek Peláez Valdés <caneko@gmail.com> wrote:

> On Sat, Jun 28, 2014 at 7:37 PM,  <gottlieb@nyu.edu> wrote:
> > On Sat, Jun 28 2014, Canek Peláez Valdés wrote:
> >
> >> That doesn't matter. Take a non-negative integer N; if you flip a
> >> coin an infinite number of times, then the probability of the coin
> >> landing on the same face N times in a row is 1.
> >
> > This is certainly true.
> >
> >> This means that it is *guaranteed* to happen
> >
> > That is not as clear.
> 
> Let me be more precise (and please correct me if I'm wrong): It is
> guaranteed to happen at some point in the infinite sequence of random
> flip coins, but we cannot know when it will happen, only that it will
> happen.
> 
> That's the way I got it when I took my probability courses, admittedly
> many years ago.

The probability is 1 in the sense that the as the number of flips M
increases, so does the probability of getting N heads (or tails) in a
row also increases, and the upper bound for the sequence of
probabilities is 1.  It's not a probability about something which
actually happens;  no one so far has been able to flip a coin an
infinite number of times, not even a computer.

> In any way, even if I'm wrong and it is not guaranteed, the main point
> remains true: the probability of getting a large sequence of the same
> number from a RNG is 1 for every true random RNG, and therefore seeing
> a large sequence of the same number form a RNG doesn't (technically)
> means that it is broken.

It's true that that wouldn't *prove* the generator is broken.  But it
might be a good reason to take another look at the algorithm.
> 
> Regards.





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)
  2014-06-29  1:46                 ` [gentoo-user] " »Q«
@ 2014-06-29  2:57                   ` Canek Peláez Valdés
  0 siblings, 0 replies; 26+ messages in thread
From: Canek Peláez Valdés @ 2014-06-29  2:57 UTC (permalink / raw
  To: gentoo-user

On Sat, Jun 28, 2014 at 8:46 PM, »Q« <boxcars@gmx.net> wrote:
> On Sat, 28 Jun 2014 19:53:08 -0500
> Canek Peláez Valdés <caneko@gmail.com> wrote:
>
>> On Sat, Jun 28, 2014 at 7:37 PM,  <gottlieb@nyu.edu> wrote:
>> > On Sat, Jun 28 2014, Canek Peláez Valdés wrote:
>> >
>> >> That doesn't matter. Take a non-negative integer N; if you flip a
>> >> coin an infinite number of times, then the probability of the coin
>> >> landing on the same face N times in a row is 1.
>> >
>> > This is certainly true.
>> >
>> >> This means that it is *guaranteed* to happen
>> >
>> > That is not as clear.
>>
>> Let me be more precise (and please correct me if I'm wrong): It is
>> guaranteed to happen at some point in the infinite sequence of random
>> flip coins, but we cannot know when it will happen, only that it will
>> happen.
>>
>> That's the way I got it when I took my probability courses, admittedly
>> many years ago.
>
> The probability is 1 in the sense that the as the number of flips M
> increases, so does the probability of getting N heads (or tails) in a
> row also increases, and the upper bound for the sequence of
> probabilities is 1.  It's not a probability about something which
> actually happens;  no one so far has been able to flip a coin an
> infinite number of times, not even a computer.

And no one will. Ever.

>> In any way, even if I'm wrong and it is not guaranteed, the main point
>> remains true: the probability of getting a large sequence of the same
>> number from a RNG is 1 for every true random RNG, and therefore seeing
>> a large sequence of the same number form a RNG doesn't (technically)
>> means that it is broken.
>
> It's true that that wouldn't *prove* the generator is broken.  But it
> might be a good reason to take another look at the algorithm.

Agreed.

Regards.
-- 
Canek Peláez Valdés
Profesor de asignatura, Facultad de Ciencias
Universidad Nacional Autónoma de México


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG)
  2014-06-28 21:28         ` [gentoo-user] " Kai Krakow
  2014-06-28 22:06           ` Canek Peláez Valdés
@ 2014-06-29  6:24           ` Matti Nykyri
  2014-06-29 12:38             ` [gentoo-user] " Kai Krakow
  1 sibling, 1 reply; 26+ messages in thread
From: Matti Nykyri @ 2014-06-29  6:24 UTC (permalink / raw
  To: gentoo-user@lists.gentoo.org

On Jun 29, 2014, at 0:28, Kai Krakow <hurikhan77@gmail.com> wrote:
> 
> Matti Nykyri <matti.nykyri@iki.fi> schrieb:
> 
>>> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote:
>>> 
>>> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb:
>>> 
>>>> If you are looking a mathematically perfect solution there is a simple
>>>> one even if your list is not in the power of 2! Take 6 bits at a time of
>>>> the random data. If the result is 62 or 63 you will discard the data and
>>>> get the next 6 bits. This selectively modifies the random data but keeps
>>>> the probabilities in correct balance. Now the probability for index of
>>>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
>>> 
>>> Why not do just something like this?
>>> 
>>> index = 0;
>>> while (true) {
>>> index = (index + get_6bit_random()) % 62;
>>> output << char_array[index];
>>> }
>>> 
>>> Done, no bits wasted. Should have perfect distribution also. We also
>>> don't have to throw away random data just to stay within unaligned
>>> boundaries. The unalignment is being taken over into the next loop so the
>>> "error" corrects itself over time (it becomes distributed over the whole
>>> set).
>> 
>> Distribution will not be perfect. The same original problem persists.
>> Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64.
>> Now the addition changes this so that index 0 to 1 reflects to previous
>> character and not the original index.
>> 
>> The distribution of like 10GB of data should be quite even but not on a
>> small scale. The next char will depend on previous char. It is 100% more
>> likely that the next char is the same or one index above the previous char
>> then any of the other ones in the series. So it is likely that you will
>> have long sets of same character.
> 
> I cannot follow your reasoning here - but I'd like to learn. Actually, I ran 
> this multiple times and never saw long sets of the same character, even no 
> short sets of the same character. The 0 or 1 is always rolled over into the 
> next random addition. I would only get sets of the same character if rand() 
> returned zero multiple times after each other - which wouldn't be really 
> random. ;-)

In your example that isn't true. You will get the same character if 6bit random number is 0 or if it is 62! This is what makes the flaw!

You will also get the next character if random number is 1 or 63.

That is why the possibility for 0 and 1 (after modulo 62) is twice as large compared to all other values (2-61).

By definition random means that the probability for every value should be the same. So if you have 62 options and even distribution of probability the probability for each of them is 1/62. 

> Keep in mind: The last index will be reused whenever you'd enter the 
> function - it won't reset to zero. But still that primitive implementation 
> had a flaw: It will tend to select characters beyond the current offset, if 
> it is >= 1/2 into the complete set, otherwise it will prefer selecting 
> characters before the offset.

If you modify the sequence so that if looks random it is pseudo random. 

> In my tests I counted how ofter new_index > index and new_index < index, and 
> it had a clear bias for the first. So I added swapping of the selected index 
> with offset=0 in the set. Now the characters will be swapped and start to 
> distribute that flaw. The distribution, however, didn't change.

Try counting how of often new_index = index and new_index = (index + 1) % 62 and new_index = (index + 2) % 62. With your algorithm the last one should be significantly less then the first two in large sample.

> Of course I'm no mathematician, I don't know how I'd calculate the 
> probabilities for my implementation because it is sort of a recursive 
> function (for get_rand()) when looking at it over time:
> 
> int get_rand() {
>  static int index = 0;
>  return (index = (index + get_6bit_rand()) % 62);
> }
> 
> char get_char() {
>  int index = get_rand();
>  char tmp = chars[index];
>  chars[index] = chars[0];
>  return (chars[0] = tmp);
> }
> 
> However, get_char() should return evenly distributes results.
> 
> What this shows, is, that while distribution is even among the result set, 
> the implementation may still be flawed because results could be predictable 
> for a subset of results. Or in other words: Simply looking at the 
> distribution of results is not an indicator for randomness. I could change 
> get_rand() in the following way:
> 
> int get_rand() {
>  static int index = 0;
>  return (index = (index + 1) % 62);
> }
> 
> Results would be distributed even, but clearly it is not random.
> 
> -- 
> Replies to list only preferred.
> 
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [gentoo-user] Re: Re: Re: OT: Mapping random numbers (PRNG)
  2014-06-29  6:24           ` [gentoo-user] " Matti Nykyri
@ 2014-06-29 12:38             ` Kai Krakow
  2014-06-30 23:47               ` Matti Nykyri
  0 siblings, 1 reply; 26+ messages in thread
From: Kai Krakow @ 2014-06-29 12:38 UTC (permalink / raw
  To: gentoo-user

Matti Nykyri <matti.nykyri@iki.fi> schrieb:

> On Jun 29, 2014, at 0:28, Kai Krakow <hurikhan77@gmail.com> wrote:
>> 
>> Matti Nykyri <matti.nykyri@iki.fi> schrieb:
>> 
>>>> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@gmail.com> wrote:
>>>> 
>>>> Matti Nykyri <Matti.Nykyri@iki.fi> schrieb:
>>>> 
>>>>> If you are looking a mathematically perfect solution there is a simple
>>>>> one even if your list is not in the power of 2! Take 6 bits at a time
>>>>> of the random data. If the result is 62 or 63 you will discard the
>>>>> data and get the next 6 bits. This selectively modifies the random
>>>>> data but keeps the probabilities in correct balance. Now the
>>>>> probability for index of 0-61 is 1/62 because the probability to get
>>>>> 62-63 out of 64 if 0.
>>>> 
>>>> Why not do just something like this?
>>>> 
>>>> index = 0;
>>>> while (true) {
>>>> index = (index + get_6bit_random()) % 62;
>>>> output << char_array[index];
>>>> }
>>>> 
>>>> Done, no bits wasted. Should have perfect distribution also. We also
>>>> don't have to throw away random data just to stay within unaligned
>>>> boundaries. The unalignment is being taken over into the next loop so
>>>> the "error" corrects itself over time (it becomes distributed over the
>>>> whole set).
>>> 
>>> Distribution will not be perfect. The same original problem persists.
>>> Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be
>>> 1/64. Now the addition changes this so that index 0 to 1 reflects to
>>> previous character and not the original index.
>>> 
>>> The distribution of like 10GB of data should be quite even but not on a
>>> small scale. The next char will depend on previous char. It is 100% more
>>> likely that the next char is the same or one index above the previous
>>> char then any of the other ones in the series. So it is likely that you
>>> will have long sets of same character.
>> 
>> I cannot follow your reasoning here - but I'd like to learn. Actually, I
>> ran this multiple times and never saw long sets of the same character,
>> even no short sets of the same character. The 0 or 1 is always rolled
>> over into the next random addition. I would only get sets of the same
>> character if rand() returned zero multiple times after each other - which
>> wouldn't be really random. ;-)
> 
> In your example that isn't true. You will get the same character if 6bit
> random number is 0 or if it is 62! This is what makes the flaw!
> 
> You will also get the next character if random number is 1 or 63.
> 
> That is why the possibility for 0 and 1 (after modulo 62) is twice as
> large compared to all other values (2-61).

Ah, now I get it.

> By definition random means that the probability for every value should be
> the same. So if you have 62 options and even distribution of probability
> the probability for each of them is 1/62.

Still, the increased probability for single elements should hit different 
elements each time. So for large sets it will distribute - however, I now 
get why it's not completely random by definition.

>> In my tests I counted how ofter new_index > index and new_index < index,
>> and it had a clear bias for the first. So I added swapping of the
>> selected index with offset=0 in the set. Now the characters will be
>> swapped and start to distribute that flaw. The distribution, however,
>> didn't change.
> 
> Try counting how of often new_index = index and new_index = (index + 1) %
> 62 and new_index = (index + 2) % 62. With your algorithm the last one
> should be significantly less then the first two in large sample.

I will try that. It looks like a good approach.

-- 
Replies to list only preferred.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [gentoo-user] Re: Re: Re: OT: Mapping random numbers (PRNG)
  2014-06-29 12:38             ` [gentoo-user] " Kai Krakow
@ 2014-06-30 23:47               ` Matti Nykyri
  0 siblings, 0 replies; 26+ messages in thread
From: Matti Nykyri @ 2014-06-30 23:47 UTC (permalink / raw
  To: gentoo-user

[-- Attachment #1: Type: text/plain, Size: 1805 bytes --]

On Sun, Jun 29, 2014 at 02:38:51PM +0200, Kai Krakow wrote:
> Matti Nykyri <matti.nykyri@iki.fi> schrieb:
> 
> > That is why the possibility for 0 and 1 (after modulo 62) is twice as
> > large compared to all other values (2-61).
> 
> Ah, now I get it.
> 
> > By definition random means that the probability for every value should be
> > the same. So if you have 62 options and even distribution of probability
> > the probability for each of them is 1/62.
> 
> Still, the increased probability for single elements should hit different 
> elements each time. So for large sets it will distribute - however, I now 
> get why it's not completely random by definition.

Usually when you need random data the quality needs to be good! Key, 
passwords etc. For example if an attacker knows that your random number 
generator same or the next index with double probability, he will most 
likely crack each character with half the tries. So for each character 
in your password the time is split in half. Again 8 character password 
becomes 2^8 times easier to break compared to truely random data. This 
is just an example though.

> > Try counting how of often new_index = index and new_index = (index + 1) %
> > 62 and new_index = (index + 2) % 62. With your algorithm the last one
> > should be significantly less then the first two in large sample.
> 
> I will try that. It looks like a good approach.

Ok. I wrote a little library that takes random data and mathematically 
accurately splits it into wanted data. It is attached to the mail. You 
only need to specify the random source and the maximum number you wish 
to see in your set. So with 5 you get everything from 0 to 5 (in total 
of 6 elements). The library takes care of buffering. And most 
importantly keeps probabilities equal :)

-- 
-Matti

[-- Attachment #2: Makefile --]
[-- Type: text/plain, Size: 840 bytes --]

VERSION=v0.1

prefix=/usr/local

CC=$(CROSS_COMPILE)g++
LD=$(CROSS_COMPILE)ld

SYS=posix

DEF=-DRNG_VERSION=\"$(VERSION)\"
OPT=-O2
XCFLAGS=-fPIC -DPIC -march=nocona
#XCFLAGS=-fPIC -DPIC -DDEBUG -march=nocona
XLDFLAGS=$(XCFLAGS) -Wl,--as-needed -Wl,-O1 -Wl,-soname=librng.so
CPPFLAGS=-Wall -std=gnu++98 $(XCFLAGS) $(INC) $(DEF) $(OPT)
LDFLAGS=-Wall -shared $(XLDFLAGS)
TESTLDFLAGS=-Wall
#TESTLDFLAGS=-Wall -lrng

bindir=$(prefix)/bin
libdir=$(prefix)/lib

BINDIR=$(DESTDIR)$(bindir)
LIBDIR=$(DESTDIR)$(libdir)

SLIBS=$(LIBS)

EXT=$(EXT_$(SYS))

LIBS=librng.so

all: $(LIBS) rng

install:	$(LIBS)
	-mkdir -p $(BINDIR) $(LIBDIR)
	cp rng$(EXT) $(BINDIR)

clean:
	rm -f *.o *.so rng$(EXT)

rng: rng.o
	$(CC) $(TESTLDFLAGS) -o $@$(EXT) $@.o librng.o
rng.o: rng.cpp

librng.so: librng.o
	$(CC) $(LDFLAGS) -o $@$(EXT) librng.o
librng.o: librng.cpp

[-- Attachment #3: librng.h --]
[-- Type: text/x-chdr, Size: 1344 bytes --]

//#define BUFFER_SIZE 4096
//64 bits is 8 bytes: number of uint64_t in buffer
//#define NUM_SETS (4096 / 8)
//#define NUM_BITS 64
#include <inttypes.h>

struct BinaryData {
  uint64_t data;
  int8_t bits;
};

class BitContainer {
public:
  BitContainer();
  ~BitContainer();
  
  bool has(int8_t bits);
  uint64_t get(int8_t bits);
  int8_t set(uint64_t data, int8_t bits);
  void fill(uint64_t *data);
  
  static void cpy(struct BinaryData *dest, struct BinaryData *src, int8_t bits);

private:
  void xfer();
  static void added(int8_t &stored, int8_t bits);

  struct BinaryData pri;
  struct BinaryData sec;
};

class Rng {
public:
  Rng(char* device, uint64_t max);
  ~Rng();
  
  const uint64_t setMax(const uint64_t max);
  uint64_t getMax();
  int setDevice(const char* device);
  
  uint64_t getRnd();  

  static uint64_t getMask(int8_t bits);
  static int8_t calculateBits(uint64_t level);
  
private:
  void fillBuffer();
  void readBuffer();
  
  void getBits(uint64_t *data, int8_t *avail, uint64_t *out);
  void saveBits(uint64_t save);
  void processBits(uint64_t max, uint64_t level, uint64_t data);
  
  void error(const char* str);

  int iRndFD;
  size_t lCursor;
  size_t lBuffer;
  uint64_t* pStart;
  uint64_t* pNext;
  uint64_t* pEnd;
  
  BitContainer sRnd;

  uint64_t lMax;
  uint64_t lOutMask;
  int8_t cOutBits;
};

[-- Attachment #4: librng.cpp --]
[-- Type: text/x-c++src, Size: 7171 bytes --]

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include "librng.h"

#ifdef DEBUG
 #include <stdio.h>
 #include <stdlib.h>
 long* results = 0;
 long* results2 = 0;
 unsigned long dMax = 0;
 int pushed[64];
 long readData = 0;
 long readBuff = 0;
 long readBits = 0;
 long validBits = 0;
 long bitsPushed = 0;
 long readExtra = 0;
 int bits = 0;
 
 unsigned long totalBits = 0;
 unsigned long used = 0;
 unsigned long wasted = 0;
 
 unsigned long power(int exp) {
   unsigned long x = 1;
   
   for (int i = 0; i < exp; i++)
     x *= 2;
   
   return x;
 }
 
 void dump_results() {
   fprintf(stderr, "Rounds for each number:\n");
   for (unsigned long i = 0; i < dMax; i++)
     fprintf(stderr, "%li = %li\t", i, results[i]);
   fprintf(stderr, "\n");
   
   fprintf(stderr, "Rounds for each initial number:\n");
   for (unsigned long i = 0; i < power(bits); i++)
     fprintf(stderr, "%li = %li\t", i, results2[i]);
   fprintf(stderr, "\n");
   
   fprintf(stderr, "Rounds for extra bits: total pushed: \t%li\n", bitsPushed);
   for (int i = 0; i < 64; i++)
     fprintf(stderr, "%i = %i\t", i, pushed[i]);
   fprintf(stderr, "\n");
   
   fprintf(stderr, "Device reads = %li\n", readData);
   fprintf(stderr, "Buffer reads = %li\n", readBuff);
   fprintf(stderr, "64bit reads = %li\n", readBits);
   fprintf(stderr, "Valid results = %li\n", validBits);
   fprintf(stderr, "Extra bit reads = %li\n", readExtra);
   fprintf(stderr, "Num bits per sample = %i\n", bits);   
   fprintf(stderr, "Pagesize = %li\n", sysconf(_SC_PAGE_SIZE));
   fprintf(stderr, "Maximum value = %lu\n", dMax);
   fprintf(stderr, "Total bits read = %lu\n", totalBits);
   fprintf(stderr, "Bits output = %li\n", used);
   fprintf(stderr, "Bits wasted = %li\n", wasted);
   //fprintf(stderr, " = %li\n");
 }
#endif

Rng::Rng(char* device, uint64_t max) {
  iRndFD = -1;
  lCursor = 0;
  lBuffer = 0;
  pStart = 0;
  pNext = 0;
  pEnd = 0;
    
  long lPagesize = sysconf(_SC_PAGE_SIZE);
  if (lPagesize % sizeof(uint64_t) != 0)
    error("Pagesize not multiple of uint64_t");
  
  pStart = (uint64_t*)mmap(NULL, lPagesize, PROT_READ | PROT_WRITE, 
			   MAP_SHARED | MAP_ANONYMOUS | MAP_NORESERVE, -1 ,0);
  if (pStart == MAP_FAILED)
    error("mmap");
  
  lBuffer = lPagesize;
  lCursor = 0;
  pNext = pEnd = pStart;
  
  setMax(max);
  setDevice(device);
}

Rng::~Rng() {
  if (iRndFD != -1)
    close(iRndFD);
  
  iRndFD = -1;
  
  munmap(pStart, pEnd - pStart);
  pStart = pEnd = pNext = 0;

#ifdef DEBUG
  dump_results();
  
  if (results)
    free(results);
  if (results2)
    free(results2);
#endif
}

uint64_t Rng::getMax() {
  return lMax;
}

const uint64_t Rng::setMax(const uint64_t max) {
  lMax = max;
  lOutMask = 0;
  cOutBits = 0;
  
  while (lMax > lOutMask) {
    lOutMask = (lOutMask << 1) + 1;
    cOutBits++;
  }

#ifdef DEBUG
  if (results)
    free(results);
  if (results2)
    free(results2);
  results = (long *)malloc((lMax + 1) * sizeof(long));
  results2 = (long *)malloc((power(cOutBits)) * sizeof(long));
  dMax = lMax + 1;
  bits = cOutBits;
#endif
  
  return lMax;
}

int Rng::setDevice(const char* device) {
  if (iRndFD != -1)
    close(iRndFD);
  
  iRndFD = open(device, O_RDONLY);
  if (iRndFD == -1)
    error("Open failed");
  
  return iRndFD;
}
  
uint64_t Rng::getRnd() {
  uint64_t lOut = 0;
  bool valid = 0;
  
  if (iRndFD == -1 || pStart == 0) {
    error("MMAP or Open failed");
    return lOut;
  }
  
  if (lMax == 0) {
    error("Maximun set to zero");
    return lOut;
  }
  
  do {
    if (sRnd.has(cOutBits)) {
      lOut = sRnd.get(cOutBits);
      
#ifdef DEBUG
      wasted += cOutBits;
      readBits++;
#endif
  
      if (lOut <= lMax)
	valid = 1;
      else
	saveBits(lOut);
    }
    else
      readBuffer();
    
  } while (!valid);
  
#ifdef DEBUG
  results[lOut]++;
  validBits++;
  used += cOutBits;
  wasted -= cOutBits;
#endif
  return lOut;
}

void Rng::saveBits(uint64_t save) {
  uint64_t level = 1;
  
  uint64_t max = lOutMask;
  max -= lMax;
  
  save -= lMax;
  save --;
  
  do {
    level <<= 1;
  } while (level <= max);
  
  level >>= 1;
  
  processBits(max, level, save);
}

inline void Rng::processBits(uint64_t max, uint64_t level, uint64_t data) {
  if (data < level) {
    sRnd.set(data, calculateBits(level));

#ifdef DEBUG
    pushed[calculateBits(level)]++;
    bitsPushed++;
    wasted -= calculateBits(level);
#endif
  }
  else {
    max -= level;
    data -= level;
    do {
      level >>= 1;
    } while (level > max);
    
    if (level > 1)
      processBits(max, level, data);
#ifdef DEBUG
    else {
      pushed[0]++;
    }
#endif
  }
}

void Rng::readBuffer() {
  if (pNext == pEnd)
    fillBuffer();
  
  sRnd.fill(pNext++);

#ifdef DEBUG
  readBuff++;
#endif
}

void Rng::fillBuffer() {
  size_t count = 0;
  ssize_t bytes = 0;
  uint8_t* start = (uint8_t *)pStart;

  if (lCursor == lBuffer) {
    pNext = pEnd = pStart;
    lCursor = 0;
  }
  
  do {
    bytes = read(iRndFD, start + lCursor, lBuffer - lCursor);
    if (bytes == -1)
      error("Read interrupted");
    else {
      lCursor += bytes;
      count += bytes;
    }
  } while (count < sizeof(uint64_t) && lBuffer != lCursor);
  
  pEnd = pStart + (lCursor / sizeof(uint64_t));
  
#ifdef DEBUG
  totalBits += count * 8;
  readData++;
#endif
}

inline void Rng::error (const char* str) {
#ifdef DEBUG
  fprintf(stderr, "RNG error: %s\n", str);
#endif
}

BitContainer::BitContainer() {
  pri = (struct BinaryData) {0, 0};
  sec = pri;
}

BitContainer::~BitContainer() {
}

bool BitContainer::has(int8_t bits) {
  if (pri.bits >= bits)
    return true;
  
  if (sec.bits)
    xfer();
  
  if (pri.bits >= bits)
    return true;

  return false;
}

uint64_t Rng::getMask(int8_t bits) {
  uint64_t mask = 0;

  mask = ~mask;

  //64bit shift at once is undefined:
  mask <<= (bits - 1);
  mask <<= 1;
  
  mask = ~mask;
  
  return mask;
}

int8_t Rng::calculateBits(uint64_t level) {
  int8_t bits = 0;
  
  while (level > 1) {
    level >>= 1;
    bits++;
  }
  
  return bits;
}

uint64_t BitContainer::get(int8_t bits) {
  uint64_t data = 0;
  uint64_t mask = Rng::getMask(bits);
  
  data = pri.data & mask;
  pri.data >>= bits;
  pri.bits -= bits;
  
  return data;
}

void BitContainer::xfer() {
  int8_t bits = 0;
  int8_t free = (sizeof(uint64_t) * 8) - pri.bits;
  
  bits = free < sec.bits ? free : sec.bits;
  
  cpy(&pri, &sec, bits);
}

void BitContainer::cpy(struct BinaryData *dest, struct BinaryData *src, int8_t bits) {
  dest->data <<= bits;
  dest->data |= (src->data & Rng::getMask(bits));
  src->data >>= bits;
  
  src->bits -= bits;
  added(dest->bits, bits);
}

void BitContainer::added(int8_t &stored, int8_t bits) {
  stored += bits;
  
  if (stored > ((int8_t)sizeof(uint64_t) * 8))
    stored = sizeof(uint64_t) * 8;
}

int8_t BitContainer::set(uint64_t data, int8_t bits) {
  struct BinaryData src = (struct BinaryData) {data, bits};
  
  int8_t free = (sizeof(uint64_t) * 8) - sec.bits;
  free = free < bits ? free : bits;
  
  cpy(&sec, &src, free);
  
  return free;
}

void BitContainer::fill(uint64_t *data) {
  sec = pri;
  
  pri.data = *data;
  pri.bits = sizeof(uint64_t) * 8;
}

[-- Attachment #5: rng.cpp --]
[-- Type: text/x-c++src, Size: 433 bytes --]

#include <stdlib.h>
#include <stdio.h>

#include "librng.h"

int main(const int ARGC, char* const ARGV[], char* const ENV[]) {
  int rounds = 0;
  unsigned long max = 0;
  unsigned long result = 0;
  
  rounds = atoi(ARGV[2]);
  max = strtoul(ARGV[3], NULL, 0);
  
  Rng rand(ARGV[1], max);
  
  for (int i = 0; i < rounds; i++) {
    result = rand.getRnd();
    (void) result;
    printf("rnd = %lu\n", result);
  }
  
  return 0;
}

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-06-30 23:48 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-06  2:56 [gentoo-user] OT: Mapping random numbers (PRNG) meino.cramer
2014-06-06  3:58 ` Canek Peláez Valdés
2014-06-06 18:39   ` meino.cramer
2014-06-06 19:37     ` null_ptr
2014-06-06 19:55     ` Canek Peláez Valdés
2014-06-06 21:05     ` Matti Nykyri
2014-06-06 21:03   ` Matti Nykyri
2014-06-07  9:19     ` Matti Nykyri
2014-06-07 12:13       ` Marc Joliet
2014-06-26 21:00     ` [gentoo-user] " Kai Krakow
2014-06-26 22:07       ` Kai Krakow
2014-06-27  8:55         ` thegeezer
2014-06-27 17:49           ` Matti Nykyri
2014-06-27 17:50           ` [gentoo-user] " Kai Krakow
2014-06-27 20:59             ` Neil Bothwick
2014-06-27 21:13       ` [gentoo-user] " Matti Nykyri
2014-06-28 13:08         ` Matti Nykyri
2014-06-28 21:28         ` [gentoo-user] " Kai Krakow
2014-06-28 22:06           ` Canek Peláez Valdés
2014-06-29  0:37             ` gottlieb
2014-06-29  0:53               ` Canek Peláez Valdés
2014-06-29  1:46                 ` [gentoo-user] " »Q«
2014-06-29  2:57                   ` Canek Peláez Valdés
2014-06-29  6:24           ` [gentoo-user] " Matti Nykyri
2014-06-29 12:38             ` [gentoo-user] " Kai Krakow
2014-06-30 23:47               ` Matti Nykyri

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox