public inbox for gentoo-user@lists.gentoo.org
 help / color / mirror / Atom feed
From: "Canek Peláez Valdés" <caneko@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] OT: Mapping random numbers (PRNG)
Date: Fri, 6 Jun 2014 14:55:34 -0500	[thread overview]
Message-ID: <CADPrc82ro80wVTzPeSc5GyWXpO=TF5SNf1b3NUXrCENaKUkx6A@mail.gmail.com> (raw)
In-Reply-To: <20140606183928.GC4718@solfire>

On Fri, Jun 6, 2014 at 1:39 PM,  <meino.cramer@gmx.de> wrote:
> Canek Peláez Valdés <caneko@gmail.com> [14-06-06 17:36]:
>> On Thu, Jun 5, 2014 at 9:56 PM,  <meino.cramer@gmx.de> wrote:
>> > Hi,
>> >
>> > I am experimenting with the C code of the ISAAC pseudo random number generator
>> > (http://burtleburtle.net/bob/rand/isaacafa.html).
>> >
>> > Currently the implementation creates (on my embedded linux) 32 bit
>> > hexadecimal output.
>>
>> So it's a 32 bit integer.
>>
>> > From this I want to create random numbers in the range of [a-Za-z0-9]
>> > *without violating randomness* and (if possible) without throwing
>> > away bits of the output.
>>
>> You mean *characters* int the range [A-Za-z0-9]?
>>
>> > How can I do this mathemtically (in concern of the quality of output)
>> > correct?
>>
>> The easiest thing to do would be:
>>
>> -------------------------------------------------------------------------------
>> #include <time.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> #define N (26+26+10)
>>
>> static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
>> 'K', 'L', 'M',
>>                     'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
>> 'X', 'Y', 'Z',
>>                     'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
>> 'k', 'l', 'm',
>>                     'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
>> 'x', 'y', 'z',
>>                     '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
>>
>> int
>> next_character()
>> {
>>         // Use the correct call for ISAAC instead of rand()
>>         unsigned int idx = rand() % N;
>>         return S[idx];
>> }
>>
>> int
>> main(int argc, char* argv[])
>> {
>>         // Use the correct call for initializing the ISAAC seed
>>         srand((unsigned int)time(NULL));
>>         for (int i = 0; i < 20; i++) // --std=c99
>>                 printf("%c\n", next_character());
>>         return 0;
>> }
>> -------------------------------------------------------------------------------
>>
>> If the ISAAC RNG has a good distribution, then the next_character()
>> function will give a good distribution among the set [A-Za-z0-9].
>>
>> Unless I missunderstood what you meant with "create random numbers in
>> the range of [a-Za-z0-9]".
>>
>> Regards.
>> --
>> Canek Peláez Valdés
>> Profesor de asignatura, Facultad de Ciencias
>> Universidad Nacional Autónoma de México
>>
>
> Hi,
>
> Thank you very much for the input! :)
>
> I have a question about the algorithm:
> Suppose rand() has an equal distribution of numbers and furthermore
> one has a count of 2^32 random numbers listed in numerical sort
> order.
> In this list each number would appear (nearly) with the same count: 1
>
> To get an better imagination of that...suppose the rand() would only
> return numbers in the range of 1...12 and the alphabet has only 8
> characters (as 2^32 is not devideable by 62)
>
> rand():
> 1 2 3 4 5 6 7 8 9 10 11 12
>
> rand()%N : rand()%7
> 1 2 3 4 5 6 7 0 1  2  3  4
>
> or in other words: An even distribution of numbers of rand()
> would result in a unevenly distributed sequence of characters...or?
> This would break the quality of ISAACs output.
>
> I am sure I did something wrong here...but where is the logic trap?

In theory, it doesn't matter if the number of random characters you
want is not multiple of the size of the set of characters. In
practice, it doesn't matter if the number of random characters you
want is big enough.

Consider the following modification to my program (I changed the order
of the array S so it's sorted and I can do binary search):

----------------------------------------------------------------------
#include <time.h>
#include <stdio.h>
#include <stdlib.h>

#define N (10+26+26)

static char S[] = {
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
};

static int
index_of_aux(char c, int a, int b)
{
        if (a > b)
                return -1;
        int m = (a + b) / 2;
        if (S[m] == c)
                return m;
        if (S[m] > c)
                return index_of_aux(c, a, m-1);
        return index_of_aux(c, m+1, b);
}

static int
index_of(char c)
{
        return index_of_aux(c, 0, N-1);
}

int
next_character()
{
        // Use the correct call for isaac instead of rand()
        unsigned int idx = rand() % N;
        return S[idx];
}

int
main(int argc, char* argv[])
{
        srand((unsigned int)time(NULL));
        int total = 1000000;

        // Size = 62
        int count[] = {
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
        };

        for (int i = 0; i < total; i++) {
                char c = next_character();
                count[index_of(c)]++;
        }

        int min = total, max = 0;
        for (int i = 0; i < N; i++) {
                printf("Count %d: %d\n", i, count[i]);
                min = (count[i] < min) ? count[i] : min;
                max = (count[i] > max) ? count[i] : max;
        }
        printf("Min: %d\n", min);
        printf("Max: %d\n", max);
        printf("Diff: %d\n", (max-min));
        return 0;
}
----------------------------------------------------------------------

The output is the following:

Count 0: 16060
Count 1: 16102
Count 2: 15968
Count 3: 16075
...
Count 59: 16220
Count 60: 16143
Count 61: 16177
Min: 15819
Max: 16366
Diff: 547

As you can see, all the characters appear  around 16,000 times, so
it's pretty well distributed. If the RNG has a good distribution, it
will *always* look something like this. The difference will *never* be
0, since no RNG is perfectly distributed.

So, TL;DR: it doesn't matter if total is not multiple of N.

Regards.
-- 
Canek Peláez Valdés
Profesor de asignatura, Facultad de Ciencias
Universidad Nacional Autónoma de México


  parent reply	other threads:[~2014-06-06 19:56 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-06  2:56 [gentoo-user] OT: Mapping random numbers (PRNG) meino.cramer
2014-06-06  3:58 ` Canek Peláez Valdés
2014-06-06 18:39   ` meino.cramer
2014-06-06 19:37     ` null_ptr
2014-06-06 19:55     ` Canek Peláez Valdés [this message]
2014-06-06 21:05     ` Matti Nykyri
2014-06-06 21:03   ` Matti Nykyri
2014-06-07  9:19     ` Matti Nykyri
2014-06-07 12:13       ` Marc Joliet
2014-06-26 21:00     ` [gentoo-user] " Kai Krakow
2014-06-26 22:07       ` Kai Krakow
2014-06-27  8:55         ` thegeezer
2014-06-27 17:49           ` Matti Nykyri
2014-06-27 17:50           ` [gentoo-user] " Kai Krakow
2014-06-27 20:59             ` Neil Bothwick
2014-06-27 21:13       ` [gentoo-user] " Matti Nykyri
2014-06-28 13:08         ` Matti Nykyri
2014-06-28 21:28         ` [gentoo-user] " Kai Krakow
2014-06-28 22:06           ` Canek Peláez Valdés
2014-06-29  0:37             ` gottlieb
2014-06-29  0:53               ` Canek Peláez Valdés
2014-06-29  1:46                 ` [gentoo-user] " »Q«
2014-06-29  2:57                   ` Canek Peláez Valdés
2014-06-29  6:24           ` [gentoo-user] " Matti Nykyri
2014-06-29 12:38             ` [gentoo-user] " Kai Krakow
2014-06-30 23:47               ` Matti Nykyri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADPrc82ro80wVTzPeSc5GyWXpO=TF5SNf1b3NUXrCENaKUkx6A@mail.gmail.com' \
    --to=caneko@gmail.com \
    --cc=gentoo-user@lists.gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox