public inbox for gentoo-user@lists.gentoo.org
 help / color / mirror / Atom feed
From: Matti Nykyri <Matti.Nykyri@iki.fi>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] OT: Mapping random numbers (PRNG)
Date: Sat, 7 Jun 2014 00:03:29 +0300	[thread overview]
Message-ID: <20140606210329.GA1631@lyseo.edu.ouka.fi> (raw)
In-Reply-To: <CADPrc81Uz0rRcvscWSL-T57RBFHOru-oMrQ94oDaFToLyYrujA@mail.gmail.com>

On Thu, Jun 05, 2014 at 10:58:51PM -0500, Canek Peláez Valdés wrote:
> On Thu, Jun 5, 2014 at 9:56 PM,  <meino.cramer@gmx.de> wrote:
> > Hi,
> >
> > I am experimenting with the C code of the ISAAC pseudo random number generator
> > (http://burtleburtle.net/bob/rand/isaacafa.html).
> >
> > Currently the implementation creates (on my embedded linux) 32 bit
> > hexadecimal output.
> 
> So it's a 32 bit integer.
> 
> > From this I want to create random numbers in the range of [a-Za-z0-9]
> > *without violating randomness* and (if possible) without throwing
> > away bits of the output.
> 
> You mean *characters* int the range [A-Za-z0-9]?

Well this isn't as simple problem as it sounds. A random 32 bit integer 
has 32 bits of randomness. If you take a divison reminder of 62 from this 
integer you will get only 5,95419631039 bits of randomness 
(log(62)/log(2)). So you are wasting 81,4% of your random data. Which is 
quite much and usually random data is quite expensive. You can save your 
precious random data by taking only 6 bit from your 32 bit integer and 
dividing it by 62. Then you will be wasting only 0,8% of random data. 
Another problem is alignment, but that is about mathematical correctness.

> > How can I do this mathemtically (in concern of the quality of output)
> > correct?
> 
> The easiest thing to do would be:

The easiest is not mathematically correct though. Random data will stay 
random only if you select and modify it so that randomness is preserved. 
If you take devison reminder of 62 from 32 bit integer there are 69 273 
667 possibilities of the reminder to be 3 or less. For the reminder to 4 
or more the number of possibilities is 69 273 666. In mathematically 
ideal case the probability for every index of the list should be same: 
1/62 = 1,61290322581%. But the modulo 62 modifies this probability: for 
index 0-3 the probability is 69 273 667/2^32 = 1,61290324759%. And for 
indexes 4-61 the probability will be 69 273 666/2^32 = 1,6129032243%.

If you wish not to waste those random bits the probabilities will get 
worse. With 6 bits of random the probability for index 0-1 will be 2/64 
and for 2-63 it will be 1/64. This is a very significant change because 
first and second index will appear twice as much as the rest. If you add 
2 characters to your list you will perfect alignment and you can take 6 
bits of data without it modifying probabilities.

If you are looking a mathematically perfect solution there is a simple 
one even if your list is not in the power of 2! Take 6 bits at a time of 
the random data. If the result is 62 or 63 you will discard the data and 
get the next 6 bits. This selectively modifies the random data but keeps 
the probabilities in correct balance. Now the probability for index of 
0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.

> -------------------------------------------------------------------------------
> #include <time.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> #define N (26+26+10)
> 
> static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
> 'K', 'L', 'M',
>                     'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
> 'X', 'Y', 'Z',
>                     'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
> 'k', 'l', 'm',
>                     'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
> 'x', 'y', 'z',
>                     '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
> 
> int
> next_character()
> {
>         // Use the correct call for ISAAC instead of rand()
>         unsigned int idx = rand() % N;
>         return S[idx];
> }

so modify the next_char function:

char next_character()
{
	static unsigned int rand = 0; //(sizeof(int) = 32)
	static char bit_avail = 0;
	char result = 0;
	char move_bits = 0;
	char bits_moved = 0;

	do {
		if (!bits_avail) {
        		// Use the correct call for ISAAC instead of rand()
			rand = rand();
			
			bit_avail = 32;
		}

		move_bits = bits_avail >= 6 ? 6 : bits_avail;
		result <<= move_bits;
		result = (result | rand & (0xFF >> (8 - move_bits))) & 0x3F;
		bits_avail -= move_bits;
		bits_moved += move_bits;
		rand >>= move_bits;

	} while (bits_moved != 6 && result > 61);

	return result;
}

This function will give perfect distribution of 1/62 probability for 
every index. It will waste 6 bits with the probability of 1/32 (2/64).

> int
> main(int argc, char* argv[])
> {
>         // Use the correct call for initializing the ISAAC seed
>         srand((unsigned int)time(NULL));
>         for (int i = 0; i < 20; i++) // --std=c99
>                 printf("%c\n", next_character());
>         return 0;
> }
> -------------------------------------------------------------------------------
> 
> If the ISAAC RNG has a good distribution, then the next_character()
> function will give a good distribution among the set [A-Za-z0-9].
> 
> Unless I missunderstood what you meant with "create random numbers in
> the range of [a-Za-z0-9]".

-- 
-Matti


  parent reply	other threads:[~2014-06-06 21:03 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-06  2:56 [gentoo-user] OT: Mapping random numbers (PRNG) meino.cramer
2014-06-06  3:58 ` Canek Peláez Valdés
2014-06-06 18:39   ` meino.cramer
2014-06-06 19:37     ` null_ptr
2014-06-06 19:55     ` Canek Peláez Valdés
2014-06-06 21:05     ` Matti Nykyri
2014-06-06 21:03   ` Matti Nykyri [this message]
2014-06-07  9:19     ` Matti Nykyri
2014-06-07 12:13       ` Marc Joliet
2014-06-26 21:00     ` [gentoo-user] " Kai Krakow
2014-06-26 22:07       ` Kai Krakow
2014-06-27  8:55         ` thegeezer
2014-06-27 17:49           ` Matti Nykyri
2014-06-27 17:50           ` [gentoo-user] " Kai Krakow
2014-06-27 20:59             ` Neil Bothwick
2014-06-27 21:13       ` [gentoo-user] " Matti Nykyri
2014-06-28 13:08         ` Matti Nykyri
2014-06-28 21:28         ` [gentoo-user] " Kai Krakow
2014-06-28 22:06           ` Canek Peláez Valdés
2014-06-29  0:37             ` gottlieb
2014-06-29  0:53               ` Canek Peláez Valdés
2014-06-29  1:46                 ` [gentoo-user] " »Q«
2014-06-29  2:57                   ` Canek Peláez Valdés
2014-06-29  6:24           ` [gentoo-user] " Matti Nykyri
2014-06-29 12:38             ` [gentoo-user] " Kai Krakow
2014-06-30 23:47               ` Matti Nykyri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140606210329.GA1631@lyseo.edu.ouka.fi \
    --to=matti.nykyri@iki.fi \
    --cc=gentoo-user@lists.gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox