From help-request at octave dot org Wed Jul 28 00:58:35 2004 Subject: Re: problem with rand ? From: Paul Kienzle To: "Dmitri A. Sergatskov" Cc: help-octave at octave dot org, bug@octave.org Date: Wed, 28 Jul 2004 01:57:23 -0400 Dimitri, The first thing to note is that I'm only using 32 bits of randomness, not 53. I was wondering if that might be a problem. Apparently it is. With a slight change to the code (using rand53() rather than randu()) the following expression usually equals 0: length(find(diff(sort(rand(1e6,1)))==0)) Calculating the probability of at least one duplicate is the birthday paradox: 1-p = m!/(m^n (m-n)!) Using Stirling's approximation, log(1-p) ~ (m-n+1/2)(log m - log m-n) - n For m=2^32, n=1e6, 1-p ~ 2.7e-51 => very high probability of duplicates. For m=2^53, n=1e6, 1-p ~ 0.99983 => very low probability of duplicates. For m=2^53, n=6e7, 1-p ~ 0.55 => even chance of a duplicate. For m=2^32, n=7.7e4, 1-p ~ 0.50 => even chances of a duplicate. Unfortunately, my computer is not hefty enough to run this experiment for n=6e7 (it takes 15 min/trial), but running 100 trials on n=7.7e4 yields a not unexpected 54 trials with no duplicates. FWIW, the n=6e7 trial I did run yielded 0 duplicates. The 53-bit code is some 35% slower than the 32-bit code. Unless there are objections, I will change to 53-bits. Accuracy over speed. - Paul On Jul 27, 2004, at 11:46 PM, Dmitri A. Sergatskov wrote: > I found that random series returned by rand (from octave-forge) returns > a higher number of _identical_ values than I would expect: > > octave:1> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); > any(dss1==0) > ans = 1 > octave:2> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); > find(dss1==0) > ans = [](1x0) > octave:3> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); > find(dss1==0) > ans = 60641 > octave:4> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); > find(dss1==0) > ans = [](1x0) > octave:5> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); > find(dss1==0) > ans = > > 26998 27604 > > octave:6> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); > find(dss1==0) > ans = 16034 > octave:7> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); > find(dss1==0) > ans = 23032 > octave:8> s1=rand(1,128*1024); ss1=sort(s1); dss1=diff(ss1); > find(dss1==0) > ans = 121716 > > etc... > > I would think that for random 2^17 numbers probability of two identical > numbers would be something like 2^17 / 2^54 (assuming ieee 64 bit > double > representation) > > Matlab does not show this behavior even for larger series: > > >> s1=rand(1,256*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0) > > ans = > > 0 > > >> s1=rand(1,1024*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0) > > ans = > > 0 > > >> s1=rand(1,1024*1024); ss1=sort(s1); dss1=diff(ss1); any(dss1==0) > > ans = > > 0 > > Am I missing something here? > > Sincerely, > > Dmitri. > > > > ------------------------------------------------------------- > Octave is freely available under the terms of the GNU GPL. > > Octave's home on the web: http://www.octave.org > How to fund new projects: http://www.octave.org/funding.html > Subscription information: http://www.octave.org/archive.html > ------------------------------------------------------------- > ------------------------------------------------------------- Octave is freely available under the terms of the GNU GPL. Octave's home on the web: http://www.octave.org How to fund new projects: http://www.octave.org/funding.html Subscription information: http://www.octave.org/archive.html -------------------------------------------------------------