From owner-help-octave at bevo dot che dot wisc dot edu Sat Feb 22 12:16:24 1997 Subject: Shuffling elements in a dataset (fwd) From: Ted dot Harding at nessie dot mcc dot ac dot uk (Ted Harding) To: help-octave at bevo dot che dot wisc dot edu Date: Sat, 22 Feb 1997 15:44:40 +0000 (GMT) ( Re Message From: John W. Eaton ) > > On 21-Feb-1997, Ted Harding wrote: > > | I find that something like > | > | [dummy,ix] = sort(rand(1,rows(x))); new_x = x(ix,:); > | > | seems pretty fast. (0.04 secs for 10000 rows, 0.05 secs for 100000 rows, > | or for 1000000, on a 386-DX/25MHz; 0.003 secs for 10000 rows, 0.004 secs > | for 100000 rows, or for 1000000, on Pentium-120, i.e. almost independent > | of number of rows. However for 10000000 rows it starts swapping and takes > | a while (48 MB RAM)). Above timings for 1 column only; reduce sizes pro > | rata for extra columns (RAM limit). > > Hmm. Here is what I get (counting down to minimize memory problems): > > for n = [[8, 6, 4, 2]*1e5, 10.^[5, 4, 3, 2, 1]] > x = rand (n, 1); > t = cputime (); > [dummy,ix] = sort(rand(1,rows(x))); > new_x = x(ix,:); > t = cputime () - t; > printf ("N = %8d, t = %6.2f seconds, n*log(n) = %.2e\n", n, t, n*log(n)); > fflush (stdout); > endfor > > N = 500000, t = 13.96 seconds > N = 100000, t = 2.30 seconds > N = 10000, t = 0.17 seconds > N = 1000, t = 0.02 seconds > N = 100, t = 0.01 seconds > N = 10, t = 0.01 seconds > > at N == 10^6, I get into paging trouble on my 64MB Pentium system. > > How did you end up with almost constant time? I took Octave's sort > algorithm from Knuth. It's good, but I don't think it's *that* good. > > jwe Eerrrmm OOPS. Turns out I inadvertently did it on a row vector instead of a column vector. Too late at night. Anyway, it's next day now, and here's some typical results (Pentium 120, 48MB RAM). > x=(1:1000)' ; tic; [x0,ix]=sort(rand(1,rows(x))); x(ix); toc ans = 0.049538 > x=(1:10000)' ; tic; [x0,ix]=sort(rand(1,rows(x))); x(ix); toc ans = 0.74168 > x=(1:100000)'; tic; [x0,ix]=sort(rand(1,rows(x))); x(ix); toc ans = 13.884 Very comparable. I still reckon it's pretty fast! (especially since there's a lot else happening on the machine at the moment). Best wishes, Ted. (Ted dot Harding at nessie dot mcc dot ac dot uk)