From help-octave-request at bevo dot che dot wisc dot edu Tue Jun 18 10:25:16 2002 Subject: hist(y,n) [was Re: I NEED HELP ...] From: Paul Kienzle To: help-octave at bevo dot che dot wisc dot edu Date: Tue, 18 Jun 2002 11:24:41 -0400 On Tue, Jun 18, 2002 at 02:24:59AM -0400, Paul Kienzle wrote: > On Fri, Jun 07, 2002 at 02:25:17PM -0500, Marco Boni wrote: > For example, in hist: > > for i = 1:n-1 > cutoff (i) = (x (i) + x (i + 1)) / 2; > endfor > > is simply translated into > > cutoff = ( x(1:n-1) + x(2:n) ) / 2; > > The next loop in hist.m is more tricky: > > freq(1) = sum(y < cutoff (1)); > for i = 2:n-1 > freq (i) = sum (y >= cutoff (i-1) & y < cutoff (i)); > endfor > freq(n) = sum (y >= cutoff (n-1)); > > You can get part way there with sort: > > [s, idx] = sort ( [cutoff(:); y(:)] ); > pt = [0; idx > n; 0]; > > This gives you a sequence of 0's and 1's where the zeros represent > bin boundaries and the ones represent bin contents. Because sort > is 'stable' all y values which are identical to an x value will > appear after the x value, so the semi-open [x-1, x) logic will > be preserved. The leading and trailing zeros capture all the y's > which are not otherwise in a bin. > > Using cumulative sum, you can turn the ones into frequency counts > > chist = cumsum(pt); > > But you only need the counts at the bin boundaries: > > chist = s(find(diff(chist) == 0)); > > Now you have a 'cumulative histogram'. Differentiate it and you > should have the histogram: > > freq = diff(chist); Oops! This will need to be: freq = [chist(1); diff(chist) ] > > I may messed up something along the way, but hopefully you get > the idea. Could you fix it, test it and submit it to bug-octave? > > Since you are dealing with really large numbers of bins (up to 2^20 if > I'm reading your code correctly), fixing hist should address most of > the speed problems. I was bothered by having to compute with all those empty bins. Here is a version for n equal sized bins which is independent of the number of bins: bins = zeros(1,n); q = sort(y(:).'); L = length(q); if (L == 0) return; elseif (q(1) == q(L)) bins(n) = L; return; endif q = (q - q(1))/(q(L)-q(1))/(1+eps); # set y-range to [0,1) q = fix(q*n); # split into n bins same = [ q == [q(2:L),-Inf] ]; # true if neighbours are in the same bin q = q(~same); # q lists the 'active' bins (0-origin) f = cumsum(same); # cumulative histogram f = f(~same); f = [f(1), diff(f)] + 1; # cumulative histogram -> histogram # we need to add 1 since we did not count the point at the boundary between # histograms (it was turned into zero) # distribute f to the active bins, leaving the remaining bins empty bins(q+1) = f; Paul Kienzle pkienzle at users dot sf dot net ------------------------------------------------------------- Octave is freely available under the terms of the GNU GPL. Octave's home on the web: http://www.octave.org How to fund new projects: http://www.octave.org/funding.html Subscription information: http://www.octave.org/archive.html -------------------------------------------------------------