From owner-octave-sources at bevo dot che dot wisc dot edu Fri Aug 16 16:56:45 1996 Subject: kernel.m From: Heber Farnsworth To: octave-sources at bevo dot che dot wisc dot edu Date: Fri, 16 Aug 1996 14:56:12 -0700 (PDT) This function either plots or gives the data points necessary to plot a nonparametric estimate of the probability density function of a set of data. A useful way to think of this is as a "smoothed" histogram. There is a well developed theory of kernel esimtators for density functions and the interested reader is referred to Silverman (1986) "Density Estimation for Statistics and Data Analysis" published by Chapman and Hall. Kernel estimators are consistent and do not rely on any parametric model for a density (ie normal, gamma, student's t, etc). Kernel estimation requires specification of a kernel and a bandwidth. Research has shown that the choice of kernel is not important. Here a Gaussian kernel is used. The bandwidth function I have chosen is widely used for estimating univariate density functions. The user may specify a smoothness parameter. To get optimal results this smoothness parameter should be chosen on the basis of cross-validation performed on your particular data set. No such functionallity is included here. Bottom line, if you want to get a feel for the distibution of your data use this function (unless you have very few data points in which case hist is probably more informative). ______________________________________________________________________________ function [xx, yy] = kernel(x, c, n) # usage: [xx, yy] = kernel (x, c, n) # # Kernel estimation of the probability density function based on a vector of # observed data. This version uses a Gaussian kernel and a bandwidth equal # to # # h(m) = c*std(x)*m^(.2) # # where std(x) is the standard deviation of the data and m is the number of # observations in x. The result can be thought of as a "smoothed" # histogram. # # Larger values of c lead to a smoother density function. (A more # sophisticated --and numerically intense-- program would choose c # optimally by cross-validation. # # Given a third argument, use that as the number of evenly spaced points # where the density will be evaluated. Default is 40, there is virtually # no gain to selecting numbers greater than 80. # # With two output arguments, produce the values xx and yy such that # plot(xx,yy) will plot the density function. # author: Heber Farnsworth # Dept of Finance, # Univeristy of Washington, # heberf at u dot washington dot edu if (nargin < 1 || nargin > 3) usage ("[xx, yy] = kernel (x, c, n)"); endif if nargin < 3 n = 40; elseif n <= 2 error ("kernel: number of points must be greater than 2"); endif if nargin < 2 c = 1; elseif c <=0 error ("kernel: second argument must be positive"); endif if (is_vector (x)) max_val = max (x); min_val = min (x); m = max(size(x)); if m > size(x,1) x = x'; endif else error ("kernel: first argument must be a vector"); endif s = std(x); h = c*s/(m^(.2)); z = linspace((min_val - .2*s),(max_val + .2*s),n); if size(z,1) > 1 z = z'; endif if m < 4000 u = (z(ones(m,1),:) - x(:,ones(1,n)))./h; K = (1/sqrt(2*pi))*exp(-.5*u.^2); mu = mean(K); f = mu./h; else for i = 1:n u = (z(i) - x)./h; K = (1/sqrt(2*pi))*exp(-.5*u.^2); mu = mean(K); f(i) = mu/h; endfor endif if (nargout == 2) yy = f; xx = z; else plot(z,f); endif endfunction _____________________________________________________________________________ Heber Farnsworth | Department of Finance Univerity of Washington | Box 353200 tele: (206) 528-0793 home | Seattle, WA 98195-3200 tele: (206) 543-4773 finance web: http://weber.u.washington.edu/~heberf fax: (206) 685-9392 email: heberf at u dot washington dot edu