From help-octave-request at bevo dot che dot wisc dot edu Wed Jan 27 15:31:20 1999 Subject: Principal Components Analysis From: "John W. Eaton" To: Gordon Haverland Cc: help-octave at bevo dot che dot wisc dot edu Date: Wed, 27 Jan 1999 15:31:02 -0600 (CST) On 27-Jan-1999, Gordon Haverland wrote: | Maybe someone here can help me. One of my users wants to | draw ellipses around the centroids of her clusters of data | points from Principal Components Analysis. I can force | everything to work as I expect, but I don't understand | some of the why of what I am doing. I have been reading | Numerical Recipes and Matrix Computation (3rd edition). | | So, PCA does an SVD of the data to find eigenvalues | and eigenvectors of the data set and we ignore | eigenvalues (and associated vectors) with values | less than 1. And the combination of eigenvalue and | eigenvector defines a hyper-ellipsoid. The eigenvalues are | equal to the square root of the variances in the rotated | coordinate system. | | The above is more or less definitions of PCA. When I go to | generate the ellipse(s), it turns out that I have to use the | square root of the eigenvalues in order to get ellipses of | the correct order of magnitude. This I don't understand. | Next, the ellipse for the vector x with 2-norm of 1 appears | to contain far more than 68% of the data points. This may be | due to the few points lying outside the ellipse being quite far | outside, but is still puzzling. Last, if I want to plot | ellipses of 75%, 90%, 95%, ..., what factors do I either | multiply the eigenvalues (square roots of the eigenvalues) | by (or the vector x)? | | The end result of this should be a octave script or function | which will take the data and do a 2D plot of the 2 most | significant components, along with the ellipse that goes | along with the data points. I'll gladly donate said script | to this archive. Perhaps the following code will be of some help. It will take a 2x2 matrix and generate a set of points for plotting an ellipse, its major and minor axes, and the bounding box that just encloses the ellipse. The 2x2 matrix is one that defines a quadratic form: [ x1 x2 ] [ a11 a12 ] [ x1 ] [ a21 a22 ] [ x2 ] = a11*x1^2 + (a12+a21)*x1*x2 + a22*x2^2 The level is the value of the this expression along the contour. jwe ## Plotting an ellipse with an aspect ratio not equal to one can be ## mighty confusing. gset size ratio 1; ## [x, y, major, minor, bbox] = ellipse (amat, level, n) ## ## Given a 2x2 matrix, generate ellipse data for plotting. function [x, y, major, minor, bbox] = ellipse (amat, level, n) if (nargin < 3) n = 100; endif if (nargin > 1) [v, l] = eig (amat / level); dl = diag(l); if (any (imag (dl)) || any (dl <= 0)) error ("ellipse: amat must be positive definite"); endif ## Generate contour data. a = 1 / sqrt (l(1,1)); b = 1 / sqrt (l(2,2)); t = linspace (0, 2*pi, n)'; xt = a * cos (t); yt = b * sin (t); ## Rotate the contours. ra = atan2 (v(2,1), v(1,1)); cos_ra = cos (ra); sin_ra = sin (ra); x = xt * cos_ra - yt * sin_ra; y = xt * sin_ra + yt * cos_ra; ## Endpoints of the major and minor axes. major = minor = (v * diag ([a, b]))'; major (2,:) = -major (1,:); minor (1,:) = -minor (2,:); ## Bounding box for the ellipse using magic formula. ainv = inv (amat); xbox = sqrt (level * ainv(1,1)); ybox = sqrt (level * ainv(2,2)); bbox = [xbox ybox; xbox -ybox; -xbox -ybox; -xbox ybox; xbox ybox]; else usage ("ellipse (amat, level, n)"); endif endfunction t = rand (100,2); amat = t' * t; level = 100; npts = 181; [x, y, major, minor, bbox] = ellipse (amat, level, npts); gset nokey plot (minor(:,1), minor(:,2), "- at r", major(:,1), major(:,2),"-@r", bbox(:,1), bbox(:,2), "-g", x, y, "-b");