From help-octave-request at bevo dot che dot wisc dot edu Wed Jan 27 15:05:39 1999
Subject: Principal Components Analysis
From: Gordon Haverland
To: help-octave at bevo dot che dot wisc dot edu
Date: Wed, 27 Jan 1999 14:04:45 -0700

Hi!

Maybe someone here can help me. One of my users wants to draw ellipses around the centroids of her clusters of data points from Principal Components Analysis. I can force everything to work as I expect, but I don't understand why some of what I am doing works. I have been reading Numerical Recipes and Matrix Computations (3rd edition).

So, PCA does an SVD of the data to find the eigenvalues and eigenvectors of the data set, and we ignore eigenvalues (and their associated vectors) with values less than 1. Each combination of eigenvalue and eigenvector defines an axis of a hyper-ellipsoid. The eigenvalues are equal to the square root of the variances in the rotated coordinate system.

The above is more or less the definition of PCA.

When I go to generate the ellipse(s), it turns out that I have to use the square root of the eigenvalues in order to get ellipses of the correct order of magnitude. This I don't understand.

Next, the ellipse for the vector x with 2-norm of 1 appears to contain far more than 68% of the data points. This may be because the few points lying outside the ellipse are quite far outside, but it is still puzzling.

Last, if I want to plot ellipses of 75%, 90%, 95%, ..., by what factors do I multiply the eigenvalues (or the square roots of the eigenvalues), or the vector x?

The end result of this should be an Octave script or function which will take the data and do a 2D plot of the 2 most significant components, along with the ellipse that goes along with the data points. I'll gladly donate said script to this archive.

Thanks for any light you might shed on this.

Gordon Haverland
haverlan at agric dot gov dot ab dot ca
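
On the last question (factors for 75%, 90%, 95% ellipses), here is a minimal sketch of one common approach. It rests on an assumption not stated in the message: that the two plotted component scores are roughly bivariate normal. Under that assumption the squared, variance-scaled distance from the centroid follows a chi-square distribution with 2 degrees of freedom, so an ellipse whose semi-axes are k times the standard deviation along each component contains a fraction p = 1 - exp(-k^2/2) of the points. The sketch is in Python rather than Octave, uses only the standard library, and the names ellipse_scale, coverage and ellipse_points are hypothetical helpers invented for illustration.

```python
import math


def ellipse_scale(p):
    """Factor k such that semi-axes k*sigma_i enclose a fraction p.

    Assumes a bivariate normal distribution (chi-square, 2 d.o.f.).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return math.sqrt(-2.0 * math.log(1.0 - p))


def coverage(k):
    """Fraction enclosed by the ellipse with semi-axes k*sigma_i."""
    return 1.0 - math.exp(-0.5 * k * k)


def ellipse_points(center, sigmas, axes, p, n=200):
    """Return n+1 (x, y) points tracing the ellipse that encloses fraction p.

    center: (cx, cy) centroid of the cluster
    sigmas: (s1, s2) standard deviations along the two components
    axes:   ((ax, ay), (bx, by)) unit vectors of the two components
    """
    k = ellipse_scale(p)
    (cx, cy), (s1, s2) = center, sigmas
    (ax, ay), (bx, by) = axes
    pts = []
    for i in range(n + 1):
        t = 2.0 * math.pi * i / n
        u = k * s1 * math.cos(t)  # coordinate along first component
        v = k * s2 * math.sin(t)  # coordinate along second component
        pts.append((cx + u * ax + v * bx, cy + u * ay + v * by))
    return pts


if __name__ == "__main__":
    for p in (0.75, 0.90, 0.95):
        print(p, round(ellipse_scale(p), 3))
    # prints:
    # 0.75 1.665
    # 0.9 2.146
    # 0.95 2.448
```

Under the same assumption, coverage(1.0) is about 0.39, so in two dimensions the k = 1 ellipse would be expected to hold roughly 39% of the points; the 68% figure applies to one standard deviation in a single dimension. The factor k multiplies standard deviations, that is, the square roots of the variances along each component.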