From help-request at octave dot org Wed Oct 20 21:22:00 2004 Subject: Re: Curve Fitting "splattered" data From: "Henry F. Mollet" To: Rafael Laboissiere , "Robert A. Macy" CC: Date: Wed, 20 Oct 2004 19:17:26 -0700 Correction: GM regression is also know as *Reduced Major Axis Regression* not Minor Axis Regression. The Minor Axis goes with the Major axis of the correlation ellipse. The axis length ratio is given by sqrt (larger eigenvalue/smaller eigenvalue). I don't know why it's called reduced major axis regression. Summary: PCR apparently same as Major Axis "Regression" but Rafael's approach using svd (cov([x,y])) is more elegant. I also have two plot questions (5. and 6.). Henry 1. I have already posted regarding the Geometrical Mean (GM) regression. The GM-regression is also known as the Minor Axis Regression. It minimizes the "diagonal" distances of the data points to the line. It is suggested to be appropriate when the "error" of the x measurements are similar to the error of the y measurements. The y-on-x regression, which minimizes the sum of squares in the y-direction, assumes that there's no error for the data used on the x-axis. 2. PCA is apparently the same as the Major Axis "regression" and probably more appropriate here because your x and y are not functionally related. After all they are some numbers in the complex plane. 3. The Major Axis (of the correlation ellipse) should not be called a regression. I'm not exactly sure what is being minimized in geometrical terms but it minimizes the sum of the squares of the distances from the observed points to the line in a direction at right angles to the line, when one unit of measurement occupies the same absolute distance on the x and y coordinates. A mouthful and I don't know what it really means. 4. Here I'll calculate the slope of the major axis for comparison *** local user variables: prot type rows cols name ==== ==== ==== ==== ==== rwd matrix 37 1 x rwd matrix 37 1 y octave:10> b=cov(x,y)/var(x) b = -15.309 octave:11> d= cov(x,y)/var(y) d = -0.022839 octave:12> 1/d ans = -43.785 octave:13> r = cov(x,y)/sqrt(var(x)*var(y)) r = -0.59131 octave:14> b_GM = b/r b_GM = 25.891 octave:15> M = [var(x),cov(x,y);cov(x,y),var(y)] % same as cov(d) M = 1.8317e-10 -2.8042e-09 -2.8042e-09 1.2278e-07 octave:16> eig(M) ans = 1.1906e-10 1.2285e-07 % best to think of these eigenvalues as variances. % same as svd(cov([x,y])) given by Rafael octave:18> la2=max(eig(M)) la2 = 1.2285e-07 octave:19> b_MA= cov(x,y)/(la2-var(y)) b_MA = -43.743 % It appears that Major Axis slope is close to 1/d = -43.785 (reciprocal of x-on-y slope. Probably has something to do with fact that correlation ellipse is so close to the y-xis (theta=1.31 degrees)? octave:20> tan2theta=2*cov(x,y)/(var(x)-var(y)) tan2theta = 0.045746 octave:21> theta=atan(tan2theta)/2 theta = 0.022857 octave:22> thetadegress=theta*180/pi thetadegress = 1.3096 ********* 5. I had to resort to Excel to be able to plot the data using the *same* axis range for both x and y, so that I could see the shape of the correlation ellipse. Apparently gnuplot considered the range of x to be empty: octave:5> plot (x,y,"x") % no problem here octave:6> max(x) ans = 1.9826e-05 octave:7> min(y) ans = -4.1757e-04 octave:8> max(y) ans = 0.00091074 octave:9> min (y) ans = -4.1757e-04 octave:10> axis ([0.001,0.001,0.001,0.001], "square") octave:11> plot (x,y,"x") gnuplot> pl '/var/tmp/oct-kUoi1a' t "line 1" w points 1 4 ^ line 0: Can't plot with an empty x range! ******* 5. I also had no luck with Rafael's plot: octave:7> [u,v,w] = svd (cov (d)); octave:8> m = mean (d); octave:9> r = min (d (:, 1)); octave:10> s = max (d (:, 1)); octave:11> plot (d (:, 1), d (:, 2), '*') octave:12> hold on octave:13> plot ([r, s], m (2) + [(r-m(1)),(s-m(1))] * u(2) / u(1)); error: single index only valid for row or column vector error: evaluating binary operator `*' near line 13, column 43 error: evaluating binary operator `/' near line 13, column 50 error: evaluating binary operator `+' near line 13, column 21 error: evaluating argument list element number 2 ******* on 10/18/04 2:07 AM, Rafael Laboissiere at rlaboiss at users dot sourceforge dot net wrote: > * Robert A. Macy [2004-10-17 23:36]: > >> Trying to fit a curve to data with poor results. >> >> Data is a set of 37 complex data points that roughly lie >> along a straight line in the complex plane. Sequence of >> data has no significance, only their location on the plane. >> >> Don't care about intercept point, only the slope. Ran >> polyfit.m using real(datapoints) and imag(datapoints) >> thinking that would yield slope of trend, for example, >> >> p=polyfit(real(datapoints),imag(datapoints),1) >> >> calculates a slope, but when I reverse the order >> >> p=polyfit(imag(datapoints),real(datapoints),1) >> >> I don't get a reciprocal slope?! >> >> one way I get 15.093, the other way I get 0.0228387 >> Why aren't they reciprocal? > > Because in the first case you minimize the sum of squared errors in > imag(datapoints) and in the second case in real(datapoints). This is the > expected behavior of polyfit (see "help polyfit"). > >> Is there a better program for finding the straight line? > > You might do a PCA (principal component analysis) on your data, which boils > down to using either eig or svd. Try this: > > d = [real(datapoints), imag(datapoints)]; > [u,v,w] = svd (cov (d)); > m = mean (d); > r = min (d (:, 1)); > s = max (d (:, 1)); > hold off > plot (d (:, 1), d (:, 2), '*') > hold on > plot ([r, s], m (2) + [(r-m(1)),(s-m(1))] * u(2) / u(1)); > > The slope of the curve plotted is close to -0.022861. ------------------------------------------------------------- Octave is freely available under the terms of the GNU GPL. Octave's home on the web: http://www.octave.org How to fund new projects: http://www.octave.org/funding.html Subscription information: http://www.octave.org/archive.html -------------------------------------------------------------