From bug-octave-request at bevo dot che dot wisc dot edu Tue Feb 10 13:37:10 2004
Subject: Re: anova.m
From: Andy Adler
To: toni saarela
cc: bug-octave at bevo dot che dot wisc dot edu
Date: Tue, 10 Feb 2004 14:36:45 -0500 (EST)

The anovan.m code in octave-forge has the correct result
in this case.

> y = [1 3 4 2 1 5 3 5 6 7 4 5 7 10 11 3]';
> g = [1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3]';
> anovan(y, g)
1-way ANOVA Table (Factors A,):

Source of Variation        Sum Sqr   df      MeanSS    Fval   p-value
*********************************************************************
Error                        62.80   13        4.83
Factor A                     61.64    2       30.82   6.380  0.011737

> anova(y, g)
One-way ANOVA Table:

Source of Variation   Sum of Squares    df  Empirical Var
*********************************************************
Between Groups               71.5600     2        35.7800
Within Groups                62.8000    13         4.8308
---------------------------------------------------------
Total                       134.3600    15

Test Statistic f              7.4067
p-value                       0.0071

(Aside: I'm still looking for testers of my anovan code.
 It's quite complex, and I suspect there are still bugs.)

Andy

On Tue, 10 Feb 2004, toni saarela wrote:

> Version: Octave 2.1.50 (i686-pc-linux-gnu)
>
> Description:
>
> I think there's a small bug in anova.m (which performs one-way analysis
> of variance). It only occurs when using anova with two input arguments,
> as in
>
> octave:1> anova (y,g)
>
> where y is a vector containing the data and g is a vector defining the
> groups, and only with unequal group sizes.
>
> The total mean is calculated from the group means (see below). This
> works fine if the group sizes are equal. However, if they are not, it
> gives too much weight to smaller groups in calculation of total mean,
> sometimes leading to too high estimates of between-groups variance (and
> of total variance), and thus too high F- and too small p-values.
>
> Example:
>
> octave:1>y = [1 3 4 2 1 5 3 5 6 7 4 5 7 10 11 3]';
> octave:2>g = [1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3]';
> octave:3>anova (y, g)
>
> gives F=7.4067, p=0.0071 (ssq between groups = 71.5600)
>
> should be (please correct me if I'm wrong): F=6.3797, p=0.0117 (ssq
> between groups = 61.6375)
>
> Fix:
> ---
>
> Simply replacing the vector group_mean with y (input vector containing
> all the data) in calculation of total_mean on line 83 should fix it:
>
> line 83:
> total_mean = mean (group_mean);
>
> to:
> total_mean = mean (y);
>
> Now the SSQ's produce the right result: (lines 84-86)
> SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
> SST = sumsq (reshape (y, n, 1) - total_mean);
> SSW = SST - SSB;
>
> (Or if group_mean is to be used, it should be weighted with relative
> group sizes)
>
> Best regards,
> Toni Saarela
>
>
> -------------------------------------------------------------
> Octave is freely available under the terms of the GNU GPL.
>
> Octave's home on the web:  http://www.octave.org
> How to fund new projects:  http://www.octave.org/funding.html
> Subscription information:  http://www.octave.org/archive.html
> -------------------------------------------------------------
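
For anyone who wants to check the numbers in this thread by hand, the
following is a minimal sketch and not the actual anova.m source. It
reuses the three SSQ expressions quoted in the report and evaluates them
once with the unweighted total mean and once with mean (y). The use of
accumarray, the assumption that the group labels are the integers 1..k,
and the betainc form of the F-distribution tail are my own choices for
illustration.

```octave
% Data from the report.
y = [1 3 4 2 1 5 3 5 6 7 4 5 7 10 11 3]';
g = [1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3]';

n = numel (y);
k = max (g);                        % assumption: labels are 1..k
group_count = accumarray (g, 1);    % observations per group
group_mean  = accumarray (g, y) ./ group_count;

% First pass: unweighted mean of the group means (reported behaviour).
% Second pass: mean of all observations (proposed fix).
for total_mean = [mean(group_mean), mean(y)]
  SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
  SST = sumsq (y - total_mean);
  SSW = SST - SSB;
  f = (SSB / (k - 1)) / (SSW / (n - k));
  % Upper tail of F(k-1, n-k) via the regularized incomplete beta
  % function, used here so that no statistics package is needed.
  p = betainc ((n - k) / ((n - k) + (k - 1) * f), (n - k) / 2, (k - 1) / 2);
  printf ("total_mean = %.4f  SSB = %.4f  SSW = %.4f  F = %.4f  p = %.4f\n",
          total_mean, SSB, SSW, f, p);
end
```

The first pass should reproduce the anova figures quoted above
(SSB = 71.56, F = 7.4067) and the second the anovan figures
(SSB = 61.6375, F = 6.3797). SSW comes out as 62.8 in both passes,
because the error in SST and the error in SSB cancel in SST - SSB.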