From bug-request at octave dot org Fri Dec 16 14:59:27 2005 Subject: Re: kruskal_wallis_test() does not handle ties correctly From: Timo Juhani Lindfors To: "John W. Eaton" Cc: bug at octave dot org Date: Fri, 16 Dec 2005 14:18:50 -0600 --cmJC7u66zC7hs+87 Content-Type: multipart/mixed; boundary="HlL+5n6rz5pIUxbD" Content-Disposition: inline --HlL+5n6rz5pIUxbD Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi, On Thu, Dec 01, 2005 at 03:41:12PM -0500, John W. Eaton wrote: > On 3-Nov-2005, Timo Juhani Lindfors wrote: >=20 > | Sorry for the delay. I checked out the CVS version and tested my > | modifications with it. I think everything is ok but where should this > | runlength() be defined? In octave/scripts/general/runlength.m ? >=20 > That seems reasonable to me. Ok, the attached patch tries to fix statistics/tests/kruskal_wallis_test.m to handle ties correctly and adds runlength() that Paul Kienzle suggested to scripts/general/runlength.m=20 I set "Copyright Paul Kienzle" because I just copied Paul's suggestion from the mailing list and added example + assert. I'm not very familiar with octave so please let me know if I have forgotten something. best regards, Timo Lindfors --HlL+5n6rz5pIUxbD Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="kruskal_wallis_test_tiesfix.patch" Content-Transfer-Encoding: quoted-printable --- octave.orig/scripts/general/runlength.m 1970-01-01 02:00:00.000000000 += 0200 +++ octave/scripts/general/runlength.m 2005-12-16 22:04:57.000000000 +0200 at @ -0,0 +1,36 @@ +## Copyright (C) 2005 Paul Kienzle +## +## This file is part of Octave. +## +## Octave is free software; you can redistribute it and/or modify it +## under the terms of the GNU General Public License as published by +## the Free Software Foundation; either version 2, or (at your option) +## any later version. +## +## Octave is distributed in the hope that it will be useful, but +## WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +## General Public License for more details. +## +## You should have received a copy of the GNU General Public License +## along with Octave; see the file COPYING. If not, write to the Free +## Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +## 02110-1301, USA. + +## -*- texinfo -*- +## at deftypefn {Function File} {} runlength (@var{x}) +## Find the lengths of all sequences of common values. Return the +## vector of lengths and the value that was repeated. +## +## at example +## runlength([2 2 0 4 4 4 0 1 1 1 1]) +## at end example +## at end deftypefn + +function [count,value] =3D runlength(x) + idx =3D [find(x(1:end-1) !=3D x(2:end)), length(x)]; + value =3D x(idx); + count =3D diff([0 idx]); +endfunction + +%!assert (runlength([2 2 0 4 4 4 0 1 1 1 1]), [2 1 3 1 4]); --- ./octave.orig/scripts/statistics/tests/kruskal_wallis_test.m 2005-08-23= 21:38:28.000000000 +0300 +++ ./octave/scripts/statistics/tests/kruskal_wallis_test.m 2005-12-16 20:4= 7:26.000000000 +0200 at @ -29,6 +29,18 @@ ## approximately chi-square with at var{df} =3D @var{k} - 1 degrees of ## freedom. ## +## If the data contains ties (some value appears more than once) +## at var{k} is divided by +##=20 +## 1 - at var{sumTies} / ( @var{n}^3 - @var{n} ) +## +## where at var{sumTies} is the sum of @var{t}^2 - @var{t} over each group +## of ties where at var{t} is the number of ties in the group and @var{n} +## is the total number of values in the input data. For more info on +## this adjustment see "Use of Ranks in One-Criterion Variance Analysis" +## in Journal of the American Statistical Association, Vol. 47, +## No. 260 (Dec 1952) by William H. Kruskal and W. Allen Wallis. +## ## The p-value (1 minus the CDF of this distribution at at var{k}) is ## returned in at var{pval} dot ## at @ -69,6 +81,11 @@ =20 n =3D length (p); k =3D 12 * k / (n * (n + 1)) - 3 * (n + 1); + + ## Adjust the result to takes ties into account. + sumTies =3D sum(polyval([1 0 -1 0],runlength(sort(p)))); + k =3D k / (1 - sumTies/(n^3 - n)); + df =3D m - 1; pval =3D 1 - chisquare_cdf (k, df); =20 at @ -78,4 +95,5 @@ =20 endfunction =20 - +## Test with ties +%!assert (abs(kruskal_wallis_test([86 86], [74]) - 0.157299207050285) < 0.= 0000000000001) --HlL+5n6rz5pIUxbD-- --cmJC7u66zC7hs+87 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQFDoyDhOpmT8yqc6uIRAo5NAKDrv+8sgrtB7NGgAPHmV96qsOa73wCg4BrV /guT6IIqMfKFjoY7/JsijJk= =ny4g -----END PGP SIGNATURE----- --cmJC7u66zC7hs+87-- ------------------------------------------------------------- Octave is freely available under the terms of the GNU GPL. Octave's home on the web: http://www.octave.org How to fund new projects: http://www.octave.org/funding.html Subscription information: http://www.octave.org/archive.html -------------------------------------------------------------