From help-octave-request at bevo dot che dot wisc dot edu Wed Jan 21 16:38:22 2004
Subject: Re: Octave and threaded ATLAS and FFTW
From: "Dmitri A. Sergatskov"
To: "John W. Eaton"
Cc: help-octave
Date: Wed, 21 Jan 2004 15:37:02 -0700

Here is some experience with Octave 2.1.50 compiled with thread-enabled
ATLAS on a 2xAthlonMP 2000 MHz machine running Linux (Fedora Core 1).

To make a long story short -- it does seem to help with some matrix
manipulations (most notably multiplication).

The gory details follow:

I used ATLAS 2.6.0 and compiled it (without using the default parameters)
in runlevel 1 (it took some 5 hours). The binary is available at
http://coffee.phys.unm.edu/dima/octave/Linux_ATHLONSSE1_2_das.tgz

I could not get configure to pick up libptcblas instead of libcblas, so
after running

    ./configure --enable-shared --enable-dl --disable-static

I manually modified Makeconf:

    orig:       BLAS_LIBS = -lcblas -lf77blas -latlas
    changed to: BLAS_LIBS = -lptcblas -lptf77blas -latlas

    orig:       LIBS = -lreadline -lncurses -ldl -lm
    changed to: LIBS = -lreadline -lncurses -ldl -lpthread -lm

It compiled fine. The only problem I have found so far is that 'cputime'
starts returning ridiculously small numbers (I reported it earlier in this
thread). I verified with a stopwatch that tic/toc returns correct numbers,
so I used tic/toc for benchmarking.

I started with the Octave2.m benchmark from www.sciviews.org and slightly
modified it. I removed rand() from the first benchmark and increased the
matrix sizes, both to make the matrices larger than my cache (256k) and to
increase the execution time to a few seconds.
(http://coffee.phys.unm.edu/dima/octave/Octave2l.m)

The relevant numbers are below (full benchmark results are in the file
http://coffee.phys.unm.edu/dima/octave/bench2cpu.txt).

The first column (Pthread on 2CPU) is the result of octave linked with
threaded ATLAS running on the SMP kernel.
The second column (Pthread on 1CPU) is the same octave running on the same
computer booted into the UniProcessor kernel.
The third column ("Normal" on 2CPU) is octave linked to the normal ATLAS
running on the same computer with the SMP kernel.

Benchmark (times in sec)                                Pthread on 2CPU   Pthread on 1CPU   "Normal" on 2CPU
transp., deformation of a 3000x3000 matrix              2.152             2.188             2.306
3000x3000 normal distributed random matrix ^1000        1.401             1.395             1.409
Sorting of 2,000,000 random values                      7.904             8.049             7.831
3000x3000 cross-product matrix (b = a' * a)             10.31             19.26             18.19
Linear regression over a 3000x3000 matrix (c = a \ b')  9.142             13.15             11.94
FFT over 800,000 random values                          0.3907            0.4066            0.3924
Eigenvalues of a 500x500 random matrix                  7.386             7.386             7.215
Determinant of a 2000x2000 random matrix                3.225             4.279             3.93
Cholesky decomposition of a 3000x3000 matrix            3.353             5.025             4.716
Inverse of a 2000x2000 random matrix                    7.336             10.37             9.579

Hope this is of some interest.

Sincerely,

Dmitri.

p.s.: I have looked into using threaded FFTW and will post some thoughts
later.
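
As a rough aid to reading the benchmark numbers above, the speedup of the threaded build over the "Normal" build on the same SMP kernel can be worked out with a few lines of Python. This is only a sketch: the timings are copied from the "Pthread on 2CPU" and "Normal" on 2CPU columns, while the shortened row names, the `speedup` helper and the selection of rows are illustrative assumptions and not part of the original benchmark script.

```python
# Timings in seconds, copied from the benchmark table:
# (threaded ATLAS on 2 CPUs, normal ATLAS on 2 CPUs).
# The short row names are illustrative labels, not the benchmark's own.
timings = {
    "cross-product": (10.31, 18.19),
    "linear regression": (9.142, 11.94),
    "determinant": (3.225, 3.93),
    "Cholesky": (3.353, 4.716),
    "inverse": (7.336, 9.579),
    "FFT": (0.3907, 0.3924),
}


def speedup(threaded, normal):
    """Ratio of normal to threaded time; above 1 means threaded is faster."""
    return normal / threaded


speedups = {name: speedup(t, n) for name, (t, n) in timings.items()}

# Print the rows from largest to smallest speedup.
for name, s in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {s:5.2f}x")
```

With these figures the cross-product comes out at roughly 1.76x, while the FFT is essentially unchanged, which is consistent with multiplication benefiting most from the threaded ATLAS.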