From octave-maintainers-request at bevo dot che dot wisc dot edu Fri Jan 30 15:40:38 2004
Subject: Re: [Fwd: Re: rfftw slower than fftw for "bad" size arrays]
From: "Dmitri A. Sergatskov"
To: David Bateman
Cc: octave-maintainers mailing list
Date: Fri, 30 Jan 2004 14:40:09 -0700

David Bateman wrote:
> Ok, I've ported Octave to also use FFTW 3.0.1, but I'm getting some odd results.
> I attach a patch, and some rewritten test programs and some comparisons I've
> done. Basically, in many cases I'm faster than both the old octave and matlab,
> but there are some cases where it is slower. I'm really not sure what is the
> issue, so any clues would be of assistance.
>

I have not tried the patch yet, but I ran the benchmarks from the fftw2 and
fftw3 packages (see below). The problem seems to be an fftw3 "feature": when
planned with the ESTIMATE flag, it is sometimes slower than fftw2.

It is not quite clear to me from the docs whether "wisdom" created with more
"patient" flags helps when you plan with the ESTIMATE flag, but it is
definitely possible:

http://www.fftw.org/fftw3_doc/Words-of-Wisdom-Saving-Plans.html#Words%20of%20Wisdom-Saving%20Plans

<<<<
Wisdom is automatically used for any size to which it is applicable, as long
as the planner flags are not more "patient" than those with which the wisdom
was created. For example, wisdom created with FFTW_MEASURE can be used if you
later plan with FFTW_ESTIMATE or FFTW_MEASURE, but not with FFTW_PATIENT.
>>>>

One possibility is to have, say, a "make wisdom_measure" rule in the Octave
makefile that creates a wisdom file (it will take a while, so I would guess it
should not be part of the default make).

> Cheers
> David
>

Regards,
Dmitri.
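
The wisdom rule quoted from the FFTW docs can be sketched as a simple ordering check. This is only an illustration in Python, not FFTW code: the function name `wisdom_applies` and the integer ranks are made up for this sketch, and the ordering ESTIMATE < MEASURE < PATIENT < EXHAUSTIVE is an assumption that extends the three flags named in the quote.

```python
# Hypothetical ranks: a larger number means a more "patient" planner flag.
# The names mirror FFTW's planner flags; the numbers are illustrative only.
PATIENCE = {
    "FFTW_ESTIMATE": 0,
    "FFTW_MEASURE": 1,
    "FFTW_PATIENT": 2,
    "FFTW_EXHAUSTIVE": 3,
}


def wisdom_applies(created_with: str, planning_with: str) -> bool:
    """Return True if wisdom created with one flag is usable when planning
    with another, i.e. the planning flag is not more patient."""
    return PATIENCE[planning_with] <= PATIENCE[created_with]


if __name__ == "__main__":
    # Wisdom created with FFTW_MEASURE, checked against each planning flag.
    for flag in PATIENCE:
        print(flag, wisdom_applies("FFTW_MEASURE", flag))
```

Under this rule, a wisdom file built once with a patient flag would still be picked up when planning with ESTIMATE, which is the case that matters for the makefile idea.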

p.s.: Here are some benchmark numbers (forward, in-place transform):

fftw2 (complex 512x512):

SPEED TEST: 512x512, FFTW_FORWARD, in place, generic
(That is what Octave would use)
time for one fft: 147.481125 ms (562.595844 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 159.972742

SPEED TEST: 512x512, FFTW_FORWARD, in place, specific
time for one fft: 48.231141 ms (183.987200 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 489.164463

fftw3 (complex 512x512):

[dima at tumbleweed tests]$ ./bench -oestimate -s 512x512
Problem: 512x512, setup: 342.00 us, time: 229.17 ms, ``mflops'': 102.95 (XXXXX)
[dima at tumbleweed tests]$ ./bench -s 512x512
(!! This is using default MEASURE flag !!)
Problem: 512x512, setup: 827.87 ms, time: 96.72 ms, ``mflops'': 243.94
[dima at tumbleweed tests]$ ./bench -oexhaustive -s 512x512
Problem: 512x512, setup: 340.86 s, time: 47.56 ms, ``mflops'': 496.06
[dima at tumbleweed tests]$ ./bench -opatient -s 512x512
Problem: 512x512, setup: 33.65 s, time: 50.51 ms, ``mflops'': 467.06

============

fftw2 (real 521x521 <-- prime size):

SPEED TEST: 521x521, FFTW_FORWARD, in place, generic
time for one fft: 444.606000 ms (1.631683 us/point)
"mflops" = 5/2 (N log2 N) / (t in microseconds) = 27.664384
(specific gives the same results)

fftw3:

[dima at tumbleweed tests]$ ./bench -s r521x521
Problem: r521x521, setup: 1.08 s, time: 75.10 ms, ``mflops'': 163.1
[dima at tumbleweed tests]$ ./bench -oestimate -s r521x521
Problem: r521x521, setup: 1.09 ms, time: 81.23 ms, ``mflops'': 150.8
[dima at tumbleweed tests]$ ./bench -oexhaustive -s r521x521
Problem: r521x521, setup: 3.08 s, time: 71.80 ms, ``mflops'': 170.59
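
For reference, the "mflops" figure printed by the benchmarks can be reproduced from the timings. The sketch below is a small Python helper, not part of either FFTW package; the function name `mflops` and its parameters are made up here, and it assumes N is the total number of points in the transform (512*512 for the complex case), which is what makes the complex numbers above come out.

```python
import math


def mflops(n_points: int, time_ms: float, real: bool = False) -> float:
    """Nominal FFT "mflops": 5 N log2(N) / t, with t in microseconds.

    For real-input transforms the listing above uses 5/2 instead of 5.
    n_points is assumed to be the total number of points (rows * cols).
    """
    factor = 2.5 if real else 5.0
    time_us = time_ms * 1000.0  # the formula wants microseconds
    return factor * n_points * math.log2(n_points) / time_us


if __name__ == "__main__":
    n = 512 * 512
    print(round(mflops(n, 147.481125), 2))  # fftw2 generic  -> 159.97
    print(round(mflops(n, 48.231141), 2))   # fftw2 specific -> 489.16
    print(round(mflops(n, 229.17), 2))      # fftw3 estimate -> 102.95
```

Since the operation count is fixed by the problem size, the figure is just a rescaled inverse of the run time, so comparing "mflops" across fftw2 and fftw3 is the same as comparing time per transform.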