From octave-maintainers-request at bevo dot che dot wisc dot edu Wed Jan 28 14:15:46 2004
Subject: Re: A faster FFT of real matrices ??
From: "Dmitri A. Sergatskov"
To: David Bateman
Cc: octave-maintainers at bevo dot che dot wisc dot edu
Date: Wed, 28 Jan 2004 13:14:06 -0700

It appears to be an fftw problem. First of all, here are my results:

octave:5> testfft2
Loading Data done
Testing fft( 512, 512) 2.46e-02 sec (4.51e-01) rerr 2.19e-13
Testing fft2( 512, 512) 4.52e-02 sec (2.64e-01) rerr 1.83e-13
Testing fft( 513, 513) 5.20e-02 sec (5.78e-01) rerr 4.07e-13
Testing fft2( 513, 513) 9.10e-02 sec (7.07e-01) rerr 9.38e-14
Testing fft( 514, 512) 2.20e-01 sec (2.57e+00) rerr 1.54e-13
Testing fft2( 514, 512) 2.41e-01 sec (2.68e+00) rerr 1.35e-12
Testing fft( 512, 514) 2.47e-02 sec (4.49e-01) rerr 1.13e-13
Testing fft2( 512, 514) 6.34e-02 sec (3.65e-01) rerr 3.39e-13
Testing fft(65536, 1) 2.64e-02 sec (7.76e-01) rerr 1.37e-13
Testing fft2(65536, 1) 2.67e-02 sec (7.42e-01) rerr 1.37e-13

=====

If you build fftw, it will make a few test programs. Running them, I get
the following:

[dima at tumbleweed tests]$ ./fftw_test -s 514x514
Please wait (and remember, this is faster than Java).

SPEED TEST: 514x514, FFTW_FORWARD, in place, generic
time for one fft: 97.128188 ms (367.636859 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 244.959784

SPEED TEST: 514x514, FFTW_BACKWARD, in place, generic
time for one fft: 97.292187 ms (368.257610 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 244.546869

SPEED TEST: 514x514, FFTW_FORWARD, in place, specific
time for one fft: 95.786844 ms (362.559780 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 248.390060

SPEED TEST: 514x514, FFTW_BACKWARD, in place, specific
time for one fft: 96.811188 ms (366.436992 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 245.761884

[dima at tumbleweed tests]$ ./rfftw_test -s 514x514
Please wait (and remember, this is faster than Java).

SPEED TEST: 514x514, FFTW_FORWARD, in place, generic
time for one fft: 228.993500 ms (863.396601 ns/point)
"mflops" = 5/2 (N log2 N) / (t in microseconds) = 52.168528

SPEED TEST: 514x514, FFTW_BACKWARD, in place, generic
time for one fft: 226.757937 ms (854.967641 ns/point)
"mflops" = 5/2 (N log2 N) / (t in microseconds) = 52.682847

SPEED TEST: 514x514, FFTW_FORWARD, in place, specific
time for one fft: 229.113437 ms (863.848813 ns/point)
"mflops" = 5/2 (N log2 N) / (t in microseconds) = 52.141218

SPEED TEST: 514x514, FFTW_BACKWARD, in place, specific
time for one fft: 226.630375 ms (854.486679 ns/point)
"mflops" = 5/2 (N log2 N) / (t in microseconds) = 52.712501

======= Now 512x512 for comparison =======

[dima at tumbleweed tests]$ ./rfftw_test -s 512x512
Please wait (while Windows NT reboots).

SPEED TEST: 512x512, FFTW_FORWARD, in place, generic
time for one fft: 16.340711 ms (62.092317 ns/point)
"mflops" = 5/2 (N log2 N) / (t in microseconds) = 724.953801

SPEED TEST: 512x512, FFTW_BACKWARD, in place, generic
time for one fft: 16.710453 ms (63.497284 ns/point)
"mflops" = 5/2 (N log2 N) / (t in microseconds) = 708.913182

SPEED TEST: 512x512, FFTW_FORWARD, in place, specific
time for one fft: 16.597938 ms (63.069741 ns/point)
"mflops" = 5/2 (N log2 N) / (t in microseconds) = 713.718828

SPEED TEST: 512x512, FFTW_BACKWARD, in place, specific
time for one fft: 16.689492 ms (63.417635 ns/point)
"mflops" = 5/2 (N log2 N) / (t in microseconds) = 709.803532

[dima at tumbleweed tests]$ ./fftw_test -s 512x512
Please wait (exorcising evil spirits).

SPEED TEST: 512x512, FFTW_FORWARD, in place, generic
time for one fft: 146.983063 ms (560.695887 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 160.514821

SPEED TEST: 512x512, FFTW_BACKWARD, in place, generic
time for one fft: 147.316875 ms (561.969280 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 160.151103

SPEED TEST: 512x512, FFTW_FORWARD, in place, specific
time for one fft: 48.528656 ms (185.122132 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 486.165532

SPEED TEST: 512x512, FFTW_BACKWARD, in place, specific
time for one fft: 47.631297 ms (181.698978 ns/point)
"mflops" = 5 (N log2 N) / (t in microseconds) = 495.324745
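
For anyone who wants to cross-check the figures, the "mflops" and ns/point
values printed by fftw_test can be recomputed from the reported times. The
following is only a rough sketch in Python, not part of fftw: the function
names are made up for illustration, and it assumes N is the total number of
points (rows times columns). It covers only the complex-transform lines with
the prefactor 5; the rfftw_test lines use a 5/2 prefactor and are not checked
here.

```python
import math

def ns_per_point(n_points, t_ms):
    """Time per point in nanoseconds, given total time in milliseconds."""
    return t_ms * 1e6 / n_points

def nominal_mflops(n_points, t_ms, factor=5.0):
    """Nominal "mflops" = factor * N * log2(N) / (t in microseconds).

    factor=5.0 is the prefactor printed by fftw_test for complex transforms.
    Assumes n_points is the total number of points in the transform.
    """
    t_us = t_ms * 1e3
    return factor * n_points * math.log2(n_points) / t_us

# Timings copied from the fftw_test output above (FFTW_FORWARD, generic).
n514 = 514 * 514
n512 = 512 * 512
print("514x514: %.3f mflops, %.3f ns/point"
      % (nominal_mflops(n514, 97.128188), ns_per_point(n514, 97.128188)))
print("512x512: %.3f mflops, %.3f ns/point"
      % (nominal_mflops(n512, 146.983063), ns_per_point(n512, 146.983063)))

# Slowdown of the real transform going from 512x512 to 514x514,
# using the rfftw_test FFTW_FORWARD generic times from the log.
print("rfftw 514x514 / 512x512 time ratio: %.1f" % (228.993500 / 16.340711))
```

The last line simply divides the two rfftw_test times, which puts a number on
the slowdown seen in the real-data case.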