From maintainers-request at octave dot org Thu Nov 11 23:25:17 2004
Subject: Re: gcc 3.4 and Octave/lapack problems
From: Jskud at Jskud dot com
To: jwe at bevo dot che dot wisc dot edu
CC: dmitri at unm dot edu, maintainers@octave.org
Date: Thu, 11 Nov 2004 21:24:51 -0800

Reading the gcc documentation, one finds that -ffloat-store is an
expensive hack to attempt to avoid the excess precision in the
floating point registers on Intel FPUs.

Based on recent experience with DONLP2, a noteworthy nonlinear solver,
using -ffloat-store everywhere is an unnecessarily costly workaround.

The problem we saw in DONLP2, and I suspect in the numerical
subroutines used by Octave, is that they automatically calculate the
machine floating point characteristics. When floating point numbers
are left in Intel FPU registers during those calculations, they have
extra precision (80 bits, I think), instead of the 64 bits that
doubles have in memory. Therefore, the calculated values of vital
constants like "machine epsilon" are wrong.

The problem is that -ffloat-store can really slow things down. Dmitri
observed a 20% slowdown in Octave using -ffloat-store everywhere. I
saw a 75% slowdown in DONLP2 compiling with -ffloat-store.

Rather than use -ffloat-store everywhere, it should be enough to use
it to compile just the numerical libraries (but that might still be a
big performance impact).

An alternative is to compile just the code which automatically
determines the machine constants with -ffloat-store, or even rewrite
those routines. For example, the routine "dmach" (e.g.,
http://www.netlib.org/blas/dmach.f) looks like it uses the same
approach that failed for donlp2_f77, and therefore would need to be
compiled with -ffloat-store, or be rewritten.

We could avoid -ffloat-store and instead set the floating point
control word to avoid extended precision, as suggested by the g77
info; but that seems suboptimal and nonportable.

To fix DONLP2, we explicitly coded to avoid extended precision when
computing epsmac and tolmac, using a wrapper function
("double_identity") around the intermediate results which, in effect,
forced the compiler to discard the extra (extended) precision.

Here's a little snippet of that reworked code:

      external double_identity
      double precision double_identity

      EPSMAC = TWO**(-20)
  100 CONTINUE
      EPSMAC=EPSMAC/TWO
      TERM=double_identity(ONE+EPSMAC)
      IF ( TERM .NE. ONE ) GOTO 100
      EPSMAC=EPSMAC+EPSMAC
      TOLMAC=EPSMAC
  200 CONTINUE
      TOL1=TOLMAC
      TOLMAC=double_identity(TOLMAC/TWOP4)
      IF ( TOLMAC .NE. ZERO ) GOTO 200
      TOLMAC=TOL1

Here are the double_identity routines.

C Purpose: discard extra (ie, extended) precision to enable (donlp2)
C computing epsmac properly without recourse to the -ffloat-store
C hack which hurts performance.
C We do this by forcing the value into array storage and passing the
C array to a helper routine, since we don't want the optimizing
C compiler to always be able to pass the value in a register with
C extended precision.
C To be very cautious (paranoid?), we could put double_identity
C into a separate compilation unit to prevent (stronger) compile
C time interprocedural optimization from optimizing out
C double_identity_helper, and then double_identity.

      double precision function double_identity(asis_value)
      double precision asis_value
      double precision hide_value(1)
      double precision double_identity_helper
      external double_identity_helper
      hide_value(1) = asis_value
      double_identity = double_identity_helper(hide_value)
      return
      end

      double precision function double_identity_helper(hide_value)
      double precision hide_value(1)
      double_identity_helper = hide_value(1)
      return
      end
C []

Hope this helps.

/Jskud

>------ Begin Included Message ------
> From: "John W. Eaton"
> Date: Thu, 11 Nov 2004 22:47:21 -0500
> To: "Dmitri A. Sergatskov"
> Cc: maintainers at octave dot org
> Subject: Re: gcc 3.4 and Octave/lapack problems
> X-CAE-MailScanner-Information: Please contact security at engr dot wisc dot edu if this message contains a virus or has been corrupted in delivery.
> X-CAE-MailScanner: Found to be clean (hedwig)
>
> On 11-Nov-2004, Dmitri A. Sergatskov wrote:
>
> | John W. Eaton wrote:
> | > On 11-Nov-2004, Dmitri A. Sergatskov wrote:
> | >
> | > | Also, if -ffloat-store is indeed a must for lapack/octave, should we
> | > | make it a default?
> | >
> | > It seems like this might be a reasonable change to make. We'll need a
> | > configure check since -ffloat-store probably only makes sense for
> | > gcc/g++/g77.
> |
> | I guess one of the questions is whether we shall pass it to g77 only
> | (at the moment that looks sufficient), or to all three?
> | I noticed that loop performance drops some 20% if I have
> | it in CXXFLAGS. I do not see any difference if CFLAGS have
> | it or not.
> |
> | Any insights?
>
> If we are going to use -ffloat-store for Fortran code because it
> produces better results (or at least results that are more likely to
> agree with what we would expect from 64-bit IEEE floating point
> arithmetic) then it seems to me that we should use it for the C and
> C++ code as well. Or maybe you would prefer to have bad results
> faster? :-)
>
> I've made changes to configure so that we check to see if the
> compilers accept -ffloat-store, but only on x86 platforms when using
> the GNU compilers (individual checks are made for each).
>
> jwe
>
>------ End Included Message ------