From help-request at octave dot org Tue Dec 21 12:20:50 2004
Subject: Re: Build 2.1.64 on OS 10.3. error
From: Per Persson
To: Samir Sharshar
Cc: help at octave dot org
Date: Tue, 21 Dec 2004 19:22:38 +0100

On Dec 21, 2004, at 18:55, Samir Sharshar wrote:

> Hello,
>
> It's me ....
>
> With ./configure --enable-dl --enable-shared --disabled-static
>
> I've got
>
> ld: misc/machar.o has local relocation entries in non-writable section
> (__TEXT,__text)
> /usr/bin/libtool: internal link edit command failed
> make[3]: *** [libcruft.dylib] Error 1
> make[2]: *** [libraries] Error 2
> make[1]: *** [libcruft] Error 2
> make: *** [all] Error 2
>
> Fortran compiler g77
> FLIBS='-lg2c'
> FFLAGS='-O5 -funroll-loops'
> CFLAGS='-fast -mdynamic-no-pic'
> CXXFLAGS='-fast -mdynamic-no-pic'

First of all, let me quote the docs for -fast (final paragraph of the
-fast section):

-----
Users of -fast should be aware of the following caveats:

• Because -fast enables highly aggressive optimizations, some of
  which may have an effect on code size or on program behavior,
  thorough testing is especially important before deploying
  applications compiled with -fast.
• For maximum run-time performance you should experiment with a
  variety of optimization options; no one set of flags is best for
  all applications.
-----

This, unfortunately, translates to "using -fast may wreak havoc,
analyze the code and apply the appropriate flags one by one".

Secondly, -mdynamic-no-pic is meant for executables, not libraries,
which need to have relocatable code. Check "man gcc". You need to make
sure that -fPIC is passed alongside -mdynamic-no-pic if you want to
build relocatable code with -mdynamic-no-pic applied globally.

Finally, my advice would be to start by dropping -fast -mdynamic-no-pic
and just add the -mcpu option specifying _your particular_ cpu.
If that turns out well, analyze the code and incrementally add flags
that you have reason to believe will improve performance, rebuilding
after each increment until you have obtained a satisfying speedup.

If you are really serious, use Shark (from Apple) to analyze the code
for things like pipeline stalls etc.

Sorry if I'm sounding negative, but it is my experience that applying
something as aggressive as the -fast option will not work well on
something as complex as octave. As I understand it, -fast was added to
give good SPEC marks, and has surely seen little testing with code
other than the SPEC code.

HTH,
Per

PS. For interested parties I'm pasting a summary of what -fast implies
below:

================================

-fast changes the overall optimization strategy of GCC 3.3 in order to
produce the fastest possible running code for G4 and G5 architectures.
Optimizations under -fast are roughly grouped under the following
categories.

1. -fast sets the optimization level to -O3, the highest level of
   optimization supported by GCC 3.3. If any other optimization level
   (-O0, -O1, -O2 or -Os) is specified, it is ignored by the compiler.

2. Alignment. Assume alignments for loops, functions, branches and
   structure data fields that provide fastest performance on the
   PowerPC. -fast sets the following alignment-specific options:

   -falign-loops-max-skip=15
   -falign-jumps-max-skip=15
   -falign-loops=16
   -falign-jumps=16
   -falign-functions=16
   -malign-natural

3. -fast enables the -ffast-math option, which allows certain unsafe
   math operations for performance gains.

4. Strict aliasing rules. -fast allows the compiler to assume the
   strictest aliasing rules applicable to the language being compiled.
   For C and C++, this activates optimizations based on the type of
   expressions: an object of one type is assumed never to reside at the
   same address as an object of a different type, unless the types are
   almost the same. Furthermore, struct field references are assumed
   not to alias each other as long as their direct and indirect
   enclosing structure types are distinct. -fast enables the following
   aliasing options:

   -fstrict-aliasing
   -frelax-aliasing
   -fgcse-mem-alias

   Warning: the behavior of correct programs will not be affected by
   strict aliasing, but programs that make use of nonportable type
   conversions may behave in unexpected ways.

5. -fast enables various performance-related code transformations.
   These include loop unrolling, transposing nested loops to improve
   locality of array element access, conversion of certain
   initialization loops to memset calls, and inline expansion of calls
   to library functions such as floor. -fast enables the following
   code transformation options:

   -funroll-loops
   -floop-transpose
   -floop-to-memset
   -finline-floor (G5 only)

   Some of these transformations increase code size.

6. G5 specific instruction generation. With -fast (unless -mcpu=G4 is
   specified), GCC 3.3 generates instructions which are specific to G5
   and result in performance gain for G5. The following options are
   assumed for G5 under -fast:

   -mcpu=G5
   -mpowerpc64
   -mpowerpc-gpopt

7. Scheduling changes. The -fast option allows inter-block scheduling,
   and scheduling specific to the G5 architecture. One such scheduling
   change is load after a store that partially loads what was stored.
   The following scheduling-related options are enabled by -fast:

   -mtune=G5 (unless -mtune=G4 is specified)
   -fsched-interblock
   -fload-after-store
   --param max-gcse-passes=3
   -fno-gcse-sm
   -fgcse-loop-depth

8. -fast enables intermodule inlining when all source files are placed
   on the same command line.
   The following options are set by -fast and affect such inlining:

   -funit-at-a-time
   -fcallgraph-inlining
   -fdisable-typechecking-for-spec

9. -fast sets -mdynamic-no-pic by default. This allows for generation
   of non-relocatable code and is not suitable for shared libraries.
   This option may be overridden by -fPIC.

Users of -fast should be aware of the following caveats:

• Because -fast enables highly aggressive optimizations, some of
  which may have an effect on code size or on program behavior,
  thorough testing is especially important before deploying
  applications compiled with -fast.
• For maximum run-time performance you should experiment with a
  variety of optimization options; no one set of flags is best for
  all applications.
• In future releases of GCC, -fast may enable a different set of
  optimization options. The intention behind this option is that
  -fast will enable optimizations that result in the fastest code for
  most applications.

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------
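
The incremental approach recommended earlier in this message (start from a plain baseline with -mcpu, then add one flag at a time and rebuild) could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper name flags_for_step is hypothetical, the -O2 baseline and -mcpu=G4 are example values to be replaced with whatever matches your machine, and the candidate flags are simply picked from the -fast summary above. The script prints the configure commands rather than running them, and it leaves out -fast and -mdynamic-no-pic on purpose because shared libraries need relocatable code.

```shell
#!/bin/sh
# Sketch only: prints candidate configure commands, it does not run them.
# Assumptions: an -O2 baseline and -mcpu=G4; substitute your own CPU.
BASE_FLAGS="-O2 -mcpu=G4"

# Flags to try one at a time (picked from the -fast summary above).
CANDIDATES="-funroll-loops -falign-loops=16 -falign-functions=16"

# flags_for_step N (hypothetical helper): baseline plus the first N candidates.
flags_for_step() {
    n=$1
    result=$BASE_FLAGS
    i=0
    for flag in $CANDIDATES; do
        [ "$i" -ge "$n" ] && break
        result="$result $flag"
        i=$((i + 1))
    done
    printf '%s\n' "$result"
}

step=0
while [ "$step" -le 3 ]; do
    flags=$(flags_for_step "$step")
    # Build, test and time each configuration before moving to the next step.
    echo "./configure --enable-dl --enable-shared CFLAGS='$flags' CXXFLAGS='$flags'"
    step=$((step + 1))
done
```

Step 0 prints the baseline command; each later step adds exactly one more flag, so a build failure or a wrong result can be traced to the flag introduced in that step.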