From help-request at octave dot org Thu Jan 19 11:21:57 2006
Subject: Re: Problem with MPITB on IA64 arch
From: Gianvito Quarta
To: help at octave dot org
Cc: Javier Fernández Baldomero
Date: Thu, 19 Jan 2006 18:20:29 +0100

Hi Javier,
now I can reply to the question:

5.- copy-paste a screen dump with the same Octave command sequence I
showed above

This is the command sequence:

[gquarta at n64 ~]$ octave
Set SSI rpi to tcp with the command:
    putenv('LAM_MPI_SSI_rpi','tcp'), MPI_Init
Help on MPI: help mpi
octave-2.1.72:1> MPI_COMM_WORLD
ans = 2.3058e+18
octave-2.1.72:2> MPI_Init
ans = 0
octave-2.1.72:3> a=MPI_COMM_WORLD
a = 2.3058e+18
octave-2.1.72:4> whos a

*** local user variables:

  Prot Name        Size                     Bytes  Class
  ==== ====        ====                     =====  =====
   rwd a           1x1                          8  scalar

Total is 1 element using 8 bytes

octave-2.1.72:5> MPI_Finalize
ans = 0
octave-2.1.72:6>

Thanks,
G. QUARTA

At 20.30 18/01/2006, you wrote:
>Hi, Gianvito
>
>Gianvito Quarta wrote:
>
>>Hi,
>>I'm trying to set up a parallel octave environment on an Itanium
>>II, IA64, 128 cpu cluster.
>>I have some problems during the mpitb re-compilation because
>>on the IA64 arch the cast from pointer to int causes problems
>>(during the compilation with gcc 3.2.3 the error:
>>reinterpret_cast from `_comm*' to `int' loses precision
>>occurs).
>I'm sorry I was not able to reply to your e-mail sent at 15:29 in
>time, and this question reached the help list at 17:22. Most people
>here probably won't be interested in MPITB compilation problems. If
>you don't mind, I'd rather continue this dialog by personal e-mail
>instead of on the help mailing list.
>
>Thanks for using MPITB. I'm pleasantly surprised you managed to get
>that far. I have never used any IA64, but perhaps with a little bit
>of help you can manage to build a working MPITB version for that platform.
>
>Please search for "size" and "alignment" in your LAM config.log
>file.
>I'm mostly interested in the size and alignment of the "int" and
>"void*" types on your IA64 architecture. Also check the endianness. On
>my IA32 PC I have this:
>________________
>configure:5363: checking size of int
>...
>configure:5408: result: 4
>configure:5436: checking size of long
>...
>configure:5481: result: 4
>configure:5509: checking size of long long
>...
>configure:5554: result: 8
>...
>configure:5801: checking size of void *
>...
>configure:5846: result: 4
>...
>configure:6111: checking alignment of int
>...
>configure:6172: result: 4
>...
>configure:6265: checking alignment of long long
>...
>configure:6326: result: 4
>...
>configure:6573: checking alignment of void *
>...
>configure:6634: result: 4
>...
>configure:19090: checking whether byte ordering is bigendian
>...
>configure:19301: result: no
>________________
>
>So on IA32 all alignments are 4 and only "long long" has size 8.
>That's why I chose to cast LAM communicators (_comm*) to C ints.
>BTW, when returned to Octave they become "flints", so MPITB
>communicators are Octave scalars (doubles). You are not expected to
>do any maths with them, so when later reused they can be cast
>again from flints back to C ints and void*.
>
>Your error message makes me suspect that IA64 void* is size 8, or at
>least greater than 4. In order to be able to cast LAM pointers to
>Octave, I would need to know
>which is the compatible integer type under IA64. BTW, you can also
>look for the same information in Octave's own config.log file. I have:
>________________
>ac_cv_sizeof_int=4
>ac_cv_sizeof_long=4
>ac_cv_sizeof_long_long=8
>ac_cv_sizeof_short=2
>________________
>
>Tell me the alignment and size of your GCC integer types and void*
>type so we can choose which one matches best. You can find that
>information in the LAM config.log file. E-mail me directly; we
>can summarize here on the list later if you succeed in getting MPITB
>working under IA64.
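
The sizes, alignments and byte order requested above can also be read straight from the compiler instead of from config.log. The following is only a sketch, assuming a C++11 compiler (newer than the gcc 3.2.3 mentioned in this thread); the function names are made up for illustration and are not part of MPITB or LAM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <iostream>

// Returns true when the most significant byte is stored first.
bool is_big_endian() {
    const std::uint32_t probe = 0x01020304u;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);  // inspect the lowest-addressed byte
    return first_byte == 0x01;
}

// True when integer type T is wide enough to hold a void* without truncation.
template <typename T>
constexpr bool can_hold_pointer() {
    return sizeof(T) >= sizeof(void*);
}

// Prints the quantities that configure records in config.log.
void report_type_layout(std::ostream& out) {
    out << "size of int          : " << sizeof(int) << '\n'
        << "size of long         : " << sizeof(long) << '\n'
        << "size of long long    : " << sizeof(long long) << '\n'
        << "size of void *       : " << sizeof(void*) << '\n'
        << "alignment of int     : " << alignof(int) << '\n'
        << "alignment of long    : " << alignof(long) << '\n'
        << "alignment of void *  : " << alignof(void*) << '\n'
        << "bigendian            : " << (is_big_endian() ? "yes" : "no") << '\n'
        << "int holds void *     : " << (can_hold_pointer<int>() ? "yes" : "no") << '\n'
        << "long holds void *    : " << (can_hold_pointer<long>() ? "yes" : "no") << '\n';
}
```

Calling `report_type_layout(std::cout)` on the target machine lists which integer types are wide enough for a pointer. Where available, `std::intptr_t` and `std::uintptr_t` are defined to be pointer-sized, so they avoid guessing between int and long.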
>
>>I tried to change the casting of pointers to long
>>and then I successfully compiled MPITB.
>Perhaps sizeof(long)==8 in IA64 ?!?
>I assume you have edited just MPI_COMM_WORLD.cc, on line 33
>from
>    RET_1_ARG(reinterpret_cast<int>( NAME ))   // defined -> expanded
>to
>    RET_1_ARG(reinterpret_cast<long>( NAME ))  // defined -> expanded
>
>If you haven't modified that line, or have modified others, please
>let me know. There is no hint in your original e-mail about which
>files/lines you have edited.
>
>>Unfortunately some problems occur at run time,
>>...
>>[info rank]=MPI_Comm_rank(MPI_COMM_WORLD)  % rank=0
>>MPI process rank 0 (n0, p31218) caught a SIGSEGV in MPI_Comm_rank.
>>Rank (0, MPI_COMM_WORLD): Call stack within LAM:
>>Rank (0, MPI_COMM_WORLD): - MPI_Comm_rank()
>>Rank (0, MPI_COMM_WORLD): - main()
>I think the SigSegV may come from the communicator argument, since
>that's what you have edited (if I guessed correctly above).
>
>So MPI_Init is working ?!? Great!!! It seems you can also
>invoke MPI_COMM_WORLD without any problems. Try it out. I get this:
>________________
>$ octave
>Set SSI rpi to tcp with the command:
>    putenv('LAM_MPI_SSI_rpi','tcp'), MPI_Init
>Help on MPI: help mpi
>octave:1> MPI_COMM_WORLD
>ans = 1099670176
>octave:2> MPI_Init
>ans = 0
>octave:3> a=MPI_COMM_WORLD
>a = 1099670176
>octave:4> whos a
>
>*** local user variables:
>
>  Prot Name        Size                     Bytes  Class
>  ==== ====        ====                     =====  =====
>   rwd a           1x1                          8  scalar
>
>Total is 1 element using 8 bytes
>
>octave:5> MPI_Finalize
>ans = 0
>octave:6> quit
>[javier at oxigeno mpitb]$
>________________
>
>So the pointer becomes a flint a=1099670176. Send me a copy of your
>output for this command sequence. Of course, if a=0 that's where the
>SigSegV comes from.
>Perhaps the pointer is being correctly cast to
>long (if you were lucky with your long decision), but it is not
>being correctly cast back to pointer, since it's using this code:
>________________
>    MPI_Comm comm = (MPI_Comm) args(ARGN).int_value();
>________________
>That's my fault. Right now I cannot remember why I didn't write any
>XXX_cast reserved word there. When I learned one shouldn't directly
>cast in C++, I started to static_ and reinterpret_cast. Perhaps I
>wrote that line before I learned that. I have forgotten C++ again,
>so I guess I must re-read once more Stroustrup's "The C++ progr.
>lang" chapter 6.2.7... sigh!
>
>Ok, summarizing: this is your homework :-)
>0.- reply directly to me, not to the mailing list
>1.- copy-paste LAM config.log lines related to int and void* sizes,
>alignments and endianness
>2.- copy-paste Octave config.log lines related to int sizes
>3.- tell me if you modified the line I mentioned (MPI_COMM_WORLD.cc,
>on line 33)
>4.- tell me if you modified (and how) any other line
>5.- copy-paste a screen dump with the same Octave command sequence I
>showed above
>6.- (just a joke) locate in the sources the last line of code shown,
>the one with the bad C-style typecast
>
>When I have all that information I'll suggest that you change the
>typecast to reinterpret_cast<> (gcc will complain, as it would have if I had
>written it correctly in the first place), and if so then I'll suggest you cast
>from long instead of from int... and so on until it works (I hope :-)
>
>-javier
>
>
>-------------------------------------------------------------
>Octave is freely available under the terms of the GNU GPL.
>
>Octave's home on the web: http://www.octave.org
>How to fund new projects: http://www.octave.org/funding.html
>Subscription information: http://www.octave.org/archive.html
>-------------------------------------------------------------
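
The thread stores a communicator pointer in an Octave scalar (a double) and later converts it back through `int_value()`. A sketch of what each step can lose is shown below. It models addresses as plain 64-bit integers and uses hypothetical helper names; it does not call Octave or LAM, and the 32-bit narrowing is only an assumed stand-in for a conversion through a 32-bit C int, not a statement about what `int_value()` itself does with out-of-range values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: does an address survive being stored in a double
// (as an Octave scalar would hold it) and converted back?
bool survives_double_round_trip(std::uint64_t address) {
    const double as_double = static_cast<double>(address);
    // 2^64 is out of range for uint64_t, so refuse to convert it back.
    if (as_double >= 18446744073709551616.0) {
        return false;
    }
    return static_cast<std::uint64_t>(as_double) == address;
}

// Hypothetical helper: does an address survive being squeezed through a
// 32-bit integer? Assumption: this keeps only the low 32 bits.
bool survives_narrowing_to_32_bits(std::uint64_t address) {
    const std::uint32_t narrowed = static_cast<std::uint32_t>(address);
    return static_cast<std::uint64_t>(narrowed) == address;
}
```

With these helpers, the IA32 value from the thread, 1099670176, passes both checks. An address of 2^61 (the order of magnitude of the 2.3058e+18 printed in the IA64 session) passes the double check but fails the 32-bit one, and 2^61 + 8 fails both. A double has a 53-bit significand, so adjacent doubles near 2^61 are 512 apart. Whether the actual IA64 pointer was altered cannot be told from the printed output alone.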