From octave-maintainers-request at bevo dot che dot wisc dot edu Mon Mar 29 12:54:02 2004
Subject: malloc/erase
From: Paul Thomas
To: octave-maintainers-list
Date: Mon, 29 Mar 2004 20:53:44 +0200

The following is an exchange of messages between Paul Kienzle and Paul Thomas concerning comparisons between new, malloc and stl vector for allocating memory.

On Mon, Mar 29, 2004 at 01:47:00PM +0200, THOMAS Paul Richard wrote:
>> Paul,
>>
>> Two things that seem surprising to me:
>>
>> 1) On a 2.5GHz pentium with Windows 2000:
>>
>>     cygwin32   11.17s (using clock() )
>>     Ccygwin32  10.54s
>>
>> which is a factor of two or so slower than the Athlon1700 with XP. I wonder
>> if this indicates a dependence on Windows version, as well?
>
> That wouldn't surprise me.

>> 2) Replacing new+delete or malloc+erase with:
>>
>>     vector<double> myvec(1);
>>     double *myarray = &myvec[0];
>>     if ( myarray==NULL) { printf("alloc failed\n");exit(1);}
>>     else {myarray=NULL; myvec.clear();}
>
> vector throws an exception if not enough memory, so myarray will never be NULL, except maybe if the array is of length 0. There is no reason to call clear since the destructor will clear the data when it is done.

This is the code I'm using for timing:

    #include <vector>
    using namespace std;
    int main()
    {
      for (int iloop = 0; iloop < 10000000; iloop++) {
        vector<double> myvec(1);
        double *myarray = &myvec[0];
      }
      return 0;
    }

>>
>> is ten times faster than either (g++ -O2)
>>
>>     Vcygwin32  1.094s
>>
>> Taking at face value the standard library guarantee to automatically
>> destruct the resources of containers, when going out of scope, and
>> eliminating the clear(), drops this time to 0.469s. I wonder how this is
>> possible, when vector is presumably doing the same as new or malloc?
>
> I'm a little worried a clever optimizer will eliminate most of this loop.
We really ought to be assigning to the vector and creating a total. This will also show us how significant the cost of alloc is compared to e.g., a trig function. However, I already have lots of data using the old method, so I'll stick with it for now.

I ran this test on a few other boxen. vector is nearer in speed to malloc than to new on all of them:

             vec     new    alloc
    Linux    3.39    4.92    3.17
    IRIX     3.03    4.15    2.21
    IRIXgcc  2.94    4.28    2.23
    Mac10.3  9.13   11.22    9.80
    ming32   3.02   14.45   12.65
    cyg32    3.90   16.17   14.35
    ming33          18.60   12.27
    cyg33           72.04   24.34

Notice that the results for vector on Linux are right in line with the values on the slightly slower Windows machine, so I'm inclined to accept them as reasonable.

Reading through /usr/include/c++/3.2/bits/stl_alloc.h, they have this to say:

    __malloc_alloc_template
        A malloc-based allocator. Typically slower than the
        __default_alloc_template. Typically thread safe and more
        storage efficient.

    __default_alloc_template:
        Default node allocator. Uses __mem_interface for its
        underlying requests (and makes as few requests as possible).

>>
>> Should we be using stl vectors in ArrayRep and so on, or at least copying
>> the content of stl_vector.h and modifying it for octave?
>
> This is a question for John and David (who is working on memory alignment for FFTW). Given the speedup on Windows I'm all for it, especially since it is a win over new[] everywhere I tested. Make sure though that we don't take a big hit in memory efficiency.
Paul Kienzle
pkienzle at users dot sf dot net

From: THOMAS Paul Richard
To: "'Paul Kienzle'"
Cc: jwe at bevo dot che dot wisc dot edu, David.Bateman@motorola.com, "'paulthomas2 at wanadoo dot fr'"
Subject: RE: malloc/erase
Date: Mon, 29 Mar 2004 17:41:09 +0200

Paul,

The reason is quite simple; I was formally bashed around the head for subscribing to the lists from a CEA computer. On top of that, access to my personal ISP has been blocked, as of last week.
All this will change totally if ITER comes here and fusion is displaced to the other side of the fence. For now, though, we just have to put up with the security measures.

The plan was to post it to the list, from home, this evening. I was hoping as well that you would have put me right if the part about vectors was off kilter - I still feel as if I am seriously out of my depth a lot of the time. To paraphrase Abe Lincoln, "It is better to be thought a fool than to post on the list and remove all doubt."

I just tried the comparison on a Compaq Tru64 system; there the stl method is slower than both new and malloc. I cannot quantify because clock() seems to be broken in gcc.

Regards

Paul

-----Original Message-----
From: Paul Kienzle [mailto:pkienzle at jazz dot ncnr dot nist dot gov]
Sent: Monday, 29 March 2004 17:27
To: THOMAS Paul Richard
Cc: jwe at bevo dot che dot wisc dot edu; David.Bateman@motorola.com
Subject: Re: malloc/erase

Paul,

I'm CC'ing to John and David --- I'm not sure why you are not posting this to the list, other than better response time from me of course (I only read the list in the evening 8-)

On Mon, Mar 29, 2004 at 01:47:00PM +0200, THOMAS Paul Richard wrote:
> Paul,
>
> Two things that seem surprising to me:
>
> 1) On a 2.5GHz pentium with Windows 2000:
>
>     cygwin32   11.17s (using clock() )
>     Ccygwin32  10.54s
>
> which is a factor of two or so slower than the Athlon1700 with XP. I wonder
> if this indicates a dependence on Windows version, as well?

That wouldn't surprise me.

> 2) Replacing new+delete or malloc+erase with:
>
>     vector<double> myvec(1);
>     double *myarray = &myvec[0];
>     if ( myarray==NULL) { printf("alloc failed\n");exit(1);}
>     else {myarray=NULL; myvec.clear();}

vector throws an exception if not enough memory, so myarray will never be NULL, except maybe if the array is of length 0. There is no reason to call clear since the destructor will clear the data when it is done.
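
The point about the destructor can be checked directly. What follows is only a minimal sketch, not code from this thread: CountingAllocator, g_allocs, g_frees and run_scoped_vectors are hypothetical names made up for illustration, and the sketch assumes a C++11 or later compiler. It runs the same loop body with an allocator that counts requests, and reports how many blocks are still outstanding once every vector has gone out of scope without any call to clear().

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical global counters, for illustration only.
static long g_allocs = 0;
static long g_frees = 0;

// Minimal allocator that forwards to operator new/delete and counts calls.
template <typename T>
struct CountingAllocator {
    typedef T value_type;

    CountingAllocator() {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        ++g_allocs;
        // operator new reports failure by throwing std::bad_alloc,
        // so a NULL check on the result would never fire.
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) {
        ++g_frees;
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Run the loop body from the thread n times, with no clear() call, and
// return the number of allocations that were never freed.
long run_scoped_vectors(int n) {
    for (int i = 0; i < n; i++) {
        std::vector<double, CountingAllocator<double> > myvec(1);
        assert(!myvec.empty());   // &myvec[0] is only valid for a non-empty vector
        double* myarray = &myvec[0];
        *myarray = 1.0;
    }   // myvec is destroyed here, which releases its storage
    return g_allocs - g_frees;
}
```

Calling run_scoped_vectors(1000) should return 0, since each vector's destructor hands its block back to the allocator at the closing brace of the loop body.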
This is the code I'm using for timing:

    #include <vector>
    using namespace std;
    int main()
    {
      for (int iloop = 0; iloop < 10000000; iloop++) {
        vector<double> myvec(1);
        double *myarray = &myvec[0];
      }
      return 0;
    }

>
> is ten times faster than either (g++ -O2)
>
>     Vcygwin32  1.094s
>
> Taking at face value the standard library guarantee to automatically
> destruct the resources of containers, when going out of scope, and
> eliminating the clear(), drops this time to 0.469s. I wonder how this is
> possible, when vector is presumably doing the same as new or malloc?

I'm a little worried a clever optimizer will eliminate most of this loop. We really ought to be assigning to the vector and creating a total. This will also show us how significant the cost of alloc is compared to e.g., a trig function. However, I already have lots of data using the old method, so I'll stick with it for now.

I ran this test on a few other boxen. vector is nearer in speed to malloc than to new on all of them:

             vec     new    alloc
    Linux    3.39    4.92    3.17
    IRIX     3.03    4.15    2.21
    IRIXgcc  2.94    4.28    2.23
    Mac10.3  9.13   11.22    9.80
    ming32   3.02   14.45   12.65
    cyg32    3.90   16.17   14.35
    ming33          18.60   12.27
    cyg33           72.04   24.34

Notice that the results for vector on Linux are right in line with the values on the slightly slower Windows machine, so I'm inclined to accept them as reasonable.

Reading through /usr/include/c++/3.2/bits/stl_alloc.h, they have this to say:

    __malloc_alloc_template
        A malloc-based allocator. Typically slower than the
        __default_alloc_template. Typically thread safe and more
        storage efficient.

    __default_alloc_template:
        Default node allocator. Uses __mem_interface for its
        underlying requests (and makes as few requests as possible).

>
> Should we be using stl vectors in ArrayRep and so on, or at least copying
> the content of stl_vector.h and modifying it for octave?

This is a question for John and David (who is working on memory alignment for FFTW).
Given the speedup on Windows I'm all for it, especially since it is a win over new[] everywhere I tested. Make sure though that we don't take a big hit in memory efficiency.

Paul Kienzle
pkienzle at users dot sf dot net
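
The suggestion in the thread to assign to the vector and create a total, and to compare the allocation cost against a trig function, might look roughly like the sketch below. It is not the code used for the timings above. alloc_and_total, trig_and_total and time_it are hypothetical names, and clock() is used only because the thread uses it. Consuming the total makes it harder for an optimizer to discard the loop, though it is not a guarantee that every allocation survives optimization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <ctime>
#include <vector>

// Allocate a one-element vector per iteration, write to it, and fold the
// value into a running total so the loop has an observable result.
double alloc_and_total(long iterations) {
    double total = 0.0;
    for (long i = 0; i < iterations; i++) {
        std::vector<double> myvec(1);
        myvec[0] = static_cast<double>(i);
        total += myvec[0];
    }
    return total;
}

// Same loop shape with a trig call in place of the allocation, to put the
// allocation cost in perspective.
double trig_and_total(long iterations) {
    double total = 0.0;
    for (long i = 0; i < iterations; i++) {
        total += std::sin(static_cast<double>(i));
    }
    return total;
}

// Time one of the loops with clock(). Returns elapsed CPU seconds and
// stores the loop's total in *result so the caller can use it.
double time_it(double (*fn)(long), long iterations, double* result) {
    assert(result != NULL);
    std::clock_t start = std::clock();
    *result = fn(iterations);
    std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

A driver would call time_it(alloc_and_total, 10000000, &total) and time_it(trig_and_total, 10000000, &total), then print both the elapsed times and the totals. Printing the totals matters, since an unused result is exactly what lets the compiler drop the work.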