From help-octave-request at bevo dot che dot wisc dot edu Sat Jan 25 15:27:09 2003
Subject: Re: Why is sscanf() so slow?
From: Dirk Eddelbuettel
To: stefan
Cc: help-octave at bevo dot che dot wisc dot edu
Date: Sat, 25 Jan 2003 15:24:14 -0600

On Sat, Jan 25, 2003 at 02:45:39PM -0600, stefan wrote:
> Dear Octavers,
>
> first of all: Thanks for great software. I have been using it for about two
> years now, especially for displaying and adjusting measured data (which comes
> from some measurement bus system to the computer). The software which drives
> these devices almost always produces tab- or comma-separated data.
>
> For this I do:
[...]
> or very likely...
>
> For more than 1000 lines this takes *ages*. Is there a better way to do
> so or is it a slow implementation in octave? I would like to see this
> improved some day.

Please see below for the function aload.m from the octave-ci collection by
Kurt Hornik et al; I used to use this a lot. It essentially pre-processes the
data first, and then uses a normal load (in ascii mode).

I never quite figured out why JWE didn't include it in Octave itself when he
chose to include other octave-ci functions. Anyway, there is a Debian package
of octave-ci, and a tarball in Vienna, Austria. Paul Kienzle also has
something similar in octave-forge.

Hope this helps, Dirk


## Copyright (C) 1996, 1997  Kurt Hornik
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2, or (at your option)
## any later version.
##
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this file.
## If not, write to the Free Software Foundation,
## 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

## x = aload (filename [, cw [, rw [, FS [, NA [, ignore_regexp]]]]])
## loads the flat ASCII data file `filename' into x.
##
## With the optional parameters cw and rw one can select the data
## columns (variables) and rows (observations) to load.  Both cw and rw
## may be index vectors or Inf (default), meaning to load everything.
##
## With FS, one can specify the field separator in the data file as one
## would do in AWK.  Default is " ".
##
## With NA, one can specify how unavailable data are represented in the
## data file, and how they should be loaded into Octave.  The default is
## "NA/NaN", meaning that NA's should be converted to NaN's.  (Note that
## this does not work yet.)
##
## Finally, ignore_regexp is an egrep regular expression specifying
## which lines in the data file should be ignored.  The default is
## "^[ \t]*(#|%|$)", meaning that empty lines and lines where # or % are
## the first non-whitespace characters are ignored.
##
## Note that rw selects the data line (observation) numbers and NOT the
## line numbers in the file!
##
## Note also that currently, only real numbers can be loaded.

## Author:  KH
## Description:  Load from a flat ASCII data file

function x = aload (filename, cw, rw, FS, NA, ignore_regexp)

  if ((nargin < 1) || (nargin > 6))
    usage ("aload (filename, cw, rw, FS, NA, ignore_regexp)");
  endif

  if (nargin < 6)
    ignore_regexp = "^[ \t]*(#|%|$)";
  endif
  if (nargin < 5)
    NA = "NA/NaN";
  endif
  if (nargin < 4)
    FS = " ";
  endif
  if (nargin < 3)
    rw = Inf;
  endif
  if (nargin < 2)
    cw = Inf;
  endif

  ## maybe_do_more_sanity_checks ();

  if !is_struct (stat (filename))
    error (sprintf ("aload: File '%s' not found", filename));
  endif

  tmpfile = octave_tmp_file_name ();
  system (["cat ", filename, " | ", ...
           "egrep -ve \'", ignore_regexp, "\' | ", ...
           "sed -e 's/", NA, "/g' > ", tmpfile]);

  eval (system (["cat ", tmpfile, " | ", ...
                 "awk 'BEGIN { FS = \"", FS, "\" }; ", ...
                 "END { printf \"rf = %g; cf = %g;\", NR, NF }'"]));

  if (cw == Inf)
    cw = 1 : cf;
  elseif (min (size (cw)) == 1)
    cw = cw (find (cw <= cf));
  else
    error ("aload: cw must be a scalar or a vector");
  endif

  if (rw == Inf)
    rw = 1 : rf;
  elseif (min (size (rw)) == 1)
    rw = rw (find (rw <= rf));
  else
    error ("aload: rw must be a scalar or a vector");
  endif

  loadfile = octave_tmp_file_name ();
  fd = fopen (loadfile, "w");
  fprintf (fd, "# name: x\n# type: matrix\n");
  fprintf (fd, "# rows: %g\n# columns: %g\n", length (rw), length (cw));
  fclose (fd);

  s = sprintf ("$%d", cw(1));
  for i = 2 : length (cw);
    s = sprintf ("%s, $%d", s, cw(i));
  endfor

  system (["cat ", tmpfile, " | ", ...
           "awk 'BEGIN { FS = \"", FS, "\" }; { print ", s, " };' ", ...
           " >> ", loadfile]);

  eval (["load -force -ascii ", loadfile]);

  x = x(rw, :);

  system (sprintf ("rm -f %s %s", tmpfile, loadfile));

endfunction


> For now I do 'save -mat-binary %s values' to keep loading times short next
> time. Some cache algo around the above code.
>
> Any help is appreciated,
> stefan at lkcc dot org
>
>
>
> -------------------------------------------------------------
> Octave is freely available under the terms of the GNU GPL.
>
> Octave's home on the web:  http://www.octave.org
> How to fund new projects:  http://www.octave.org/funding.html
> Subscription information:  http://www.octave.org/archive.html
> -------------------------------------------------------------
>

--
Prediction is very difficult, especially about the future.
                                -- Niels Bohr