From help-request at octave dot org Fri Jun 10 06:59:16 2005
Subject: Re: File size limits in textread()?
From: Stefan van der Walt
To: koufalas at senet dot com dot au
Cc: help at octave dot org
Date: Fri, 10 Jun 2005 13:49:56 +0200

--EVF5PPMfhYS0aIcm
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

The attached program should help a bit with speed. It's just a quick
hack and hasn't been tested too thoroughly. It can probably be made a
lot quicker -- I am doing millions of cell-array resizes in there. One
good way to improve its speed would be to estimate the cell array size
beforehand, by counting the number of lines in the file first.

Regards
Stefan

On Fri, Jun 10, 2005 at 12:53:57PM +1000, koufalas at senet dot com dot au wrote:
> G'day all,
>
> I'm using textread() to read in mixed string and integer data from a
> space-delimited text file. I found it worked okay with one data file,
> but it then hangs when reading another, larger data file. Hitting
> ctrl-c brings back the octave prompt.
>
> Any ideas as to why textread() doesn't like the larger data file? This
> file has about 6000 rows and 10 columns, 4 of which are strings; the
> others are integers.
>
> I'm now using a loop with fgetl() and split(), but it's very slow.
>
> The data file is generated from a database query. I might use 2
> queries, one for integer data and one for string data. That way at
> least I can use load or dlmread() to quickly pull in the integer data,
> and then deal with the string data using textread().
>
> -------
>
> BTW, the data file is generated from a database query. I had earlier
> asked about interfacing octave to databases (postgreSQL) as I had
> problems with some interface code from Dirk Eddelbuettel. I believe
> that was because the postgresql version I'm using, 7.2.1, was built
> using g++ 2.95, i.e. shared libraries for c++ interfacing had
> unresolved symbols for this reason--lack of agreed ABI.
> I'll try again at some stage with 7.4, but I have a 5GB database that
> will have to be converted before I can do that...takes ages.
>
> Cheers,
>
> Paul Koufalas
> Adelaide, S. Australia
>

--EVF5PPMfhYS0aIcm
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="tread.cc"

/* Header names and template arguments were stripped from the archived
   copy of this attachment; they are reconstructed here from what the
   code uses. */
#include <octave/oct.h>
#include <octave/Cell.h>
#include <exception>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

/* Read one value of type column_type from the stream and append it to
   the cell array `output'.  The cell array grows by one element per
   call, which is the slow part. */
template <class column_type>
Cell add_data(Cell &output, std::ifstream* in)
{
    column_type v;
    (*in) >> v;

    if (!(*in)) {
        return output;
    }

    if (! in->eof() ) {
        dim_vector d = output.dims();
        Array<int> p(d.length(), 0);
        p(d.length()-1) = d(d.length()-1);
        d(d.length()-1)++;
        output.resize(d);
        output = output.concat(octave_value(v), p);
    }

    return output;
}

DEFUN_DLD(tread, args, ,
"Quick hack for text input, see `textread' for more detail.\n\n\
Usage: tread(filename, format)\n\n\
where format can include '%s' for string,\n\
                         '%d' for double,\n\
                         '%*' to ignore column.")
{
    octave_value_list retval;

    if (args.length() != 2) {
        print_usage("tread");
        return retval;
    }

    std::string filename = args(0).string_value();
    std::string format = args(1).string_value();

    if (error_state) {
        error("Invalid argument specified");
        print_usage("tread");
        return retval;
    }

    /* Split the format string into one specifier per column. */
    std::istringstream format_s(format);
    std::vector<std::string> columns;
    while (format_s && !format_s.eof()) {
        std::string p;
        format_s >> p;
        columns.push_back(p);
    }

    std::ifstream data(filename.c_str());
    if (!data) {
        error("tread: couldn't open data file %s", filename.c_str());
        return retval;
    }

    if (data) {
        data >> std::skipws;
    }

    /* One (initially empty) cell array per column. */
    std::vector<Cell> output(columns.size());
    for (int i = 0; i < columns.size(); i++) {
        output[i] = Cell(1,0);
    }

    std::string s;
    double d;
    char buf[1024];
    long unsigned int line = 0;

    try {
        while (data) {
            for (int i = 0; i < columns.size(); i++) {
                if (columns[i] == "%d") {
                    output[i] = add_data<double>(output[i], &data);
                } else if (columns[i] == "%s") {
                    output[i] = add_data<std::string>(output[i], &data);
                } else {
                    data >> s;
                    break;
                }
            }
            std::cout << line++ << "\r" << std::flush;
            data.getline(buf, 1024); // ensure we've reached EOL
        }
    }
    catch (const std::exception& e) {
        error("tread: cannot read from %s", filename.c_str());
        return retval;
    }

    /* Return one cell array per format column. */
    for (int i = 0; i < columns.size(); i++) {
        retval.append(octave_value(output[i]));
    }

    data.close();

    return retval;
}

--EVF5PPMfhYS0aIcm--

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------
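
The speed-up suggested at the top of this message (count the lines in
the file first, then size the storage once instead of resizing per
value) could look roughly like the sketch below. It is plain standard
C++, not the Octave API: the columns are std::vector<std::string>
rather than Cell, and count_lines() and read_columns() are hypothetical
helper names that do not appear in tread.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Count the lines in a stream, then rewind it so it can be parsed.
// The result is only used as a capacity estimate.
std::size_t count_lines(std::istream& in)
{
    std::size_t n = 0;
    std::string line;
    while (std::getline(in, line))
        ++n;
    in.clear();                       // reset eofbit/failbit left by getline
    in.seekg(0, std::ios_base::beg);  // back to the start of the data
    return n;
}

// Read whitespace-delimited rows into per-column vectors of strings.
// Assumption: columns are kept as strings for simplicity; tread.cc
// stores them in Octave Cell arrays instead.
std::vector<std::vector<std::string> >
read_columns(std::istream& in, std::size_t ncols)
{
    assert(ncols > 0);
    const std::size_t nrows = count_lines(in);

    std::vector<std::vector<std::string> > cols(ncols);
    for (std::size_t i = 0; i < ncols; ++i)
        cols[i].reserve(nrows);       // one allocation per column, up front

    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string tok;
        // Rows with fewer than ncols fields leave later columns shorter.
        for (std::size_t i = 0; i < ncols && (fields >> tok); ++i)
            cols[i].push_back(tok);
    }
    return cols;
}
```

To use it on a file, open a std::ifstream and pass it to
read_columns() together with the number of columns. The same two-pass
idea should carry over to tread.cc by creating each cell array at its
estimated final size and filling it by index, at the cost of reading
the file twice.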