From octave-sources-request at bevo dot che dot wisc dot edu Wed Feb 4 09:48:41 2004
Subject: Re: code to import csv files
From: "Pascal A. Dupuis"
To: octave-sources at bevo dot che dot wisc dot edu
Date: Wed, 4 Feb 2004 07:54:06 -0600

--IJpNTDwzlM2Ie8A6
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

About the csv-to-octave filter: I enclose a revised version. Instead of
constructing a matrix and transposing it, the matrix is now built in
transposed form and used directly. The logic about quotes is also simpler:
1) if there are no quotes, emit the value as a scalar;
2) otherwise, emit it as a string, with opening and closing quotes removed.

This implies that each string must be quoted to be recognised as such. A
more CPU-intensive solution is to try to interpret each unquoted value as a
number, but then the work is done twice: once in the Perl script, once by
Octave itself.

Best regards

Pascal Dupuis

-- 
Dr. ir. Pascal Dupuis
K. U. Leuven, ESAT/ELECTA (formerly ELEN): http://www.esat.kuleuven.ac.be/
Kasteelpark Arenberg, 10; B-3001 Leuven-Heverlee, Belgium
Tel. +32-16-32 10 21 -- Fax +32-16-32 19 85

--IJpNTDwzlM2Ie8A6
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=csv2oct

#!/usr/bin/perl

=head1 NAME

csv2oct - script to read a csv file and generate something Octave can read as cells

=head1 SYNOPSIS

csv2oct [-q quotestring] [-s delim] [-v] tmp_file data_file

=head1 DESCRIPTION

Reading csv files is rather, hmm, difficult, so all the gory details
are left to the parse_line routine of Text::ParseWords. The algorithm is as follows:

=item 1

split each line according to the delimiter, keeping quotes;

=item 2

for each element, remove leading and trailing space. If the result is
empty, translate it as 0.0.

=item 3

check whether the element contains the quote string. If it does not,
translate it to a scalar value, otherwise emit the unquoted string.

Everything must be processed in memory, as Octave needs to know the
cell dimensions in advance. Once done, generate an input file for Octave,
or dump it to STDERR if the tmp_file can't be opened.

=head1 COPYRIGHT

Copyright (C) 2004 Pascal Dupuis

This file is part of Octave.

Octave is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.

Octave is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with Octave; see the file COPYING. If not, write to the Free
Software Foundation, 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.

=cut

use strict;
use Text::ParseWords;
use POSIX qw(strftime);
use Sys::Hostname;
use Getopt::Std;

my %arg;
getopts("q:s:v", \%arg);
$arg{q} = "'" unless defined $arg{q};
$arg{s} = "," unless defined $arg{s};

my $outfile = shift;
$_ = shift || die("Usage: csv2oct [-q quotestring] [-s delim] [-v] tmp_file data_file");

# open the data file, decompressing on the fly if needed
if (m/\.bz2$/) {
    open DATAFILE, "-|", "bzip2", "-dc", $_
        or die "Can't open $_ using bzip2 pipe: $!";
} elsif (m/\.gz$/) {
    open DATAFILE, "-|", "gzip", "-dc", $_
        or die "Can't open $_ using gzip pipe: $!";
} else {
    open DATAFILE, "<", $_ or die "Can't open $_: $!";
}

my ($numlines, $parsedlines) = (0, 0);
my (@AoA, @fields, $transl, $idx);

while (<DATAFILE>) {
    ++$numlines;
    chomp;
    # skip comment
    next if m/^\s?#/;
    # it seems that the combination of "'" and "," is tricky to process.
    # split things, but keep quotes
    @fields = parse_line($arg{s}, 1, $_);
    $idx = 0;
    foreach (@fields) {
        # remove leading and trailing spaces
        s/^\s+//;
        s/\s+$//;
        # replace empty elems by 0
        if (length($_) < 1) {
            $transl = "# name: <cell-element>\n# type: scalar\n0\n";
        } else {
            # is it quoted?
            if (m/\Q$arg{q}\E/) {
                my $unquot = join('', quotewords($arg{q}, 0, $_));
                # yes, print the unquoted version
                $transl = "# name: <cell-element>\n# type: string array\n"
                        . "# elements: 1\n# length: " . length($unquot)
                        . "\n$unquot\n";
            } else {
                # no, it must be numeric
                $transl = "# name: <cell-element>\n# type: scalar\n$_\n";
            }
        }
        # Octave expects its values column-wise
        push @{$AoA[$idx++]}, $transl;
    }
    ++$parsedlines;
}
close DATAFILE;

unless (open OUT, ">", $outfile) {
    warn "Can't open tmp file $outfile: $!\nRedirecting to STDERR\n";
    open OUT, ">&", \*STDERR or die "Can't dup STDERR: $!";
}

# generate the header
my $now_string = "# Created by csv2oct," . " at " . localtime()
    . strftime(" %Z ", 0, 0, 0, 0, 0, 0)
    . $ENV{'USER'} . '@' . hostname() . "\n";
print OUT $now_string;
print OUT "# name: x\n# type: cell\n";
print OUT "# rows: @{[1+$#{$AoA[0]}]}\n";
print OUT "# columns: @{[1+$#AoA]}\n";

# dump the values
foreach (@AoA) {
    print OUT @$_;
}
close OUT;

print "csv2oct: $numlines lines processed, $parsedlines successfully parsed\n"
    if $arg{v};

--IJpNTDwzlM2Ie8A6--
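
The quote rule described in the message (no quote string: scalar; quote
string present: string with the opening and closing quotes removed; empty
field: 0) can be tried in isolation with a minimal sketch like the one
below. It is not part of csv2oct: the helper name translate_field is
hypothetical, the element name <cell-element> is an assumption about what
Octave's text loader expects, and the quotes are stripped with a plain
regex rather than with quotewords as the script does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in csv2oct): translate one CSV field into the
# text block of a single Octave cell element.
sub translate_field {
    my ($field, $quote) = @_;

    # remove leading and trailing spaces
    $field =~ s/^\s+//;
    $field =~ s/\s+$//;

    # assumed element name for entries of a cell in Octave's text format
    my $head = "# name: <cell-element>\n";

    # empty fields become the scalar 0
    return $head . "# type: scalar\n0\n" if length($field) < 1;

    # quoted fields become strings, minus the opening and closing quote
    if (index($field, $quote) >= 0) {
        (my $unquot = $field) =~ s/^\Q$quote\E//;
        $unquot =~ s/\Q$quote\E$//;
        return $head . "# type: string array\n# elements: 1\n"
             . "# length: " . length($unquot) . "\n$unquot\n";
    }

    # anything else is passed through as a scalar, unchecked
    return $head . "# type: scalar\n$field\n";
}

# one numeric field, one quoted field, one empty field
print translate_field($_, "'") for (" 3.14 ", "'abc'", "");
```

As in the script, an unquoted field is never checked for being a valid
number; that check is left to Octave when it loads the file.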