From octave-sources-request at bevo dot che dot wisc dot edu Thu Jan 29 12:59:18 2004
Subject: code to import csv files
From: "Pascal A. Dupuis"
To: octave-sources at bevo dot che dot wisc dot edu
Date: Thu, 29 Jan 2004 09:54:00 -0600

--G4iJoqBmSsgzjUCe
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Please find herewith enclosed two files to import csv files. All the
complexity of working with quoted strings is handled in a Perl sub.

Best regards

Pascal Dupuis

-- 
Dr. ir. Pascal Dupuis
K. U. Leuven, ESAT/ELECTA (formerly ELEN): http://www.esat.kuleuven.ac.be/
Kasteelpark Arenberg, 10; B-3001 Leuven-Heverlee, Belgium
Tel. +32-16-32 10 21 -- Fax +32-16-32 19 85

--G4iJoqBmSsgzjUCe
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=csv2oct

#!/usr/bin/perl

=head1 NAME

csv2oct - script to read a csv file and generate something Octave can
read as cells

=head1 SYNOPSIS

csv2oct [-q quotestring] [-s delim] [-v] tmp_file data_file

=head1 DESCRIPTION

Reading csv files is rather, hmm, difficult, so all the gory details
are left to Perl's parse_line routine. The algorithm is as follows:

=over 4

=item 1

Split each line according to the delimiter, keeping quotes.

=item 2

For each element, remove leading and trailing whitespace. If the result
is empty, translate it as 0.0.

=item 3

Compare the element with the version with quotes removed. If they are
equal, translate it to a scalar value; otherwise emit the unquoted
string.

=back

Everything must be processed in memory, as Octave needs to know the
cell dimensions in advance. Once done, the script generates an input
file for Octave, or dumps it to STDERR if the tmp_file can't be opened.

=head1 COPYRIGHT

Copyright (C) 2004 Pascal Dupuis

This file is part of Octave.

Octave is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.

Octave is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with Octave; see the file COPYING. If not, write to the Free
Software Foundation, 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.

=cut

use strict;
use Text::ParseWords;
use POSIX qw(strftime);
use Sys::Hostname;
use Getopt::Std;

sub transpose {
    # code found at http://www.raycosoft.com/rayco/support/perl_tutor.html
    my @mat = @_;
    my $x;
    return map {
        $x = $_;
        [ map { $mat[$_][$x] } 0 .. $#mat ];
    } 0 .. $#{$mat[0]};
}

# options: -q quote string, -s field separator, -v verbose summary
my %arg;
getopts("q:s:v", \%arg);
$arg{q} = "'" unless defined $arg{q};
$arg{s} = "," unless defined $arg{s};

my $outfile = shift;
$_ = shift || die("Usage: csv2oct [-q quotestring] [-s delim] [-v] tmp_file data_file");

# compressed input is read through a decompression pipe
if (m/\.bz2$/) {
    open DATAFILE, "-|", "bzip2", "-dc", $_
        or die "Can't open $_ using bzip2 pipe: $!";
} elsif (m/\.gz$/) {
    open DATAFILE, "-|", "gzip", "-dc", $_
        or die "Can't open $_ using gzip pipe: $!";
} else {
    open DATAFILE, $_ or die "Can't open $_: $!";
}

my ($numlines, $parsedlines, @AoA, @fields, @transl);

while (<DATAFILE>) {
    ++$numlines;
    chomp;
    # skip comment
    next if m/^\s?#/;
    @transl = ();
    # it seems that the combination of ' and , is tricky to process.
    # split things, but keep quotes
    @fields = parse_line($arg{s}, 1, $_);
    foreach (@fields) {
        # remove leading and trailing spaces
        s/^\s+//;
        s/\s+$//;
        # replace empty elems by 0
        if (length($_) < 1) {
            push @transl, "# name: <cell-element>\n# type: scalar\n0\n";
            next;
        }
        # is it quoted? Compare original and unquoted versions
        my $unquot = join('', quotewords($arg{q}, 0, $_));
        if ($unquot eq $_) {
            # no, it must be numeric
            push @transl, "# name: <cell-element>\n# type: scalar\n$_\n";
        } else {
            # yes, print the unquoted version
            push @transl, "# name: <cell-element>\n# type: string array\n" .
                "# elements: 1\n# length: " .
                length($unquot) . "\n$unquot\n";
        }
    }
    ++$parsedlines;
    push @AoA, [ @transl ];
}
close DATAFILE;

unless (open OUT, ">$outfile") {
    warn "Can't open tmp file $outfile: $!\nRedirecting to STDERR\n";
    open OUT, ">&", \*STDERR or die "Can't dup STDERR: $!";
}

# generate the header
my $now_string = "# Created by csv2oct," . " at " . localtime() .
    strftime(" %Z ",0,0,0,0,0,0) . $ENV{'USER'} . " at " . hostname() . "\n";
print OUT $now_string;
print OUT "# name: x\n# type: cell\n";
print OUT "# rows: @{[1+$#AoA]}\n";
print OUT "# columns: @{[1+$#{$AoA[0]}]}\n";

# the rows were collected line by line; transpose so that the values
# are written column by column
@AoA = transpose(@AoA);

# dump the values
foreach (@AoA) {
    print OUT @$_;
}
close OUT;

print "csv2oct: $numlines lines processed, $parsedlines successfully parsed\n"
    if $arg{v};

--G4iJoqBmSsgzjUCe
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="read_csv.m"

function x = read_csv(name, sep, quote)
%# function x = read_csv(name, sep, quote)
%# This function imports a csv file as a set of cells. The
%# handling of the csv details is left to an external Perl script.
%# Default values are "," for the separator and "'" for the quote.
%# Compressed files are automagically expanded.

%% Copyright (C) 2004 Pascal Dupuis
%%
%% This file is part of Octave.
%%
%% Octave is free software; you can redistribute it and/or
%% modify it under the terms of the GNU General Public
%% License as published by the Free Software Foundation;
%% either version 2, or (at your option) any later version.
%%
%% Octave is distributed in the hope that it will be useful,
%% but WITHOUT ANY WARRANTY; without even the implied
%% warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
%% PURPOSE. See the GNU General Public License for more
%% details.
%%
%% You should have received a copy of the GNU General Public
%% License along with Octave; see the file COPYING.
%% If not, write to the Free Software Foundation, 59 Temple Place -
%% Suite 330, Boston, MA 02111-1307, USA.

global DEBUG
if !exist('DEBUG'), DEBUG = 0; end

tmpname = tmpnam();

%# build the command line; each option needs a leading space
cmd = 'perl csv2oct';
if (nargin > 1)
  cmd = [ cmd ' -s ' sep ];
endif
if (nargin > 2)
  cmd = [ cmd ' -q ' quote ];
endif
cmd = [ cmd ' ' tmpname ' ' name ];
[output, status] = system(cmd);

%# the temporary file defines the cell array x
cmd = [ 'load ' tmpname ';' ];
eval(cmd);
[err, msg] = unlink(tmpname);

--G4iJoqBmSsgzjUCe--
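
The three steps listed in the csv2oct DESCRIPTION can be tried on a
single line of input. What follows is only a minimal sketch, not part
of the posted files: classify_fields is a hypothetical helper name, and
it returns type/value pairs instead of writing Octave's text format.
It assumes the same Text::ParseWords routines that csv2oct uses.

```perl
#!/usr/bin/perl
# Illustrative sketch only; classify_fields is a hypothetical helper
# and does not appear in csv2oct.
use strict;
use warnings;
use Text::ParseWords qw(parse_line quotewords);

# Split one csv line and tag each field as 'scalar' or 'string'.
sub classify_fields {
    my ($line, $delim, $quote) = @_;
    my @result;
    # step 1: split on the delimiter, keeping the quotes
    foreach my $field (parse_line($delim, 1, $line)) {
        $field = '' unless defined $field;
        # step 2: trim whitespace; an empty field becomes the scalar 0
        $field =~ s/^\s+//;
        $field =~ s/\s+$//;
        if ($field eq '') {
            push @result, [ 'scalar', 0 ];
            next;
        }
        # step 3: a field that changes when unquoted was quoted, so it
        # is kept as a string; otherwise it is treated as a number
        my $unquoted = join '', quotewords($quote, 0, $field);
        if ($unquoted eq $field) {
            push @result, [ 'scalar', $field ];
        } else {
            push @result, [ 'string', $unquoted ];
        }
    }
    return @result;
}

foreach my $f (classify_fields(q{1, 'a,b', ,2.5}, ',', "'")) {
    print "$f->[0]: $f->[1]\n";
}
```

For the sample line this prints "scalar: 1", "string: a,b", "scalar: 0"
and "scalar: 2.5", one per line. Note that the comma inside the quoted
field is not treated as a separator, which is the reason for keeping the
quotes during the first split.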