From octave-sources-request at bevo dot che dot wisc dot edu Wed Mar 31 21:02:18 1999
Subject: Faster findstr.m
From: "O. Scott Sands"
To: octave-sources at bevo dot che dot wisc dot edu
Date: Wed, 31 Mar 1999 22:02:20 -0500

Attached is a hacked-up version of findstr.m from
the .../2.1.13/m/strings directory.  It's about 20 or
so times faster than the existing findstr.m for the
kind of searching that we do where I work (finding
strings of length 10^.5 - 10^1 in other strings of
length 10^4 - 10^5).  Your mileage may vary.

Enjoy!

-- 
O. Scott Sands
o dot s dot sands at ieee dot org

## Copyright (C) 1996 Kurt Hornik
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2, or (at your option)
## any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING.  If not, write to the Free
## Software Foundation, 59 Temple Place - Suite 330, Boston, MA
## 02111-1307, USA.

## usage: findstr (s, t [, overlap])
##
## Returns the vector of all positions in the longer of the two strings
## S and T where an occurrence of the shorter of the two starts.
##
## If the optional argument OVERLAP is nonzero, the returned vector
## can include overlapping positions (this is the default).
##
## For example,
##
##   findstr ("abababa", "aba")    => [1, 3, 5]
##   findstr ("abababa", "aba", 0) => [1, 5]

## Author: Kurt Hornik
## Adapted-By: jwe
## Mucked-With_by: oss

function v = findstr (s, t, overlap)

  if (nargin < 2 || nargin > 3)
    usage ("findstr (s, t [, overlap])");
  endif

  if (nargin == 2)
    overlap = 1;
  endif

  if (isstr (s) && isstr (t))

    ## Make S be the longer string.
    if (length (s) < length (t))
      tmp = s;
      s = t;
      t = tmp;
    endif

    s = toascii (s);
    t = toascii (t);

    l_t = length (t);
    l_s = length (s);

    ## Make a Hankel matrix out of the "search in" string, make a
    ## similar size matrix for the "search for" string, then make use
    ## of faster arithmetic functions to do the searching.
    smat = hankel (s(1:l_t)', s(l_t:l_s));
    tmat = t';
    tmat = tmat(:,ones (1, l_s - l_t + 1));
    v = (tmat == smat);
    if (rows (v) > 1)
      v = find (all (v));
    else
      v = find (v);
    endif

    if (! overlap && ! isempty (v))
      ## Keep the first match, then every later match that starts at
      ## least l_t characters after the previously kept one.
      keep = v(1);
      last = v(1);
      for k = 2:length (v)
        if (v(k) - last >= l_t)
          keep = [keep, v(k)];
          last = v(k);
        endif
      endfor
      v = keep;
    endif

  else
    error ("findstr: expecting first two arguments to be strings");
  endif

endfunction
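
To make the Hankel-matrix step in findstr.m easier to follow, here is a minimal sketch on the example strings from the usage comment. It is an illustration, not part of the attachment, and it makes two assumptions: it uses double () in place of toascii (), and it calls all (x, 1) with an explicit dimension so the single-row case needs no special handling. Both assume a reasonably recent Octave.

```octave
## Illustrative sketch only; variable names follow the attachment.
s = double ("abababa");   # "search in" string as character codes
t = double ("aba");       # "search for" string as character codes
l_s = length (s);
l_t = length (t);

## smat(i,j) == s(i+j-1), so column j holds the window s(j:j+l_t-1).
smat = hankel (s(1:l_t)', s(l_t:l_s));

## Replicate t as a column once per window, giving a matrix of equal size.
tmat = t';
tmat = tmat(:, ones (1, l_s - l_t + 1));

## A window matches when every row of its column agrees with t.
v = find (all (tmat == smat, 1));
disp (v)
```

The comparison matrix has l_t rows and l_s - l_t + 1 columns, so memory use grows with the product of the two string lengths. That is small for the short patterns described in the message, but it is worth keeping in mind for long search strings.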