-
I have a string list (txt) of about 36.000 words and load it into PStrList, then using Pos function, I tried searching several words by iterate from first index until last index and then save all results ( Pos(x,y) > 0 ).
I tried compare using TStringList (VCL), and the speed when using TStringList is twice faster than using PStrList. Using PFastStrListEx is almost comparable to PStrList. Anyone know why the speed is slow when using PStrList like this case ? Or is there any other solution to speed up searching List of string using Pos function in KOL ?
Thank you
-
I found that the slower it seem the property Items[Idx: integer] (to access the item in strings, comparing the property Strings[index:integer] in TStringList). Can I speed up to access item in PStrList ( I've also tried using ItemPtrs, but almost equal with items ) ?
-
Which version of KOL are you using? Did you sort the data in the stringlist? Have you tried the conditional define: "SPEED_FASTER"? Please provide compilable code for us to test.
-
I'm using the last version of KOL ( kol-3.14159265389 ) and I didn't sort the data. I've tries adding SPEED_FASTER conditional define but the result almost the same. Sample code in KOL
procedure TForm1.Button1Click(Sender: PObj);
var
sl : PStrList;
i,j,x : Integer;
start : Dword;
s : string;
begin
sl := NewStrList;
sl.LoadFromFile('1juta.txt');
start := GetTickCount;
x := 0;
for j:= 1 to 10 do
for i:=0 to sl.Count-1 do
begin
s := sl.Items[i];
Inc(x);
end;
EditBox1.Text := 'Loop '+ Int2Ths(x) +' in '+ Int2Ths(GetTickCount-start)+'ms';
sl.Free;
end;
Sample code in VCL:
procedure TForm1.Button1Click(Sender: TObject);
var
sl : TStringList;
i,j,x : Integer;
start : Dword;
s : string;
begin
sl := TStringList.Create;
sl.LoadFromFile('1juta.txt');
start := GetTickCount;
x := 0;
for j:= 1 to 10 do
for i:=0 to sl.Count-1 do
begin
s := sl[i]; Inc(x);
end;
Edit1.Text := Format('Loop %d in %dms',[x,(GetTickCount-start)]);
sl.Free;
end;
Using PStrList : Loop 10,000,000 in 1,310ms Using TstringList : Loop 10000000 in 296ms Sample source code : http://wikisend.com/download/529728/KOL Speed test.7z
-
-
Don't use Pos(), don't extract strings via Items[] if you really want speed. Use ItemPtrs[] property and search substrings via PChar functions like StrScan() + StrIsStartingFrom() (or write your own). E.g.: function StrContainsPattern(s, pattern: PKOLChar): boolean;
begin
Result := false;
while true do
begin
s := StrScan(s, pattern[0]);
if s == nil then Exit;
if StrIsStartingFrom(s, pattern) then
break;
end;
Result := true;
end;
Strings in PStrList are not stored as String, so attempting to extract Items[] to strings leads to allocating memory and moving bytes to a new location. Using ItemPtrs it is possible to avoid this.
-
Thank you Vladimir, Using ItemPtrs[] with PAnsiChar, Speed is faster than in VCL (very fast), and with StrScan() I also get comparable speed
-
I drawn code yesterday with a couple of mistakes, sorry, you should correct it before using, or write your own code from scratch.
-
yes, it seem that your code need correction.. still trying to fix it
-
Trying using strPos (little modification from Free Pascal rtl, C:\FPC\fpc-2.4.4\rtl\inc\stringsi.inc), but it seem not optimized, slower than pos function
function strpos(str1,str2 : PAnsiChar) : PAnsiChar;
var
P : PAnsiChar;
sz : Integer;
begin
strpos:=nil;
if (str1 = nil) or (str2 = nil) then Exit;
P := StrScan(str1,str2^);
if P = nil then Exit;
sz := StrLen(str2);
while P <> nil do
begin
if strlcomp(P,str2,sz)=0 then
begin
strpos:=P;
exit;
end;
inc(P);
P := StrScan(p,str2^);
end;
end; Vladimir, can you add fast version of StrPos(str1,str2:PAnsiChar) into KOL ?
-
I've modified the pos function (pascal version) code from FastCode, because I'm not familiar with asm.
function Pos2(const SubStr : PAnsiChar; const Str : PAnsiChar) : Integer;
var
len, lenSub: integer;
ch: char;
p, pSub, pStart, pStop: PAnsiChar;
label
Ret, Ret0, Ret1, Next0, Next1;
begin;
p:=Str;
pSub:=SubStr;
if (p=nil) or (pSub=nil) then begin;
Result:=0;
exit;
end;
len:= StrLen(p);
lenSub:=StrLen(pSub);
if (len<lenSub) or (lenSub<=0) then begin;
Result:=0;
exit;
end;
lenSub:=lenSub-1;
pStop:=p+len;
p:=p+lenSub;
pSub:=pSub+lenSub;
pStart:=p;
ch:=pSub[0];
if lenSub=0 then begin;
repeat;
if ch=p[0] then goto Ret0;
if ch=p[1] then goto Ret1;
p:=p+2;
until p>=pStop;
Result:=0;
exit;
end;
lenSub:=-lenSub;
repeat;
if ch=p[0] then begin;
len:=lenSub;
repeat;
if pword(psub+len)^<>pword(p+len)^ then goto Next0;
len:=len+2;
until len>=0;
goto Ret0;
Next0:end;
if ch=p[1] then begin;
len:=lenSub;
repeat;
if pword(@psub[len])^<>pword(@p[len+1])^ then goto Next1;
len:=len+2;
until len>=0;
goto Ret1;
Next1:end;
p:=p+2;
until p>=pStop;
Result:=0;
exit;
Ret1:
p:=p+2;
if p<=pStop then goto Ret;
Result:=0;
exit;
Ret0:
inc(p);
Ret:
Result:=p-pStart;
end;
|