-
Jon © (08.07.10 20:59) [0] I'm not sure if I am using this correctly or if it is a bug. This is what I am doing:
program Test;

uses KOL;

var
  StrList: PKOLStrList;
begin
  StrList := NewKOLStrList;
  StrList.LoadFromFile('Test.txt');
  MsgOK(StrList.Text);
  StrList.Free;
end.
I have an ASCII text file named Test.txt with these contents:
1 This is line one
2 This is line two
3 This is line three
4 This is line four
5 This is line five
Without UNICODE_CTRLS it works fine.
With UNICODE_CTRLS I get garbage.
My understanding is that PKOLString can be used in both scenarios. Am I wrong? -
It works in both cases only if the encoding matches the define: with UNICODE_CTRLS the file must be encoded as Unicode, and without UNICODE_CTRLS as ANSI. If the text is always encoded the same way regardless of the define, use the corresponding StrList or WStrList directly. To check whether a text file is Unicode, open it as a file, read the first two bytes, and check whether they contain $FEFF (the byte-order mark). This is not applicable if the file was saved by a text editor that does not write a BOM (unlike Notepad). In that case, read the file into a long string and look for $00 bytes there: at the very least, the CR/LF characters in 16-bit Unicode are encoded as $0D, $00 and $0A, $00 (decimal 13 and 10).
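The two-step check described above (BOM first, then a scan for $00 bytes) can be sketched like this. Python is used here purely for illustration of the logic; the function name is mine, not part of KOL:

```python
def looks_like_utf16(path):
    """Heuristic Unicode check as described above:
    1) a UTF-16 byte-order mark ($FF $FE or $FE $FF) at the start, or
    2) NUL bytes in the body, which UTF-16 produces for ASCII text
       (e.g. CR/LF become $0D $00 / $0A $00 in UTF-16LE)."""
    with open(path, 'rb') as f:
        data = f.read()
    if data[:2] in (b'\xff\xfe', b'\xfe\xff'):  # BOM present
        return True
    return b'\x00' in data  # NUL bytes suggest a 16-bit encoding
```

As noted in the reply, this is only a guess: a UTF-16 file without a BOM and without any characters whose low or high byte happens to be zero would slip through, and a binary ANSI file containing a stray zero byte would be misclassified.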
-
Jon © (08.07.10 23:26) [2] Thank you for the explanation, it makes sense. I was under the impression that KOL checked for that itself internally; I see now that I was mistaken. Is there a routine built into KOL that would determine (or give a best guess) whether a string is Unicode or ANSI?
-
This depends on the file's creator, so there is no sense in trying to do it automatically. You may compare this with the VCL or other languages: a file may be encoded in very different ways, and even a Unix-style text file is not detected automatically; you either assume the Windows/DOS format or handle other formats explicitly.
In any case, I suggested a check procedure in my previous post. It can be extended with other checks if you want, but everything depends on your task and the possible input data. -
Jon © (09.07.10 05:14) [4] Thank you. I shall do as you advised and check the input first, per your recommendations.
I do think KOL would benefit from a built-in routine anyway - there is IsNAN so why not add IsUnicode, IsWide, Is...? It's just a suggestion to improve the library features. -
Not a problem, provided a universal method exists that can distinguish Unicode text from non-Unicode text in most cases. Do you know of such a method? I don't. (And I think there is none; otherwise my favourite browser would not need menu options to switch the encoding of a web page.)
-
Jon © (09.07.10 22:11) [6] OK, I see your point. I shall investigate the available methods and may suggest suitable routines. Perhaps an "IsASCII" routine that checks for non-ASCII characters would be best.
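Such an "IsASCII" check would simply scan for bytes above $7F. A minimal sketch (again in Python for illustration; the name is the poster's suggestion, not an existing KOL routine):

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range ($00-$7F)."""
    return all(b < 0x80 for b in data)

print(is_ascii(b'This is line one'))      # True
print(is_ascii('naïve'.encode('utf-8')))  # False: the ï encodes above $7F
```

Note that this only tells you the text is pure ASCII (safe under either define); a False result still leaves open whether the data is ANSI, UTF-8, or UTF-16.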