-
Jon © (08.07.10 20:59) [0] I'm not sure if I am using this correctly or if it is a bug. This is what I am doing:
program Test;

uses KOL;

var
  StrList: PKOLStrList;
begin
  StrList := NewKOLStrList;
  StrList.LoadFromFile('Test.txt');
  MsgOK(StrList.Text);
  StrList.Free;
end.
I have an ASCII text file named Test.txt with these contents:
1 This is line one
2 This is line two
3 This is line three
4 This is line four
5 This is line five
Without UNICODE_CTRLS it works fine.
With UNICODE_CTRLS I get garbage.
My understanding is that PKOLString can be used in both scenarios. Am I wrong? -
It works in both cases only if the encoding matches the define: with UNICODE_CTRLS the file must be encoded as Unicode, and without UNICODE_CTRLS as ANSI. If the text is always encoded the same way regardless of the define, use the corresponding StrList or WStrList directly. To check whether a text file is Unicode, open it as a file, read the first two bytes, and check whether they contain $FEFF (the byte-order mark). This is not applicable if the file was saved by a text editor that does not write a BOM (unlike Notepad). In that case, read the file into a long string and look for $00 bytes there: at the very least, the CR/LF characters in 16-bit Unicode are encoded as $0D, $00 and $0A, $00 (decimal 13 and 10).
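The two-step check described above (BOM first, then a scan for $00 bytes) can be sketched like this. Python is used here purely for illustration of the logic; the function name is mine, not part of KOL:

```python
def looks_like_utf16(path):
    """Heuristic Unicode check as described above:
    1) a UTF-16 byte-order mark ($FF $FE or $FE $FF) at the start, or
    2) NUL bytes in the body, which UTF-16 produces for ASCII text
       (e.g. CR/LF become $0D $00 / $0A $00 in UTF-16LE)."""
    with open(path, 'rb') as f:
        data = f.read()
    if data[:2] in (b'\xff\xfe', b'\xfe\xff'):  # BOM present
        return True
    return b'\x00' in data  # NUL bytes suggest a 16-bit encoding
```

As noted in the reply, this is only a guess: a UTF-16 file without a BOM and without any characters whose low or high byte happens to be zero would slip through, and a binary ANSI file containing a stray zero byte would be misclassified.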
-
Jon © (08.07.10 23:26) [2] Thank you for the explanation, it makes sense. I was under the impression that KOL checked for that itself internally; I see now that I was mistaken. Is there a routine built into KOL that would determine (or give a best guess) whether a string is Unicode or ANSI?
-
This depends on the file's creator, so there is no sense in trying to do it automatically. You may compare this with the VCL or other languages: a file may be encoded in very different ways, and even a Unix-style text file is not detected automatically; you either assume the Windows/DOS format or handle other formats explicitly.
In any case, I suggested a check procedure in my previous post. It can be extended with other checks if you want, but everything depends on your task and the possible input data. -
Jon © (09.07.10 05:14) [4] Thank you. I shall do as you advised and check the input first, per your recommendations.
I do think KOL would benefit from a built-in routine anyway - there is IsNAN so why not add IsUnicode, IsWide, Is...? It's just a suggestion to improve the library features. -
Not a problem, provided a universal method exists that can distinguish Unicode text from non-Unicode text in most cases. Do you know of such a method? I don't. (And I think there is none; otherwise my favourite browser would not need menu options to switch the encoding of a web page.)
-
Jon © (09.07.10 22:11) [6] OK, I see your point. I shall investigate the available methods and may suggest suitable routines. Perhaps an "IsASCII" routine that checks for non-ASCII characters would be best.
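Such an "IsASCII" check would simply scan for bytes above $7F. A minimal sketch (again in Python for illustration; the name is the poster's suggestion, not an existing KOL routine):

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range ($00-$7F)."""
    return all(b < 0x80 for b in data)

print(is_ascii(b'This is line one'))      # True
print(is_ascii('naïve'.encode('utf-8')))  # False: the ï encodes above $7F
```

Note that this only tells you the text is pure ASCII (safe under either define); a False result still leaves open whether the data is ANSI, UTF-8, or UTF-16.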