Saturday, April 26, 2008

A minor update

As I reported a couple of days ago, at least a few people around the world are testing CasualConc, and I've already got a bug report... This very minor update is partly based on that report, the source of which I don't really know, and partly a minor change to a setting.

What I found was an inconsistency in the handling of special characters, such as curly quotes and curly apostrophes. These look like single-byte characters on web pages or in Word documents, but they are in fact multi-byte characters in Unicode (UTF-8), so CasualConc replaces them with a single-byte quote or apostrophe. Recently I added .doc and .pdf support, and these documents often contain their own special characters (arrows, etc.). I only replaced these in some parts of the program and not others, which caused the inconsistency. I think I have now applied the same rule to all the tools, but I'm not sure.
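To illustrate the kind of replacement described here, below is a minimal sketch in Python. It is not CasualConc's actual code: the function name `normalize_quotes` and the mapping table are assumptions for illustration, and the real list of replaced characters may well be different.

```python
# Hypothetical mapping from curly quotes to plain ASCII quotes.
# This table is an assumption, not CasualConc's real replacement list.
QUOTE_MAP = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (curly apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}

# Build the translation table once so every tool can share the same rule.
_QUOTE_TABLE = str.maketrans(QUOTE_MAP)


def normalize_quotes(text: str) -> str:
    """Replace curly quotes and apostrophes with their ASCII equivalents."""
    return text.translate(_QUOTE_TABLE)


if __name__ == "__main__":
    print(normalize_quotes("\u201cdon\u2019t\u201d"))
```

Keeping the table in one place and calling the same function from every tool is one way to avoid the "replaced in some parts and not others" problem.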

Another change is that the ASCII mode has been dropped from Concordance. In the Concordance Preferences, CasualConc has four ways to handle texts. Originally the two European Language modes were ASCII and with Accented Characters. The former assumed that the corpus files did not contain any multi-byte characters (in UTF-8). The latter assumed only a few accented (multi-byte) characters in a context (the context words shown in the concordance result table). But after the very first person downloaded CasualConc, I realized that he uses Greek, which, I think, is made up almost entirely of multi-byte characters in UTF-8. So the two new modes for European Languages are A and B. A is the same as the previous with Accented Characters mode, and B is for languages written fully in multi-byte characters, though it still assumes the text does not contain many of the 3-byte characters used in East Asian Languages. If the text contains many 3-byte characters, such as Japanese characters (which look like 2-byte characters on the screen but are processed as 3-byte characters in UTF-8), CasualConc might not be able to display the concordance result or the full context view properly. If there are any languages that are full of both 2- and 3-byte characters, let me know. I'll see what I can do.
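For anyone who wants to check which kind of text they have, here is a small Python sketch that counts the characters in a string by their UTF-8 length in bytes. The function name `utf8_width_profile` is made up for this example, and this is not how CasualConc itself decides anything; it is only a way to see the 1-, 2- and 3-byte distinction discussed above.

```python
from collections import Counter


def utf8_width_profile(text: str) -> Counter:
    """Count the characters in `text` by their encoded length in UTF-8 bytes."""
    return Counter(len(ch.encode("utf-8")) for ch in text)


if __name__ == "__main__":
    samples = {
        "English": "cafe",
        "accented": "caf\u00e9",        # e with acute accent
        "Greek": "\u03b1\u03b2\u03b3",  # alpha, beta, gamma
        "Japanese": "\u65e5\u672c",     # the two kanji for "Japan"
    }
    for label, sample in samples.items():
        # Keys are byte lengths, values are how many characters have that length.
        print(label, dict(utf8_width_profile(sample)))
```

A text that is mostly 1-byte characters with the occasional 2-byte one fits the idea behind mode A, a text that is mostly 2-byte characters fits mode B, and a text with many 3-byte characters is the case that may not display properly.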

By the way, I decided to add a 'Getting Started' section to the CasualConc site. The current 'How to use' is more like a manual or a list of the functions CasualConc has, so it's not really a how-to. So far the site only has a basic file handling entry, of the 'how to select files for your analysis' type. I'll try to add more when I can find time.

Anyway, the current version is 0.9.4. If it gets up to 0.9.9 and is still not ready for version 1, I might go to 0.9.9.1..., but if enough people use it and it does not have any major problems, I might call it Version 1.
