Friday, April 18, 2008

CasualConc update

After recently discovering how file handling works on the Objective-C side (or the RubyCocoa side?), I decided to add a new feature to CasualConc. Originally, CasualConc could only handle UTF-8 or ASCII encoded files because I used Ruby's file handling methods. Now I have switched to Objective-C methods, so it can handle a few more encodings. The added encodings are UTF-16, Windows Latin 1, Windows Latin 2, Mac Roman, Shift-JIS, EUC-JP, and ISO-2022-JP. The last three are all for Japanese. The list is limited to the encodings Objective-C can handle by default. I wasn't able to find Chinese or Korean encoding settings, so they are not included. But CasualConc cannot handle 2-byte characters properly (in Concordance) anyway, so this shouldn't be an issue. I haven't really tested all the encodings, so if someone happens to find this blog and would like to try, let me know. I might add a download link to the CasualConc site soon (hopefully).
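For anyone curious what "reading a file in a chosen encoding" looks like in practice, here is a rough sketch. It is not CasualConc's actual code: CasualConc goes through Objective-C methods, while this uses plain Ruby string transcoding (which needs Ruby 1.9 or later). The names CANDIDATE_ENCODINGS, decode_text, and read_corpus_file are made up for illustration, and the mapping from the labels above to Ruby encoding names is my assumption.

```ruby
# Hypothetical sketch, not CasualConc's implementation.
# Maps the encoding labels from this post to Ruby encoding names (assumed mapping).
CANDIDATE_ENCODINGS = {
  "UTF-8"           => "UTF-8",
  "UTF-16"          => "UTF-16",
  "Windows Latin 1" => "Windows-1252",
  "Windows Latin 2" => "Windows-1250",
  "Mac Roman"       => "macRoman",
  "Shift-JIS"       => "Shift_JIS",
  "EUC-JP"          => "EUC-JP",
  "ISO-2022-JP"     => "ISO-2022-JP",
}.freeze

# Interpret raw bytes as the chosen encoding and return a UTF-8 string.
# Raises KeyError for an unknown label, and an Encoding error if the
# bytes are not valid in the chosen encoding.
def decode_text(bytes, label)
  name = CANDIDATE_ENCODINGS.fetch(label)
  bytes.dup.force_encoding(name).encode("UTF-8")
end

# Read a whole file as raw bytes, then decode it with the chosen encoding.
def read_corpus_file(path, label)
  decode_text(File.binread(path), label)
end
```

The idea is simply that the user picks the encoding and the program converts everything to one internal encoding before searching, so the rest of the code only ever deals with UTF-8.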

Also experimentally implemented is support for other file formats. This still returns an error from time to time. Personally, all my corpus files are plain text, so this is not for me, but I thought someone might be interested. The problem is that no one is checking this blog or the CasualConc site, so I highly doubt anyone will even use this function. By the way, the added file formats are .doc, .docx, .rtf, .rtfd, and .pdf.

Well, I really hope someone will try CasualConc, though...
