CasualConc - a concordancer for macOS

Tuesday, April 29, 2008

Documentation

Over the last couple of weeks, I've worked on the documentation of CasualConc. It now covers most of the basic functions. I also started a more step-by-step set of instructions with a lot of images and named it Getting Started with CasualConc. So far, I have only finished basic file management and database creation along with the KWIC concordance function, which I think will be the most frequently used function (it is for me).

Now, my only hope is that someone will find the site or this blog and start using it. Somehow, I can't find the CasualConc main site on Google; it doesn't show up in the results. When I add a post to this blog, it shows up for the next 20 hours or so and then disappears. Well, maybe I should add one post per day until some more people find this blog and CasualConc.

If you happen to find this blog, please try it (if you use Leopard) or tell your friends who use Leopard to try it. I know it still has bugs and a lot of limitations, but I really want other people's opinions to improve it (it serves most of my current uses, so I don't have much motivation to make a lot of changes). Well, even if I hear from people, I might not be able to work on it for a while, but at least it's good to hear from them, especially if people like it.

Monday, April 28, 2008

A quick fix

I'm almost certain nobody has downloaded CasualConc since my last post. But anyway, I accidentally introduced a bug into the database creation function. It was caused by new tag-deletion code, which I forgot to apply to the database creation part. So if you find that the database creation function does not work properly (it crashes CasualConc), please go to the site and download the latest beta.

If you find any other bugs, please report them to me. The email address is on the main site. Or you can leave a comment on this blog.

Thanks!!

Sunday, April 27, 2008

A few more fixes

I found a few minor bugs which I introduced with the last changes, so I fixed them. I also found that the default font of CasualConc, Courier, is not monospace in Greek, which is (I guess) the language my very first user is testing CasualConc with, so I added an option to choose between Courier and Courier New, which is monospace in Greek.

I didn't mention this in the last post, but I also made some changes to the code of Concord, which improved speed by only about 2-3%.

Now I hope more people find this blog or the main site and test CasualConc. So if you happen to find this blog or the main site and you know someone who uses Leopard and is interested in corpus analysis, please tell him/her to test CasualConc, even if you don't use Mac OS X Leopard. If you do, please try it!!

Saturday, April 26, 2008

A minor update

As I reported a couple of days ago, at least a few people around the world are testing CasualConc and I've already got a report of a bug... This very minor update is partly based on the report, which I don't really know the source of, and a minor change to a setting.

What I found was an inconsistency in the handling of special characters, such as curly quotes or curly apostrophes. These look like single-byte characters on web pages or in Word documents, but they are in fact multi-byte characters in Unicode (three bytes each in UTF-8), so CasualConc replaces them with a single-byte quote or apostrophe. Recently I added .doc and .pdf support, and these documents often contain their own special characters (arrows, etc.). I only replaced these in some parts and not others, which caused the inconsistency. Now I think I have applied the same rule to all the tools, but I'm not sure.
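As a rough illustration of the kind of replacement described above, here is a minimal sketch in modern Ruby. It is not CasualConc's actual code: the mapping table and the name normalize_quotes are made up for this example, and the real list of characters CasualConc replaces is not given in this post. Keeping the rule in one shared function is one way to make sure every tool applies the same replacement.

```ruby
# Hypothetical mapping from curly quotes to their plain ASCII counterparts.
# Each key is one character but takes three bytes in UTF-8.
QUOTE_REPLACEMENTS = {
  "\u2018" => "'",  # left single quotation mark
  "\u2019" => "'",  # right single quotation mark (curly apostrophe)
  "\u201C" => '"',  # left double quotation mark
  "\u201D" => '"'   # right double quotation mark
}.freeze

# One pattern that matches any of the characters in the table.
QUOTE_PATTERN = Regexp.union(QUOTE_REPLACEMENTS.keys)

# Return a copy of text with every curly quote replaced by its ASCII form.
def normalize_quotes(text)
  text.gsub(QUOTE_PATTERN, QUOTE_REPLACEMENTS)
end

puts normalize_quotes("\u201CIt\u2019s fine\u201D")  # prints "It's fine" (with straight quotes)
```

Other special characters (arrows and so on) could be added to the same table, so the rule lives in a single place.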

Another change is that I dropped the ASCII mode in concordance. In the Concordance Preferences, CasualConc has 4 ways to handle texts. Originally, the two modes for European languages were ASCII and With Accented Characters. The former assumed the corpus files do not contain any multi-byte characters (in UTF-8). The latter assumed only a few accented (multi-byte) characters in a context (the context words shown in the concordance result table). But then I realized, after the very first person downloaded CasualConc, that he uses Greek, which, I think, is written almost entirely in multi-byte characters in UTF-8. So the two new modes for European languages are A and B. A is the same as the previous With Accented Characters mode, and B is for languages written fully in multi-byte characters, but it still assumes that not many of the 3-byte characters used in East Asian languages appear. If the text contains many 3-byte characters, such as Japanese characters (which are displayed as double-width, 2-byte characters on the screen but processed as 3-byte characters in UTF-8), CasualConc might not be able to display the concordance result or the full context view properly. If there are any languages that are full of both 2- and 3-byte characters, let me know. I'll see what I can do.
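To make the byte counts above concrete, here is a small sketch in modern Ruby that reports how many bytes each character of a string takes in UTF-8. The function names are made up for this illustration, and this is not how CasualConc itself picks a mode; it only shows why plain English, accented Latin text, Greek, and Japanese behave differently.

```ruby
# Return [character, byte count] pairs for a string, measured in UTF-8.
def utf8_byte_widths(text)
  text.encode("UTF-8").each_char.map { |ch| [ch, ch.bytesize] }
end

# Return the largest per-character byte count in the string (0 if empty).
def max_utf8_width(text)
  text.encode("UTF-8").each_char.map(&:bytesize).max || 0
end

samples = {
  "English"  => "word",
  "Accented" => "caf\u00E9",                        # "cafe" with an acute accent on the e
  "Greek"    => "\u03BB\u03CC\u03B3\u03BF\u03C2",   # Greek word "logos"
  "Japanese" => "\u65E5\u672C\u8A9E"                # the word for "Japanese language"
}

samples.each do |label, text|
  puts "#{label}: max #{max_utf8_width(text)} byte(s) per character"
end
```

In this sample, the English word needs 1 byte per character, the accented word has a single 2-byte character, every Greek letter takes 2 bytes, and every Japanese character takes 3 bytes.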

By the way, I decided to add a 'Getting Started' section to the CasualConc site. The current 'How to use' section is more like a manual or a list of the functions CasualConc has, so it's not really a how-to. So far, the new section only has an entry on basic file handling, a 'how to select files for your analysis' type of entry. I'll try to add more when I can find time.

Anyway, the current version is 0.9.4. If it gets up to 0.9.9 and is still not ready for version 1, I might go for 0.9.9.1..., but if enough people use it and it does not have any major problems, I might release it as Version 1.

Thursday, April 24, 2008

Old stuff

Now that I've learned I can add an HTML page with JavaScript to the Google Page Creator site by simply uploading it as a file and linking to it, I added an old JavaScript-based concordancer/word counter to the CasualConc site. This is probably useless for other people and I'm not sure if I need it on the site, but I just wanted to keep it somewhere, and because this old script is the basis of CasualConc, I think it's the right place (for me).

I wrote this script about 2 years ago when I was playing with JavaScript. At that time, I wanted to learn JavaScript, which I had only started to learn a few months earlier. I only knew MS-BASIC before that. When I started to play with JavaScript, I figured the best way to learn it was to write something with it. I first wrote a few scripts for my colleague at work to save a repetitive task. Then, I wanted to do something for myself. I had always wanted to do something with corpus linguistics. I found a few sites that did it with JavaScript and many scripts in Perl. With trial and error, and a lot of revisions, this JavaScript page was written. The page says it's version 6, but the script version is 61 (it's in the file name of the script file). But then I learned the limitations of JavaScript as a tool for corpus analysis. Then I tried Perl because that seemed to be what everyone used (and a lot of people are using it for text analysis), but somehow, it didn't appeal to me (or I wasn't, and still am not, smart enough to learn it). Then a year later, I used Ruby for something at work and somehow I liked it (I still like it). I didn't know about Python then; I only found out about it while I was learning Ruby. Another big plus was that because Ruby was originally and is still developed mainly by Japanese people, I found a lot of documents in Japanese. This and the inclusion of RubyCocoa in Leopard is why CasualConc exists now. I think I wrote something like this in the very first post on this blog, but anyway, it's fun to use Ruby though my scripts are still primitive. I hope I can learn more about Ruby and improve CasualConc. What I need is time, but now I need to spend more time on other more important stuff...