Friday, May 30, 2008

A minor change

I found a bug (sort of) in Concord a couple of days ago. It's a minor bug and it happens only when you use the database mode in Concord. Well, it's more of a memory leak. I implemented a forced garbage collection when full text is displayed in the context view of Concord, but somehow the memory was not released. So I changed the way the text is read from a database file. Now it should not keep using additional memory when you select a different concordance line to show full text.

I use the same technique to read data from a database file when CasualConc searches a string, but when I applied the same change to the search function, it used more memory because the search returns more hits. What this means is that if you search word(s)/phrase(s) in any of the tools many times, CasualConc keeps using more memory. I haven't tested whether it uses up all the available memory and starts using virtual memory, or whether Ruby starts GC once it uses up all the available physical memory. In any case, until I can find a way to solve this problem, you might want to quit CasualConc after a while and restart it.

Tuesday, May 20, 2008

A few more fixes again

I fixed a few more bugs this weekend that are mainly related to lemmatization and collocation statistics. I also added some more documentation to the main site (some in Japanese). The latest version is still 0.9.7 but the date is 05192008.

Now, most of the features I wanted to include in CasualConc are there and mostly functioning. I don't have time to improve the Japanese kwic feature now, so that will have to wait until sometime in summer or fall. And unless I find or someone reports any major bugs, I will try not to spend too much time on this for a while. I don't know how many people actually downloaded CasualConc and are using it, but I guess there aren't many. If you happen to be one of them, I'd like to hear what you think about it.

Well, I might need to publicize this a bit more, so I might start trying to get more beta testers somewhere.

Sunday, May 18, 2008

Another bug fix and minor update

I highly doubt that anyone has downloaded CasualConc recently, but anyway, I found a few bugs and also made some changes that I had wanted to make for a while. Now the latest version is 0.9.7 beta.

First, I found a bug in Japanese Concordance, which I'm sure nobody has ever used. When I dropped the text only mode, I forgot to change it in Japanese concordance mode. Now it should be fixed. I also fixed some other bugs that relate to the recent feature changes.

The changes I made are mainly with Collocation. Now, if you search for multiple words or use wildcard search and multiple words are found, collocation information will be displayed for each keyword. This change affected statistics calculation, so I think I made necessary changes to it.
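To give a rough idea of what "collocation information for each keyword" means, here is a minimal Ruby sketch. It is only an illustration, not CasualConc's actual code: the method name collocates_by_keyword, the span of two words on each side, and the sample sentence are all assumptions.

```ruby
# Count collocates separately for each keyword form that the pattern matches.
# collocates_by_keyword and the default span of 2 are hypothetical.
def collocates_by_keyword(tokens, pattern, span = 2)
  # keyword form => { collocate => count }
  table = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  tokens.each_with_index do |token, i|
    next unless token =~ pattern
    first = [i - span, 0].max
    last  = [i + span, tokens.length - 1].min
    (first..last).each do |j|
      table[token][tokens[j]] += 1 unless j == i
    end
  end
  table
end

tokens = %w[she runs fast and he ran home]
table = collocates_by_keyword(tokens, /\Ar[au]n/)
table.each { |keyword, counts| puts "#{keyword}: #{counts.inspect}" }
```

Because the counts are kept per keyword form ("runs" and "ran" here), any statistics calculated from them also have to be calculated per keyword rather than once for the whole search.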

I also made a minor change to the Export result function of Concord. Originally, a CSV file exported from Concord only included kwic results and file paths. Now it has an option to include context words (L5 - R5). To include them, go to Preferences -> Concord and check the box Include context words (L5 - R5) in CSV output.
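As a rough illustration of what such a row could look like, here is a short Ruby sketch using the standard csv library. The column order, the method name kwic_row, and the sample data are assumptions made for this example, not the actual export format of Concord.

```ruby
require 'csv'

# Build one CSV row for a hit: the keyword, the file path, and optionally
# the five words to the left (L5..L1) and to the right (R1..R5).
# kwic_row and this column layout are hypothetical.
def kwic_row(tokens, index, path, include_context)
  row = [tokens[index], path]
  return row unless include_context
  # Positions outside the text are left empty (nil).
  left  = (1..5).map { |d| index - d >= 0 ? tokens[index - d] : nil }.reverse
  right = (1..5).map { |d| tokens[index + d] }
  row + left + right
end

tokens = %w[the quick brown fox jumps over the lazy dog]
csv = CSV.generate do |out|
  out << kwic_row(tokens, 3, 'sample.txt', true)
end
puts csv
```

With the option off, the row only has the keyword and the file path, which corresponds to the original behavior described above.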

As always, if you happen to find this blog or the main page and try CasualConc, any feedback (including bug reports) is welcome. Especially if you find it useful, I'd like to know.

Thursday, May 15, 2008

Another quick fix

I found a minor (maybe major to someone if anyone ever uses CasualConc) bug and fixed it today. This only affects you if you use Concord with non-plain text files as your corpus files. And this only happens in the paragraph mode (the default mode). Now you should be able to use other file types as your corpus files with Concord and in the paragraph mode.

I found this bug when I was testing .odt files. After the fix, I was able to use .odt files as corpus files, so this confirms CasualConc can read .odt files!

I don't know how many people are affected (I know not many people), but if you downloaded CasualConc in the last couple of weeks, please go to the site and download the latest version. It has the same version number (0.9.6), but a different date (05142008).

And if you find any other bugs, please let me know. The email address is on the main site.

Tuesday, May 13, 2008

Some details of last update

As I mentioned in the last post, I added/activated a couple of new features on CasualConc. One is based on the lemmatizing function and the other is something with Concord.

The first, which is based on the lemmatizing function, is keyword grouping or whatever name I settle on (it has a tentative label). To use it, first prepare a text file (UTF-8) in the same format the lemmatizer accepts. The default is:

keyword -> word,word,word,...

The keyword is a grouping label, so if you want to group days of a week, it looks like:

week -> Monday,Tuesday,Wednesday,Thursday,Friday,Saturday,Sunday

Once you have prepared as many groups as you want, save the file as plain text with UTF-8 encoding. Then, go to Preferences on CasualConc -> Lemma, and check Grouped Keywords. Next, select the file you just saved by clicking the Select Grouping File button. Now everything is set.
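For anyone curious about how a file in this format could be read, here is a minimal Ruby sketch. It is not CasualConc's actual code: the method name parse_groups is made up, and skipping blank or malformed lines is an assumption.

```ruby
# Parse grouping definitions of the form "keyword -> word,word,word,..."
# into a hash of label => list of words. parse_groups is hypothetical.
def parse_groups(text)
  groups = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    label, words = line.split('->', 2)
    next if words.nil? # assumption: lines without "->" are ignored
    groups[label.strip] = words.split(',').map(&:strip).reject(&:empty?)
  end
  groups
end

sample = "week -> Monday,Tuesday,Wednesday,Thursday,Friday,Saturday,Sunday\n"
groups = parse_groups(sample)
puts groups['week'].inspect
```

To read a real file, something like parse_groups(File.read(path, encoding: 'UTF-8')) should work, where path is the grouping file you saved.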

If this works as intended, you should be able to use this function on Concord, Cluster, and Collocation/Cooccurrence. What you do is add @@ at the beginning of your search word(s). So if you want to search all the days of a week, as specified above, you type @@week, then search. You should be able to search all the words in this group. Technically, you should be able to search multiple groups, but it is not fully tested and might not work, and I don't know what will happen if you combine this feature and wildcard search. I might change the behavior of this feature if I ever get any feedback.
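Conceptually, a search term that starts with @@ is replaced by all the words in that group. The Ruby sketch below shows one way this could be done with a regular expression. It is an illustration only: the GROUPS table, the method name build_pattern, and the handling of unknown labels and of multiple terms (treated as alternatives) are assumptions, not how CasualConc necessarily does it.

```ruby
# Hypothetical group table; in practice it would come from the grouping file.
GROUPS = { 'week' => %w[Monday Tuesday Wednesday Thursday Friday Saturday Sunday] }

# Turn a search string into a regular expression. A term that starts with
# "@@" is replaced by all the words in the group with that label.
def build_pattern(query, groups)
  words = query.split.flat_map do |term|
    if term.start_with?('@@')
      groups.fetch(term[2..-1], []) # assumption: unknown labels add nothing
    else
      [term]
    end
  end
  return nil if words.empty?
  Regexp.new('\b(?:' + words.map { |w| Regexp.escape(w) }.join('|') + ')\b')
end

pattern = build_pattern('@@week', GROUPS)
puts 'See you on Friday or Sunday.'.scan(pattern).inspect
```

Every word is passed through Regexp.escape here, so this sketch does not cover the combination with wildcard search.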

Another somewhat major addition, which is not documented at this time, is a function for Concord. You can now open a concordance result in a new window. This might be useful if you want to compare several concordance results. To use this function, search any word(s) in Concord, and then go to Menu -> Misc -> Open Concordance Result in New Window. This is experimental. I added this because I found a way to add a multiple-window function to a program (I just wanted to have something so that I remember how to do it). You should be able to re-sort the results in a new window, just like in the main window. But beware: if the concordance result is huge (say, 10000 hits), using this might eat a lot of memory because CasualConc keeps all the info in memory. If you have at least 2GB of memory, this should be less of a problem, though.

Finally, I have something that is not related to CasualConc. I posted a couple of weeks ago that I wrote a simple utility program that helps with typing IPA characters. I wrote a similar program(?) in JavaScript and added it to the IPATypist page. I highly doubt many people read this post, especially people who don't use Leopard, but this is written for those people. It should run on Tiger with Firefox, Safari and Camino. I haven't tested it on IE on Windows and I have no intention to support it, but it might work. It is also available for download, so if you are ever interested, you can download it and use it on your computer or put it on your course site or wherever you want to use it, though I can't guarantee it will work.

As always, if you ever use any of the programs, I'd appreciate your feedback. That will motivate me to improve them.

Tuesday, May 6, 2008

Minor update

Over the weekend, I fixed a few bugs and made a few changes to some of the existing functions of CasualConc. But these changes might have introduced other bugs... Now CasualConc is 0.9.6 (beta).

The bugs, or more precisely legacy features, that were fixed or updated were mostly in file handling. When I first started writing the program, I didn't know anything about RubyCocoa (or Cocoa). So when corpus files or a database file were selected, only one file/folder was selectable. This was simply because of the original Ruby script. In that script, I simply specified a directory or a file to analyze in the script itself, and then I simply added a Cocoa interface to it. Eventually, I learned how to receive multiple file names as an array from the open panel, and I made it available to some of the new features. Now you should be able to select multiple files/folders when you choose your corpus files/folders in File Mode. In Database Mode, only one file can be selected. If you need to select multiple database files, please use the advanced file handling mode.
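Once the selection arrives as an array of paths, folders still have to be expanded into the files they contain. Here is a small plain-Ruby sketch of that step. It leaves out the Cocoa open panel entirely, and the method name expand_paths and the recursive search of subfolders are assumptions made for illustration.

```ruby
# Expand a mixed list of files and folders into a flat, sorted list of files.
# expand_paths is hypothetical; folders are searched recursively here.
def expand_paths(paths)
  files = paths.flat_map do |path|
    if File.directory?(path)
      Dir.glob(File.join(path, '**', '*')).select { |p| File.file?(p) }
    else
      [path]
    end
  end
  files.sort
end

# Usage: pass files and/or folders on the command line.
expand_paths(ARGV).each { |file| puts file }
```

The same kind of expansion would also apply to a folder dropped on a table, since that also ends up as a list of paths.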

Another bug fix was drag and drop of files. I'm not sure if I mentioned this feature in any of the documentation, but you can actually drag and drop files to the table in File View. So if you have a Finder window open with files you want to add, you can drag and drop them (or the folder that contains the files) to the table. This should work with files to analyze in File Mode and files to add to a database in Database Mode. If these don't work, please let me know.

The other changes are so minor that I'm almost certain no one has ever used the features involved. But anyway, I dropped Text Only mode. So now all you need to do is check/uncheck the file types you want to use. You still need to specify text encoding if you use text files. This is because the auto-detection of text encoding in Objective-C is not usable. Related to this change is the addition of OpenDocument Text (.odt) support. But because I've never used Open Office, I haven't tested it. I implemented this a while ago when I added the other file types but didn't activate it because I don't use it. And now I decided to activate it. Because I simply use a built-in Objective-C function, it should work as other files do (no guarantee).

Oh, one, kind of, major fix is the lemma function. I implemented the lemmatization function at a very early stage. But I've made a lot of changes to most of the tools since then, so it seemed like I broke it. Now it's fixed and I also added a function to use lemma grouping in kwic search. I mean, if you turn on this feature and search a word that is on the lemma file you provide, you can search all the words grouped under the same lemma, though I'm not sure if this works as intended.

In addition to these fixes, I added a couple of new features. One is based on the lemmatization function and the other is something very experimental. But this post is getting long, so I'll post about them in the next few days when I have time.

As always, if you happen to find this blog or the CasualConc site, please let me know what you think. You can leave your comment on this blog or email me (the address is on the main site).

Saturday, May 3, 2008

IPATypist

This is a small utility program I wrote for an ESL instructor at my school (yes, this is just written for you, Janet!), but I made a few changes to it so that it can also be useful for other people.

Originally, she told me she was having trouble typing IPA phonetic alphabets in Unicode. There is a keyboard mapping to type phonetic alphabets, but it is cumbersome. So I simply put a lot of buttons to enter IPA characters. Because this was written specifically to serve her purpose, the characters are the ones used for English and some special ones that are used for the book she and her colleagues are working on.

What you can do with this utility is type phonetic alphabets by simply clicking buttons. Once you are done, copy/paste them to whatever program you are working on. You can either go to Menu to copy or press Command + C to copy the string, which keeps all the font information (font type/size). Or you can click the Copy button, which only keeps the character information, so when you paste the string, whatever font setting (type/size) your document has will be applied.

The latest version (0.3) supports key (button) mapping (if it functions as intended). Now, any character can be assigned to any of the buttons, so users/teachers of languages other than English could use it (I hope).

The system requirements are the same as for all the programs/utilities I wrote: Mac OS X 10.5.2 (Leopard) or later. I think this works on 10.5, but now all my machines are running 10.5.2, so I can't check the versions prior to this (but at least I'm sure this won't run on Tiger). You also need the Doulos SIL font, which can be downloaded freely from the SIL site. The link to their site is on the download page of this utility.

If you find any bugs or have any feature requests, I will try to fix them/add them as much as I can (if they are minor). I don't have much time to spend on this now (or I should say I should not spend time on this). But any feedback is welcome. Especially, I'd like to know if this helps someone.