Monday, March 30, 2009

CasualConc quick bug fix

I just found a bug in CasualConc. When it opens a kwic result in a new window, it crashes. If you use this function, please download the latest version from the site. I think the bug was introduced when I made a few changes last time.

If you find any other bugs, please let me know. If they are minor and easy to fix, I'll try to fix them in a day or two.

CasualPConc more updates

Today, I learned that at least one person in the world other than myself knows CasualPConc exists. I'm really glad about that.

I added a few more features to CasualPConc today. Now almost all the functions I could think of and wanted to add are there. I might add a function to export results if anyone is interested. And if anyone has a good idea, I might consider that too. But from now on, I'll focus on bug fixing and documentation. I'll update the CasualPConc page on the CasualConc main site in the coming weeks.

I got one request to make CasualPConc handle more than two parallel corpora. But I think it would be hard to add that function to CasualPConc. It would probably be easier to write a new program based on CasualPConc. I might work on this once I finalize CasualPConc, if I have time to focus on its development.

Anyway, if you happen to be reading this blog and are interested, please try it and give me some feedback. Using the basic functions should not be difficult. Or you can wait a few days or weeks until I update the documentation (how to use).

Sunday, March 29, 2009

CasualPConc update

I'm almost certain no one has tried it yet, but I spent a little more time adding some more features to CasualPConc, a new parallel concordancer. This application is available at the CasualConc main site under Utility Programs, but the documentation is not up to date.

I don't think I'm going to make this as fancy as CasualConc, but I'm trying to use many more RubyCocoa (or Cocoa) features (I'm learning...).

CasualPConc originally had just kwic and word frequency count features. Now it has word cluster and collocation features, just like CasualConc. One feature specific to CasualPConc is finding keywords in the matched corpus after running a kwic search. When you run a kwic search, you get the matched portion of the matched corpus (paragraphs or sentences), which includes words that are equivalent or similar to the one you searched for. CasualPConc goes through the matched portion of the text and calculates the keyness of words against the entire corpus. I'm not sure if I've explained this clearly, and I'm also not sure if it works as intended, but it's there.
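To make the keyword idea above a little more concrete, here is a minimal Ruby sketch. It is not CasualPConc's actual code. It assumes log-likelihood as the keyness statistic (a common choice in corpus tools, but the post does not say which one CasualPConc uses), and the method names `word_counts`, `log_likelihood`, and `keyness` are made up for illustration.

```ruby
# Hypothetical sketch, not CasualPConc's code.
# Assumption: keyness is measured with the log-likelihood statistic.

# Count lowercased word forms across a list of text segments.
def word_counts(texts)
  counts = Hash.new(0)
  texts.each do |text|
    text.downcase.scan(/\p{L}+/).each { |word| counts[word] += 1 }
  end
  counts
end

# a: frequency in the matched portion, b: frequency in the whole corpus,
# c: total words in the matched portion, d: total words in the whole corpus.
def log_likelihood(a, b, c, d)
  e1 = c * (a + b).to_f / (c + d) # expected frequency in the matched portion
  e2 = d * (a + b).to_f / (c + d) # expected frequency in the whole corpus
  ll = 0.0
  ll += a * Math.log(a / e1) if a > 0
  ll += b * Math.log(b / e2) if b > 0
  2 * ll
end

# Return [word, score] pairs for words that are relatively more frequent
# in the matched portion than in the whole corpus, highest score first.
def keyness(matched_texts, corpus_texts)
  matched = word_counts(matched_texts)
  corpus  = word_counts(corpus_texts)
  c = matched.values.sum
  d = corpus.values.sum
  scores = []
  matched.each do |word, a|
    b = corpus[word]
    next if a.to_f / c <= b.to_f / d # skip words that are not overused
    scores << [word, log_likelihood(a, b, c, d)]
  end
  scores.sort_by { |_, score| -score }
end
```

For example, `keyness(matched_sentences, all_sentences).first(10)` would list the ten words most characteristic of the matched portion. Here the matched sentences are compared against the entire corpus, which still contains them, as described above. Comparing against only the rest of the corpus would be another reasonable design.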

I also added stop word and skip character functions. My understanding is that stop words are words that are very frequent in a language, and eliminating them helps people see what they are looking for more clearly. You can create stop word lists for any number of languages or corpora. The skip characters function is for two-byte languages, like East Asian languages or, more specifically, Japanese, because the Japanese characters for period, comma, brackets, etc. are not treated as punctuation by regular expressions. They are treated as regular characters, like letters of the alphabet, so they end up in word lists and context words and contaminate the results. Both of these functions are experimental and not fully tested. They are separate at the moment, but I might combine them into a single function or list.
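As a rough illustration of how these two filters could fit together, here is a small Ruby sketch. The constant names, the method name `tokens`, and the sample lists are all hypothetical, and the real lists in CasualPConc are user-defined. It assumes the text is already split into words by spaces.

```ruby
# Hypothetical sketch, not CasualPConc's code. Both lists are examples only.
SKIP_CHARACTERS = "。、「」（）・！？" # Japanese punctuation to ignore
STOP_WORDS = %w[the a an of to in]     # very frequent words to drop

# Replace each skip character with a space, split on whitespace,
# and drop any stop words from the result.
def tokens(text, stop_words: STOP_WORDS, skip_characters: SKIP_CHARACTERS)
  skip_pattern = Regexp.union(skip_characters.chars)
  cleaned = text.gsub(skip_pattern, " ")
  cleaned.downcase.split.reject { |word| stop_words.include?(word) }
end
```

Skip characters are replaced with a space rather than deleted so that the words on either side of a punctuation mark do not get glued together. Passing a different `stop_words:` list per language or corpus would correspond to keeping separate stop word lists.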

If you are interested, please try it and give me some feedback. I personally don't use a parallel concordancer much and I don't have good parallel corpora, so I can't really test it. Any feedback is welcome (functionality, usability, bug reports, etc.). The current version is 0.3, but this simply means I have made two major changes/enhancements, with some testing and bug fixing, since version 0.1.

Also if you use any of my applications, I'd really appreciate your feedback on them.

Thursday, March 26, 2009

A new project

As I didn't get much feedback and I was kind of busy, I didn't touch any of the programs (scripting) for a while. But I recently did a small translation job and thought a parallel concordancer might help in that situation. So I spent the last few days starting a new project. It is a simple parallel concordancer for Mac OS X Leopard, and I named it CasualPConc.

Currently it doesn't do much (and possibly has many bugs), and because I don't really use parallel corpora, I don't have a good idea of how to develop it. So I'd really appreciate any feedback. I used to work as a translator for a short period of time, so if I just follow my intuition, it will be more like a database program for a translator or language learner. The program is available on the CasualConc main site (direct link).

I don't expect many people to use it, so if you give me feedback, it is likely that the functions you request will be added (as long as I can handle them). Please email me directly, leave a comment here, or post on the Discussion Board.