Wednesday, October 22, 2008

CasualConc Update

I've been working on CasualConc and uploaded 0.9.9 to the site. Well, I'm not sure when to call it 1.0, so the next one might be 0.9.9.1...

Most of the changes I implemented are internal, plus many bug fixes. I didn't really touch the core tools, so most of the work was on file handling.

Here are some of the changes you might (or might not) notice:

Plain Text File encoding
You can now set a default text encoding in the File view (no need to open Preferences). The default you specify before opening files applies to all the files you add to the file list table, but you can now also change the encoding of each file in the table. This means you can select a different text encoding for each file. I also added ISO Latin 1 and ISO Latin 2 to the encoding list.
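
As a rough illustration of what per-file encodings mean in practice (this is not CasualConc's actual code), the Ruby sketch below decodes raw bytes using the encoding chosen for each file and converts everything to UTF-8. The file names, the `FILE_ENCODINGS` table, and the helpers `decode_as` and `encoding_for` are all made up for this example.

```ruby
# Hypothetical per-file settings: file name => encoding picked in the file list.
# Files without an entry fall back to the default encoding.
DEFAULT_ENCODING = "UTF-8"
FILE_ENCODINGS = {
  "french.txt" => "ISO-8859-1", # ISO Latin 1
  "polish.txt" => "ISO-8859-2"  # ISO Latin 2
}

# Look up the encoding for a file, falling back to the default.
def encoding_for(file_name)
  FILE_ENCODINGS.fetch(file_name, DEFAULT_ENCODING)
end

# Interpret raw bytes in the given encoding and return a UTF-8 string.
def decode_as(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).encode("UTF-8")
end

# Raw bytes as they would sit on disk in each encoding.
latin1_bytes = "caf\xE9".b        # "café" in ISO Latin 1
latin2_bytes = "\xB3\xF3d\xBC".b  # "łódź" in ISO Latin 2

puts decode_as(latin1_bytes, encoding_for("french.txt"))
puts decode_as(latin2_bytes, encoding_for("polish.txt"))
```

When reading from real files, `File.read(path, encoding: "ISO-8859-1:UTF-8")` does the same decoding and conversion in one step.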

Open/Save panel
This change probably won't make a difference to most people. I just wanted to change it to a Genie panel(?) because I learned how to.

Collocation
The most frequent position for each context word is now colored in red.
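
To make "most frequent position" concrete, here is a minimal Ruby sketch, assuming positions are counted as word offsets from the search word (-1 is one word to the left, +1 one word to the right). The function names, the sample sentence, and the span of 2 are invented for illustration, and the real Collocation tool may count differently.

```ruby
# Count how often each context word occurs at each offset from the node word.
def position_counts(tokens, node, span = 2)
  counts = Hash.new { |hash, word| hash[word] = Hash.new(0) }
  tokens.each_with_index do |token, i|
    next unless token == node
    (-span..span).each do |offset|
      j = i + offset
      # Skip the node itself and offsets that fall outside the text.
      next if offset.zero? || j < 0 || j >= tokens.length
      counts[tokens[j]][offset] += 1
    end
  end
  counts
end

# Return { context_word => offset with the highest count }.
# Ties are not handled specially in this sketch.
def most_frequent_positions(counts)
  counts.each_with_object({}) do |(word, by_offset), result|
    result[word] = by_offset.max_by { |_offset, count| count }.first
  end
end

tokens = "the cat sat on the mat and the cat ran".split
counts = position_counts(tokens, "cat")
p most_frequent_positions(counts)
```

In a table view, the cell at the offset returned for each context word is the one that would get the red color.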

Bug fixes
- Exporting/Saving results should work fine now, though I'm not sure how many people have ever tried to use this function. I also made changes to accommodate the changes in Collocation.
- Fixed crashes when you hit space (or maybe some other keys) on a blank table. You have probably never done such a stupid thing, but I found this when I accidentally hit the space bar in Advanced Corpus File Handling mode.

I also put a note on the site, but your preference settings will be lost if you have used previous versions. If you want to keep them, rename the preference file "com.apple.rubycocoa.CasualConcApp.plist" in your home -> Library -> Preferences folder to "CasualConcApp.plist". Except for the tag ignore settings, your preference settings should be safe.

Along with this change, I also added ISO Latin 1 and ISO Latin 2 to the list of encodings (open/save) in CasualTextractor.

If you find any of these attractive or are bothered by bugs, please try the latest version and let me know what you think about it. Bug reports are also welcome.

By the way, I haven't updated all the documentation yet. Some of it is quite old. I guess I have to find time to update it (or rearrange it). I read somewhere that Google is moving the content of Google Page Creator to Google Sites. That might be a good time to update the documentation.

Saturday, October 18, 2008

Utility Programs updates

I've been experimenting with some Cocoa UIs and bindings, and adding the features I learned to the utility programs. I made minor changes to CasualTextractor and CasualMecab. I also made a few more changes and added some new functions to CasualTagger. I now need to manually tag a lot of texts, so I'm trying to make it a tool that helps with manual tagging. I copied the regular expression replace from CasualConc (a hidden feature) and added simple tag coloring. It now also has a simple word/tag count and a KWIC concordance of a single file.
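
For readers who haven't seen one, a KWIC (keyword in context) concordance of a single text can be sketched in a few lines of Ruby. This is only an illustration under simple assumptions (character-based context, a fixed width on each side), not the code used in CasualTagger; the `kwic` function and the sample text are hypothetical.

```ruby
# Build KWIC lines for every match of `pattern` (a Regexp) in `text`.
# `width` is the number of characters of context kept on each side.
def kwic(text, pattern, width = 20)
  text = text.gsub(/\s+/, " ") # collapse line breaks so each hit fits on one line
  lines = []
  text.scan(pattern) do
    match = Regexp.last_match
    # Take the last `width` characters before the hit (or everything if shorter).
    left  = match.pre_match[-width..-1] || match.pre_match
    right = match.post_match[0, width]
    # Right-align the left context so the keywords line up in a column.
    lines << format("%#{width}s  %s  %s", left, match[0], right)
  end
  lines
end

sample = "The cat sat on the mat. The other cat ran away."
puts kwic(sample, /\bcat\b/i, 10)
```

Sorting the resulting lines by the left or right context is a natural next step, but it is left out here to keep the sketch short.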

I'm also working on CasualConc. As I wrote in the last post, I will probably clean up some old code. Tag handling might take some time to implement because I have to think about how to handle tags in CasualConc. Any ideas? What I'll probably do first is change/fix the file handling elements. In the utility programs, you can now change the character encoding of plain text files after you add them to the table. This will allow you to use text files with different encodings.

Another minor change is coloring in Collocation. The most frequent position for each context word is now colored in red. This seems to be working fine, so you will see this feature in the next update.

In addition to the RubyCocoa programs, I wrote a simple JavaScript-based parallel concordancer, which I was asked to write. I based it on my old JavaScript-based concordancer, so not much scripting was involved. I did this because I'm thinking about writing a parallel concordancer for Mac, as I wrote on this blog before, and I wanted to know what the most fundamental features of a parallel concordancer are. I googled and wrote it based on some parallel concordancers out there. It only creates a table of matched texts based on the search. It also creates KWIC results, and you can select one to show the matching text. But what else is necessary for a parallel concordancer for Mac? If you have any suggestions, please leave a comment here, send me an email, or post on the CasualConc Google Discussion Board. If you can give me enough information to figure out how to implement your requests, you will have a better chance of seeing them, though it also depends on my scripting skills.
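
The core idea of "a table with matched texts" can be sketched as follows, assuming the two texts are already aligned segment by segment. This is written in Ruby rather than JavaScript just for illustration and is not the concordancer described above; the sample sentences, `ALIGNED`, `parallel_search`, and `print_table` are all invented.

```ruby
# A parallel corpus as aligned pairs: [source segment, translation].
ALIGNED = [
  ["I read a book.",       "Je lis un livre."],
  ["She bought the book.", "Elle a acheté le livre."],
  ["The weather is nice.", "Il fait beau."]
]

# Return every aligned pair whose source segment matches the pattern.
def parallel_search(pairs, pattern)
  pairs.select { |source, _target| source =~ pattern }
end

# Print the matches as a simple two-column table.
def print_table(rows)
  width = rows.map { |source, _target| source.length }.max || 0
  rows.each { |source, target| puts "#{source.ljust(width)} | #{target}" }
end

print_table(parallel_search(ALIGNED, /\bbook\b/i))
```

Everything beyond this (alignment itself, searching the translation side, KWIC display of hits) is where the real design questions are.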

Anyway, please check out the utility programs and let me hear what you think.

Monday, October 13, 2008

A few bug fixes

I made a few more bug fixes and some internal changes. The somewhat major bug I fixed was the alignment and coloring of text in the Concord table and text view. If you use CasualConc only with English, you probably didn't see any problem, but if you deal with text with a lot of non-standard alphabet characters, the display was ugly. Now it's better (not perfect). There is still a problem with languages that combine more than one character to display a single character on screen. Other than that, displaying text is less problematic.
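
The combining-character problem is easy to reproduce. The Ruby sketch below (it assumes Ruby 2.5 or later for `grapheme_clusters`, and the helper names are made up) shows that counting code points overestimates what appears on screen, which is one way alignment can go off. It does not claim to be how CasualConc measures text.

```ruby
# Number of code points in the string.
def code_points(str)
  str.each_char.count
end

# Number of user-perceived characters (grapheme clusters).
def display_units(str)
  str.grapheme_clusters.length
end

# Pad with spaces based on perceived characters, not code points.
def pad_display(str, width)
  str + " " * [width - display_units(str), 0].max
end

decomposed  = "e\u0301" # "e" followed by a combining acute accent
precomposed = "\u00E9"  # the same letter as a single code point

puts code_points(decomposed)    # 2
puts display_units(decomposed)  # 1
puts code_points(precomposed)   # 1
puts "[#{pad_display(decomposed, 3)}]"
```

Padding by `display_units` keeps both spellings of the same letter the same number of cells wide, which padding by code points does not.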

The major internal change I made is using the Shared User Defaults Controller to save Preference settings. This saved a lot of code, but at the same time, it is not perfect. Somehow it doesn't remember changes made by scripts, so for some text data, I have to use a script to save the data properly. But I might have done something wrong, so if you find any bugs related to Preferences, please let me know.

I also made a major change to CasualTagger. CasualConc had hidden functions to help manual tagging, which I have never turned on officially. This was because I didn't have time to finalize them and fix bugs, so I decided not to make them available. I have now taken some of those features and added them to CasualTagger. I haven't documented them, but I included simple instructions in the application as a help file. If you are interested, please take a look at it. CasualTagger is on the main site under Utility Programs.

I also made more changes to IPATypist, which not a lot of people use. And I guess the people who have downloaded it might not read this, so they won't know whether it's updated or not (though I'm not sure if they keep using it).

That's about it for now. I'm also thinking about adding tag handling features to CasualConc, but it doesn't look promising. I once wrote experimental scripts to handle some types of tags, but they don't work very well. If I want to seriously add this feature, I have to get rid of the old ones first. It's not very easy... Also, there are a lot of weird scripts in CasualConc because it includes some code I wrote when I was just starting to learn Ruby. I guess I have to clean up the old mess first before I add any significant new features.

In any case, if you use CasualConc and/or other utility programs, please let me know what you think. The current priority is adding tag handling features. East Asian language support might be dropped. It would be a separate program. Some people asked about a parallel concordancer, so I might write a separate program for it, but I still don't have enough information to go ahead. If you'd like to see a parallel concordancer for Mac, please give me information. You can email me directly, comment on this blog, or post on the CasualConc Discussion Board. I need to know what the most fundamental features are and how they should be implemented.

Friday, October 10, 2008

IPATypist update

This is not related to corpus analysis, so I guess almost no one is interested, but as a memo to myself, I'll write down what I did.

IPATypist is a very simple utility program for typing IPA phonetic symbols. With this update, I added a database function to it. Phonetic transcriptions can now be stored so you don't have to type them again. But this is mainly my experiment with Core Data, an OS X framework for easily handling database-type programs. I just started looking at it, so I'm still not sure if I did this right or wrong, but it looks like it's working.

If you happen to be one of the rare people who are interested in this program, please check it out and let me know what you think.

Thursday, October 9, 2008

Utility Programs

I updated a couple of utility programs, including changing names, and added a new utility program to the main site.

Two of the utilities are TextExtractor and jparser. These two names are already used for other programs/modules, so I decided to change them. I used the same Casual~ naming for them: TextExtractor is now CasualTextractor and jparser is now CasualMecab. Well, it's obvious I didn't spend much time on this... Both utilities had a few bug fixes and minor feature changes, which are not obvious.

And a new addition is a POS tagger. I finally found an English POS tagging module for Ruby. The module is EngTagger by Yoichiro Hasebe, which is a Ruby port of Perl's Lingua::EN::Tagger module. I simply added a GUI and a function to process multiple files. You can also select a tag type (the default of EngTagger is XML format). I named it CasualTagger and you can download it from the CasualConc main site (here's the direct link to the page: English/Japanese). To use CasualTagger, you need to install EngTagger, but it's very easy from Terminal.app (a single command line). For more information about EngTagger (tag sets, etc.), check this page. As always, any feedback is welcome.
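
To show what selecting a tag type could amount to, here is a small Ruby sketch that turns XML-style tagged text into word/TAG or word_TAG form. It does not call EngTagger, so it runs without installing anything; the `convert_tags` function is hypothetical, and the sample input was typed by hand to resemble XML-style tagger output rather than produced by running a tagger.

```ruby
# Convert XML-style tags such as <nn>cat</nn> into word/TAG (or word_TAG).
def convert_tags(tagged, separator = "/")
  # \1 makes sure the closing tag matches the opening tag.
  tagged.gsub(%r{<(\w+)>(.*?)</\1>}) { "#{$2}#{separator}#{$1.upcase}" }
end

sample = "<det>The</det> <nn>cat</nn> <vbd>sat</vbd>"
puts convert_tags(sample)      # The/DET cat/NN sat/VBD
puts convert_tags(sample, "_") # The_DET cat_NN sat_VBD
```

A regular expression like this only works for flat, non-nested tags, which is all the sample assumes.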

Now I need to seriously think about adding a tag-handling feature to CasualConc. But how?? If you have any suggestions, leave a comment on this blog or email me (my email address is on the main site).

Also, a couple of people asked for a parallel concordancer (for Mac and with JavaScript). But what are the most fundamental functions? Any suggestions/comments about a parallel concordancer are also welcome.