Thursday, October 9, 2008

Utility Programs

I updated a couple of utility programs, renaming them in the process, and added a new utility program to the main site.

The two updated utilities are TextExtractor and jparser. Both names are already used by other programs or modules, so I decided to change them. I used the same Casual~ naming scheme: TextExtractor is now CasualTextractor and jparser is now CasualMecab. Well, it's obvious I didn't spend much time on this... Both utilities also received a few bug fixes and minor feature changes, none of which are very noticeable.

The new addition is a POS tagger. I finally found an English POS tagging module for Ruby: EngTagger by Yoichiro Hasebe, which is a Ruby port of Perl's Lingua::En::Tagger module. I simply added a GUI and a function to process multiple files. You can also select the tag format (EngTagger's default is XML). I named it CasualTagger, and you can download it from the CasualConc main site (here's the direct link to the page: English/Japanese). To use CasualTagger, you need to install EngTagger, but that is very easy to do from Terminal.app (a single command). For more information about EngTagger (tag sets, etc.), check this page. As always, any feedback is welcome.
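To give a rough idea of what selecting a tag format involves, here is a minimal Ruby sketch that converts XML-style tags into a word/TAG or word_TAG style. This is only an illustration, not the code CasualTagger actually uses: the method name convert_tags, the delimiter option, and the sample sentence are all made up for this example, and the sample string is hand-written in the style of XML tagger output rather than produced by EngTagger.

```ruby
# Matches one XML-style tagged token such as "<nn>cat</nn>".
# Assumption: every token is wrapped as <tag>token</tag> with no nesting.
TAGGED_TOKEN = /<(\w+)>(.*?)<\/\1>/

# Hypothetical helper: rewrites "<nn>cat</nn>" as "cat/NN" (or "cat_NN").
def convert_tags(xml_tagged, delimiter: "/")
  xml_tagged.scan(TAGGED_TOKEN)
            .map { |tag, word| "#{word}#{delimiter}#{tag.upcase}" }
            .join(" ")
end

# Hand-written sample, not real tagger output.
sample = "<det>The</det> <nn>cat</nn> <vbd>sat</vbd> <pp>.</pp>"

puts convert_tags(sample)                  # The/DET cat/NN sat/VBD ./PP
puts convert_tags(sample, delimiter: "_")  # The_DET cat_NN sat_VBD ._PP
```

Keeping the delimiter as an option means the same method can cover both of the common inline tag styles; processing multiple files would just be a loop that reads each file and writes the converted text back out.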

Now I need to think seriously about adding a tag-handling feature to CasualConc. But how?? If you have any suggestions, leave a comment on this blog or email me (my email address is on the main site).

Also, a couple of people have asked for a parallel concordancer (for Mac and with Javascript). But what are the most fundamental functions? Any suggestions or comments about a parallel concordancer are also welcome.
