Friday, December 18, 2009

CasualTagger 0.8

I'm not sure if there is anyone who has even tried CasualTagger, but because I've been using it to tag my own corpus, I've been adding features to it.  I don't have time to work on the documentation with screenshots, so I'll describe the new features and how to use them (to some extent) on this blog.  But because I don't remember all the changes I've made since the last update, I'll only mention the more significant ones here.

General feature
- memo panel

You can keep notes on what you are doing, and CasualTagger keeps your memo.  The format of this might change in the future, though.

In Editor mode
- coloring of specified strings in kwic context (up to 2)
- specifying left/right kwic context span separately
- adding xml type tag in addition to pos-tags (with shortcuts)
- more versatile tag coloring
- search word/coloring string history

The coloring is simple.  You just specify a word/phrase/whatever to color in the kwic context.  You can specify whether the color is applied to the left, right, or both contexts.  The search word mode (wildcard/character/regular expressions) also applies to context string coloring.

For kwic span, you can now specify it for left and right context separately.
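To illustrate what separate left/right spans mean, here is a minimal Python sketch.  This is not CasualTagger's code; the function name `kwic` and its parameters are made up for this example, and I'm assuming the span is counted in whitespace-separated words.

```python
import re

def kwic(text, pattern, left_span=3, right_span=5):
    """Return (left, keyword, right) rows; spans are counted in words (an assumption)."""
    tokens = text.split()
    rx = re.compile(pattern)
    rows = []
    for i, tok in enumerate(tokens):
        if rx.fullmatch(tok):
            # The left and right context sizes are independent of each other.
            left = tokens[max(0, i - left_span):i]
            right = tokens[i + 1:i + 1 + right_span]
            rows.append((" ".join(left), tok, " ".join(right)))
    return rows

rows = kwic("the cat sat on the mat near the door", r"the",
            left_span=2, right_span=1)
for left, key, right in rows:
    print(f"{left:>10} | {key} | {right}")
```

Each row keeps up to two words on the left and one word on the right of the keyword.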

An xml tagging feature has been added, so you can now work with two different tag formats (one for pos-tags and the other for xml type tags).  Both can be added with shortcuts.  For example, you can add pos-tags in _XXX format and, at the same time, work on xml type tags ~.

Tag coloring is more versatile now.  You can specify different types of tags including xml type tags.

Search words and context coloring strings have a history feature, just like the search word/context word history in CasualConc.

New modes
- Item Counter
- Custom File Info

Item Counter simply counts occurrences of strings in your corpus that match a regular expression.  To use it, first add files to the file list table on the left.  Then open the Option panel (Menu -> Window -> Counter Option Panel).  You can specify the end of the file information part (just as you can in CasualConc).  You can also specify strings to ignore in counting (with regular expressions).  If the files contain any string that matches a specified regular expression, it will be deleted before CasualTagger counts what you want to count.  You can have any number of items on a table and check the ones you want to apply.  You can also specify them for each table.  If you use () to capture part of a match, only the part in the brackets will appear on the table.
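If it helps to see the idea in code, here is a rough Python sketch of that kind of counting.  It is only an illustration under my own assumptions, not how CasualTagger is implemented: `count_items`, `header_end`, and the sample text are all hypothetical.

```python
import re
from collections import Counter

def count_items(text, item_pattern, ignore_patterns=(), header_end=None):
    # Drop the file information part if an end marker is given and present.
    if header_end is not None and header_end in text:
        text = text.split(header_end, 1)[1]
    # Delete every string that matches an ignore pattern before counting.
    for pat in ignore_patterns:
        text = re.sub(pat, "", text)
    counts = Counter()
    for m in re.finditer(item_pattern, text):
        # With (), keep only the bracketed part; otherwise keep the whole match.
        key = m.group(1) if m.re.groups else m.group(0)
        counts[key] += 1
    return counts

sample = ("<header>id=1</header>\n"
          "The_DT cat_NN sat_VBD on_IN the_DT mat_NN [noise]")
counts = count_items(sample, r"\w+_(\w+)",
                     ignore_patterns=[r"\[.*?\]"],
                     header_end="</header>")
for tag, n in counts.most_common():
    print(tag, n)
```

Because the pattern uses (), the table is keyed by the tag alone (DT, NN, ...) rather than by the whole word_TAG string.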

Custom File Info is basically multiple Item Counters run at once.  To use it, add files to the file table on the left.  Then click the Settings button in the top right corner.  You can specify the end of the file info part and any strings to ignore in all the counts.

On the setting table, add items to count.  Label is the heading for the table column.  Check "U" if you want to count only unique occurrences.  "C" controls case sensitivity for the regular expression.  Check "M" to allow matches across multiple lines.  Then specify the items to count and the items to ignore as regular expressions.  You can specify multiple items for Items to ignore.  Just use a comma [,] to separate the regular expressions.  You can export/import the list for later use.  Drag and drop to change the order.
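Here is a small Python sketch of how settings like these could be applied.  Again, this is a guess at the idea, not CasualTagger's actual behavior: the `CountItem` class and function names are hypothetical, and I'm assuming that "M" behaves like Python's `re.DOTALL` (so `.` can cross line breaks) and that the same options apply to the ignore expressions.

```python
import re
from dataclasses import dataclass

@dataclass
class CountItem:
    label: str                   # column heading
    pattern: str                 # item to count (regular expression)
    ignore: str = ""             # comma-separated regular expressions
    unique: bool = False         # "U": count only unique occurrences
    case_sensitive: bool = True  # "C"
    multiline: bool = False      # "M" (assumed to mean re.DOTALL)

def run_item(text, item):
    flags = 0
    if not item.case_sensitive:
        flags |= re.IGNORECASE
    if item.multiline:
        flags |= re.DOTALL
    # Remove ignored strings first; empty entries are skipped.
    for pat in filter(None, (p.strip() for p in item.ignore.split(","))):
        text = re.sub(pat, "", text, flags=flags)
    hits = [m.group(0) for m in re.finditer(item.pattern, text, flags)]
    return len(set(hits)) if item.unique else len(hits)

def file_info(text, items):
    # One column per item, keyed by its label.
    return {item.label: run_item(text, item) for item in items}

text = "Dog_NN dog_NN dog_NN runs_VBZ\n<note>skip\nme</note> cat_NN"
items = [
    CountItem("nouns", r"\w+_NN"),
    CountItem("unique nouns", r"\w+_NN", unique=True),
    CountItem("dog (any case)", r"dog_NN", case_sensitive=False),
    CountItem("skip, M off", r"skip", ignore=r"<note>.*?</note>"),
    CountItem("skip, M on", r"skip", ignore=r"<note>.*?</note>", multiline=True),
]
info = file_info(text, items)
print(info)
```

One thing this sketch makes obvious: splitting on a comma means an ignore expression that itself contains a comma (such as `a{1,3}`) would be cut in two.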

Once you have set everything, close the setting window and click Run.  You can export the result as tab-delimited text (in UTF-8) to open in Excel/Numbers or any other spreadsheet application.
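For reference, a tab-delimited result of this kind looks like what the following Python sketch produces.  The column layout (a "file" column followed by one column per label) and the name `to_tsv` are my assumptions for illustration, not the exact format CasualTagger writes.

```python
import csv
import io

def to_tsv(results, labels):
    """Build tab-delimited text: one row per file, one column per label."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["file"] + labels)
    for filename, counts in results.items():
        # Labels with no count for a file are written as 0.
        writer.writerow([filename] + [counts.get(label, 0) for label in labels])
    return buf.getvalue()

results = {"a.txt": {"nouns": 4, "verbs": 1}, "b.txt": {"nouns": 2}}
tsv = to_tsv(results, ["nouns", "verbs"])
print(tsv)
```

To save it as UTF-8, the string can be written with `open(path, "w", encoding="utf-8", newline="")`.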

You can use Item Counter to check how your regular expressions will work in Custom File Info, though it's not perfect (Item Counter doesn't have the ignore case/multiple line options for ignore items).

Anyway, I don't know if there's anyone who will try CasualTagger, but if you are interested, you can download it from the CasualConc site.  If you ever use it, please let me know what you think.