Sunday, October 25, 2009

CasualConc minor bug fix

I found a bug in CasualConc while working on the beta, which shares the same script.  It's a minor one, in a feature I believe not many people use.

Bug fix
- crashed when creating a database file in Advanced Corpora Database mode using tag deletion.

If you are one of the rare people who use this function, please download the latest version.

Monday, October 19, 2009

CasualConc 1.0 and more...

This post will be a long one.

I've decided to make the latest build of CasualConc version 1.0, mostly because I didn't get any feedback or bug reports (well, I guess not many people ever tried the latest beta, or people don't bother to report bugs...).  Anyway, I made a few changes to it, and here's the list.  The bug fixes are very minor.

CasualConc version 1.0

Bug Fix
- timer for File Info now working
- move tables in Cluster moves everything including span and type
- word list import now functions

Enhancement
- creating File Info table is much faster
- progress bar in File mode progresses based on the number of files processed
- now including help files (the same content as you find on the site; English only)


From now on, this version will be maintained only if I get bug reports.  I've already started adding new features, and I'm planning to make more fundamental changes to it.  I'll release it as version 1.x some time in the future, but there is no time frame.  If you have any feature requests, please send them to me.  I'll try to add them once I've added what I want, though whether I can add what you want depends on my time and scripting skills.  I might release it as a beta (or alpha) even before it becomes stable if people are interested.  If you are, please let me know.


In addition to this, I've also updated some of my other applications.  I don't know if anyone ever uses them, but I've really started using some of them personally, so I've been trying to include what I want.  The reasons I started writing these applications vary.  Some were written upon request and some were just experimental (to try out what I had learned in scripting).  Now, I want to make them more like real applications.

Anyway, the updated applications and the details of updates are below.  I don't think people are interested, but these are for the record, so I can keep track of what I've done.


CasualMecab

- Aozora Bunko Kanji substitute handling
- experimental Word List function using Mecab output
- Snow Leopard Support (separate build)

Aozora Bunko Kanji substitute handling picks up Kanji substitute notes, such as ※[#「てへん+劣」、第3水準1-84-77], and replaces them with the real Kanji (this is now possible with Unicode characters).  The Word List function takes Mecab-format output and creates a word list from any of the info available (not just the word as it appears in the text).  You can create a word list of base form and part-of-speech combinations, etc.  Snow Leopard Support is just a workaround.  If you use Snow Leopard, you need to download the build for it.
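To illustrate the Kanji substitute idea, here is a minimal Python sketch.  It is not CasualMecab's actual code.  It assumes the note always ends with a JIS X 0213 plane-row-cell code (the 1-84-77 part) and relies on Python's built-in euc_jisx0213 codec to turn that code into a Unicode character.  The names GAIJI_NOTE, menkuten_to_char and replace_gaiji are made up for this example.

```python
import re

# Matches notes such as ※[#「てへん+劣」、第3水準1-84-77].
# Both ASCII and full-width brackets/# are accepted (an assumption).
GAIJI_NOTE = re.compile(
    r"※[\[［][#＃]「[^」]*」、第[34]水準(\d)-(\d{1,2})-(\d{1,2})[\]］]"
)


def menkuten_to_char(plane, row, cell):
    """Convert a JIS X 0213 plane-row-cell code to a Unicode string."""
    # In EUC, row and cell are each offset by 0xA0.
    euc = bytes([0xA0 + row, 0xA0 + cell])
    if plane == 2:
        # Plane 2 characters carry a 0x8F prefix byte in EUC.
        euc = b"\x8f" + euc
    return euc.decode("euc_jisx0213")


def replace_gaiji(text):
    """Replace every Kanji substitute note that can be decoded."""
    def repl(match):
        plane, row, cell = (int(g) for g in match.groups())
        try:
            return menkuten_to_char(plane, row, cell)
        except (UnicodeDecodeError, ValueError):
            # Leave the note untouched if the code cannot be converted.
            return match.group(0)

    return GAIJI_NOTE.sub(repl, text)


if __name__ == "__main__":
    sample = "草を※[#「てへん+劣」、第3水準1-84-77]る"
    print(replace_gaiji(sample))
```

Notes that describe a character without a JIS code, or with a Unicode code point instead, would need extra patterns; this sketch leaves them as they are.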


CasualTagger

- support rbtagger if installed
- better regex search/replace
- delete punctuation tags
- ignore header section (info part?)
- skip bracketed tags from tagging
- progress bar in batch process
- run tagger in editor mode

CasualTagger now supports rbtagger.  You can find more information about rbtagger here.  rbtagger is a tagger by Todd A. Fisher based on Eric Brill's tagger.  You need to install it yourself, but it's very simple.  Just type sudo gem install rbtagger in Terminal.app.  It still has some issues, but it's good to have alternatives.

Regex replace was supported before, but now you can use regex for search as well.  Delete punctuation tags deletes the tags put on punctuation characters (not words).  Ignore header section is for my own purposes.  Some of my corpus files have a header section ~ and I don't want to add tags to the text in this section.  So now CasualTagger can ignore this part (keeping the original text).  Skip bracketed tags ignores the section tags I have in some files, such as ~, etc.  A progress bar has been added to batch processing.  Finally, you can apply tags (engtagger/rbtagger) to a single file in Editor mode.
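As a rough illustration of what deleting punctuation tags means, here is a small Python sketch.  It is not how CasualTagger does it.  It assumes the tagged text is in word_TAG form with an underscore as the separator (the real output format may differ), and the function name delete_punctuation_tags is made up for this example.

```python
import re


def delete_punctuation_tags(tagged, sep="_"):
    """Drop the tag from tokens whose word part has no letters or digits."""
    def repl(match):
        token = match.group(0)
        # Split at the last separator: "word_TAG" -> ("word", "_", "TAG").
        word, found, _tag = token.rpartition(sep)
        if found and word and not any(ch.isalnum() for ch in word):
            return word  # punctuation only: keep the character, drop the tag
        return token     # a real word (or an untagged token): leave it alone

    # Rewrite each run of non-space characters, keeping the spacing as is.
    return re.sub(r"\S+", repl, tagged)


if __name__ == "__main__":
    print(delete_punctuation_tags("Hello_UH ,_, world_NN ._."))
    # prints: Hello_UH , world_NN .
```

Tags on real words are kept, so the result can still be searched by part of speech.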


CasualTextractor

- PDF mode
  - search in PDF
  - enlarge/reduce in PDF
  - go to selected text (from PDF to extracted text)
  - delete selected text (on PDF from extracted text)
  - split word search and replace (PDF artifacts)
  - replace character list

- Web mode
  - open web files (html/htm/webarchive)
  - clear open page/file
  - web history
  - source view

- Document
  - split word search (for PDF)
  - replace character list

- Overall
  - open recent files
  - regular expression search
  - simple tagging support
  - text format options (replace certain text/characters)

I've made a lot of changes, so here are notes on some of them.

In PDF mode, with delete selected text on PDF, you select text in the PDF view and delete that section.  The text will be deleted from the text view and the text on the PDF will be struck through.  This is handy if you want to delete headers or footers from the PDF text.  Split word search finds words that were split when the PDF file was created, such as interest- ing due to a line break.
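The split word search can be pictured with a regular expression.  The Python sketch below is only an approximation under my own assumptions, not CasualTextractor's actual pattern: it treats a word fragment, a hyphen, some whitespace, and a lowercase fragment as a candidate.  The names SPLIT_WORD, find_split_words and join_split_words are made up for this example.

```python
import re

# A word fragment, a hyphen, at least one whitespace character
# (space or line break), then a lowercase continuation.
SPLIT_WORD = re.compile(r"\b([A-Za-z]+)-\s+([a-z]+)\b")


def find_split_words(text):
    """Return (matched text, joined word) pairs for each candidate."""
    return [(m.group(0), m.group(1) + m.group(2))
            for m in SPLIT_WORD.finditer(text)]


def join_split_words(text):
    """Join every candidate by removing the hyphen and the whitespace."""
    return SPLIT_WORD.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = "This is an interest- ing and well-known result."
    print(find_split_words(sample))
    # prints: [('interest- ing', 'interesting')]
    print(join_split_words(sample))
    # prints: This is an interesting and well-known result.
```

Ordinary hyphenated words such as well-known are not matched because no space follows the hyphen.  Phrases like "pre- and post-test" would be matched, though, which is a good reason to review the search results before replacing.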

In Web mode, you can open web files (not just drag&drop) and clear the page so that you can drag&drop another file.  Web history is what you usually see in a browser, though it's limited.  You can view the source of the page and make changes to it (and see the result of the change).

Overall, it has regex search/replace.


The information on the site is still based on the previous beta.  I probably won't update it until I'm certain the features are set.  But if you try something and wonder how a function works, please feel free to contact me.  Bug reports are also welcome.