Sunday, November 8, 2009

CasualConc update and CasualTranscriber

Well, I stated here that there would be no more new features in version 1.0, but I changed my mind...  There are a couple of reasons.  One is that I found a couple of bugs and had to fix them.  Another is that adding the features was not very time-consuming.  I added them to the current beta first and then mostly just copied and pasted the items and scripts into version 1.0.  In particular, one of the new features is something I had wanted to include in version 1.0 but hadn't figured out how to implement.

Anyway, the current version is 1.0.2 and here's the list of bug fixes and added features.

CasualConc Version 1.0.2

Bug fix
- In Cluster, the same cluster was counted twice if a search word/phrase appeared twice in a cluster (such as 'is that is')
- Related to the above: in Cluster, if a search word appeared twice in a cluster, only one occurrence was colored
- In Cluster and Collocation with Lemma search on, not all words with the same lemma appeared in the list (rightmost column)

The bug in Cluster was somewhat serious.  In the original script, clusters were collected for every occurrence of the search word.  This means that if there is a sequence 'is that is' and you search for 'is', 'is that is' was counted twice (once for the first 'is' and once for the second 'is' in the cluster).  Now, when this happens, duplicates are eliminated, so 'is that is' is counted only once.  Also, because the script assumed that the search word appears only once in a cluster, only one occurrence was colored.  Now if there are two occurrences of the search word in a cluster, both should be colored.
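The fix described above can be sketched in a few lines of Python.  This is only an illustration, not the actual CasualConc script: the function names, the whitespace tokenization, and the idea of tracking the start position of each cluster are my assumptions here.

```python
from collections import Counter

def count_clusters(tokens, search_word, size=3):
    """Count clusters of `size` words containing search_word, once per text position."""
    seen_starts = set()  # start positions of clusters already collected
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != search_word:
            continue
        # Every window of `size` tokens that covers position i.
        first = max(0, i - size + 1)
        last = min(i, len(tokens) - size)
        for start in range(first, last + 1):
            if start in seen_starts:
                continue  # same cluster, reached via another occurrence of the search word
            seen_starts.add(start)
            counts[" ".join(tokens[start:start + size])] += 1
    return counts

def highlight_positions(cluster, search_word):
    """Return the index of every occurrence of search_word in a cluster, so all can be colored."""
    return [i for i, word in enumerate(cluster.split()) if word == search_word]

tokens = "the point is that is wrong".split()
counts = count_clusters(tokens, "is")
print(counts["is that is"])                   # 1, not 2
print(highlight_positions("is that is", "is"))  # [0, 2]
```

Without the `seen_starts` check, the window starting at the first 'is' would be collected once for each 'is' it contains, which is the double counting described above.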

The bug with Lemma is a minor one, in the sense that I don't expect many people to use this function.  When you run Cluster or Collocation with the Lemma feature on, CasualConc shows the frequencies of the words that belong to the same lemma (or the clusters containing them) in the rightmost column.  But when a lemma contained only one word, it was not displayed correctly.  Now all the words should be displayed.


New features
- Concordance Plot -- you need to set 'Scope of Context' to 'File' and check 'Create Concordance Plot'
- Search word/context search word history

Concordance Plot shows where in a file the search word appears.  The plots are generated when you run Concord with 'Scope of Context' set to 'File' and 'Create Concordance Plot' checked in Preferences.
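To give an idea of what a concordance plot represents, here is a rough text-only sketch in Python.  It is not how CasualConc draws its plots; the function name, the simple `\w+` tokenization, and the fixed-width bar are assumptions made for the example.

```python
import re

def plot_positions(text, search_word, width=50):
    """Return the token positions of search_word and a text bar marking where they fall."""
    tokens = re.findall(r"\w+", text.lower())
    hits = [i for i, token in enumerate(tokens) if token == search_word.lower()]
    bar = ["-"] * width
    for i in hits:
        # Scale the token position to a column of the bar.
        bar[min(width - 1, i * width // len(tokens))] = "|"
    return hits, "".join(bar)

hits, bar = plot_positions("a b a c", "a", width=4)
print(hits)  # [0, 2]
print(bar)   # |-|-
```

Each '|' marks a hit, so a word that clusters at the beginning or end of a file is visible at a glance.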

Another new feature is search word/context search word history.  CasualConc remembers the words you searched for in Concord/Cluster/Collocation, and you can select one from the pull-down menu.  You can set how many search/context search words to remember in Preferences (in General [Search Word] and Concord [Context Word]).
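A history like this is basically a short list with a size limit.  The Python sketch below shows one way it could behave; the class name, the most-recent-first order, and the removal of duplicates are my assumptions, not a description of the actual implementation.

```python
class SearchHistory:
    """Keep the most recent search words, newest first, up to max_items."""

    def __init__(self, max_items=10):
        self.max_items = max_items  # corresponds to the number set in Preferences
        self.items = []

    def add(self, word):
        if word in self.items:
            self.items.remove(word)   # move a repeated word to the top instead of duplicating it
        self.items.insert(0, word)
        del self.items[self.max_items:]  # drop the oldest entries beyond the limit

history = SearchHistory(max_items=2)
for word in ["is", "that", "is", "the"]:
    history.add(word)
print(history.items)  # ['the', 'is']
```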

I hope these new features didn't introduce new bugs.


I also updated CasualTranscriber, a transcription helper application.  Now it should run on Mac OS X 10.6 Snow Leopard.  I also fixed some bugs and added new features.  It should be a little more stable.

The new/enhanced features are a tag insertion function and a much more powerful regular expression search/replace.

To use the tag function, go to Menu -> Window -> Tag Panel.  Then enter tags under Tag.  To insert a tag, type Command + Option + (the number on the left).  So to insert the tag in 1, type Command + Option + 1.  If Tag is laugh, the tag will be inserted as <laugh></laugh>.  If any text is selected in the text view, it appears between the tags.  If you check the box and enter options, they appear like attributes in XML.  Options are separated by a comma (,), so if you enter option1,option2 in the Options box, the inserted tag will include both option1 and option2.  As before, the selected text appears between the tags.

If you don't want to type the key combination, you can click a button to insert a tag.  On the main window, you will see a button in the top right corner (the one that shows the toolbar in any Cocoa application).  Click it to show an icon that looks like a gear.  Clicking the gear icon opens a drawer on the right.  Click the button next to the tag you want to insert.  You can change the tag in the drawer (the change will be reflected in the panel).  But if you want to change the options, you need to do that in the panel.

I fixed some bugs, but I can't tell which ones.  This is because so many things broke in Snow Leopard that I couldn't tell which bugs were already in the previous version and which ones were due to changes in the OS.

Anyway, if you try either of them and find any bugs, please let me know.

Sunday, November 1, 2009

Another CasualConc bug fix

While developing the next version, I found a bug that affects version 1.0 and fixed it.

Bug
- the context text (in context view) is not properly displayed when the search word appears in the first paragraph of a file.

This only affected you if you used Database mode and the search word appeared in the very first paragraph of a file (the original file, not the database file).


The development of the next version is slow.  Because I started to change the fundamentals, only some of the functions work now (file management and part of Concord).  Anyway, here's the list of tentative new/revised features (they might not show up in the next version).

- stop word list
- skip character list
- experimental POS tag search/count (only in European Language modes)

If you have any suggestions, please let me know.  I'll see if I can include them in the next version.

Sunday, October 25, 2009

CasualConc minor bug fix

I found a bug in CasualConc while I was working on the beta, which uses the same script.  It's a minor one (it affects a feature that I believe not many people use).

Bug fix
- crashed when creating a database file in Advanced Corpora Database mode using tag deletion.

If you are one of the rare people who use this function, please download the latest version.

Monday, October 19, 2009

CasualConc 1.0 and more...

This post will be a long one.

I've decided to make the latest build of CasualConc version 1.0, mostly because I didn't get any feedback or bug reports (well, I guess not many people ever tried the latest beta, or people don't bother to report bugs...).  Anyway, I made a few changes to it and here's the list.  The bug fixes are very minor.

CasualConc Version 1.0

Bug Fix
- the timer for File Info now works
- moving tables in Cluster now moves everything, including span and type
- word list import now works

Enhancement
- creating File Info table is much faster
- progress bar in File mode progresses based on the number of files processed
- now including help files (the same content as you find on the site; English only)


From now on, this version will only be maintained if I get bug reports.  I've already started adding new features and I'm planning to make more fundamental changes to it.  I'll release it as version 1.x some time in the future, but there is no time frame.  If you have any feature requests, please send them to me.  I'll try to add them once I have added what I want, though whether I can add what you want depends on my time and scripting skills.  I might release it as a beta (or alpha) even before it becomes stable if people are interested.  If you are, please let me know.


In addition to this, I've also updated some of my other applications.  I don't know if anyone ever uses them, but I have really started using some of them myself, so I've been trying to include what I want.  Well, the reasons I started to write these applications vary.  Some were written on request and some were just experiments (to try out what I had learned in scripting).  Now, I want to make them more like real applications.

Anyway, the updated applications and the details of the updates are below.  I don't think people are interested, but these are for the record, so I can keep track of what I've done.


CasualMecab

- Aozora Bunko Kanji substitute handling
- experimental Word List function using Mecab output
- Snow Leopard Support (separate build)

Aozora Bunko Kanji substitute handling picks up a Kanji substitute note, such as ※[#「てへん+劣」、第3水準1-84-77], and replaces it with the real Kanji (this is now possible with Unicode characters).  The Word List function takes Mecab format output and creates a word list from any of the information available (not just the word as it appears in the text).  You can create a word list of base form and part-of-speech combinations, etc.  Snow Leopard support is just a workaround.  If you use Snow Leopard, you need to download the build for it.
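For those curious how a note like this can be turned into a character, here is a minimal Python sketch.  It is not the CasualMecab code.  It assumes that the numbers in the note are a JIS X 0213 plane-row-cell code and relies on Python's built-in euc_jis_2004 codec for the conversion; the function names and the regular expression are made up for this example, and notes that don't follow this pattern are left untouched.

```python
import re

# Matches notes such as ※[#「てへん+劣」、第3水準1-84-77].
# Both ASCII and full-width brackets and number signs are accepted.
GAIJI_NOTE = re.compile(
    r"※[\[［][#＃]「[^」]*」、第[34]水準(\d)-(\d{1,2})-(\d{1,2})[\]］]"
)

def jisx0213_to_char(plane, row, cell):
    """Convert a JIS X 0213 plane-row-cell code to a Unicode character."""
    euc = bytes([0xA0 + row, 0xA0 + cell])
    if plane == 2:
        euc = b"\x8f" + euc  # plane 2 uses a single-shift prefix in EUC
    return euc.decode("euc_jis_2004")

def replace_gaiji(text):
    """Replace every recognizable substitute note with the real character."""
    def substitute(match):
        plane, row, cell = (int(group) for group in match.groups())
        try:
            return jisx0213_to_char(plane, row, cell)
        except (UnicodeDecodeError, ValueError):
            return match.group(0)  # keep the note if the code cannot be converted
    return GAIJI_NOTE.sub(substitute, text)

print(replace_gaiji("草を※[#「てへん+劣」、第3水準1-84-77]る"))
```

The description inside 「」 (here てへん+劣) is ignored; only the code at the end of the note is used to look up the character.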


CasualTagger

- support rbtagger if installed
- better regex search/replace
- delete punctuation tags
- ignore header section (info part?)
- skip bracketed tags from tagging
- progress bar in batch process
- run tagger in editor mode

CasualTagger now supports rbtagger.  You can find more information about rbtagger here.  rbtagger, written by Todd A. Fisher, is a tagger based on Eric Brill's tagger.  You need to install it yourself, but it's very simple.  Just type sudo gem install rbtagger in Terminal.app.  It still has some issues, but it's good to have alternatives.

Regex replace was supported before, but now you can also use regex for search.  Delete punctuation tags deletes the tags put on punctuation characters (not words).  Ignore header section is for my own purposes.  Some of my corpus files have a header section ~ and I don't want to add tags to the text in this section.  So now CasualTagger can ignore this part (it keeps the original text).  Skip bracketed tags is for ignoring the section tags I have in some files, such as ~, etc.  A progress bar has been added to batch processing.  Finally, you can apply tags (engtagger/rbtagger) to a single file in Editor mode.


CasualTextractor

- PDF mode
  - search in PDF
  - enlarge/reduce in PDF
  - go to selected text (from PDF to extracted text)
  - delete selected text (on PDF from extracted text)
  - split word search and replace (PDF artifacts)
  - replace character list

- Web mode
  - open web files (html/htm/webarchive)
  - clear open page/file
  - web history
  - source view

- Document
  - split word search (for PDF)
  - replace character list

Overall
  - open recent files
  - regular expression search
  - simple tagging support
  - text format options (replace certain text/characters)

I've made so many changes that I'll only make notes on some of them.

In PDF mode, with delete selected text on PDF, you select text in the PDF view and delete that section.  The text will be deleted from the text view and the text on the PDF will be struck through.  This is handy if you want to delete a header or footer from the PDF text.  Split word search finds words that were split when the PDF file was created, such as 'interest- ing', caused by a line break.
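Split word search can be approximated with a regular expression.  The Python sketch below is only an illustration under my own assumptions (the pattern, the function name, and the restriction to ASCII letters); it is not the pattern CasualTextractor uses.

```python
import re

# A word fragment, a hyphen, whitespace (a space or a line break), then a lowercase fragment.
SPLIT_WORD = re.compile(r"\b([A-Za-z]+)-\s+([a-z]+)\b")

def find_split_words(text):
    """Return (matched text, suggested joined word) pairs for possible split words."""
    return [(m.group(0), m.group(1) + m.group(2)) for m in SPLIT_WORD.finditer(text)]

print(find_split_words("an interest- ing result"))
# [('interest- ing', 'interesting')]
```

A pattern like this also matches legitimate text such as 'pre- and post-test', which is why it makes sense to search first and confirm each replacement rather than replace everything automatically.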

In Web mode, you can open web files (not just drag and drop them) and clear the page so that you can drag and drop another file.  Web history is what you usually see in a browser, though it's limited.  You can view the source of the page and make changes to it (you can see the result of the change).

Overall, it has regex search/replace.


The information on the site is still based on the previous beta.  I probably won't update it until I'm certain the features are set.  But if you try it and wonder how a function works, please feel free to contact me.  Also, any bug reports are welcome.

Monday, September 28, 2009

CasualPConc 0.7 and a new application

I made a few changes and fixed a few bugs in CasualPConc, a simple parallel concordancer for Mac OS X.  It is a little more stable now.  I also worked on the documentation.  Now it covers most of the features.

A new application is based on CasualPConc.  When I first released CasualPConc, someone asked if I would make it handle more than two corpora.  This is kind of my answer to that.  CasualMultiPConc has limited features, but it can handle up to 5 parallel corpora.

This new application simply does KWIC concordance of up to 5 parallel corpora.  The future of this application is up to users (if there are any).  I don't have any experience in parallel corpus analysis and I don't need to use it right now, so I'm not even sure how well this works.  I have only used small corpora to test it.  If you are interested in testing it, any feedback is welcome.  This one is also for Mac OS X 10.5 Leopard or later, though I have only tested it on 10.6 Snow Leopard.

This application and CasualPConc are only on the English site (Main Site link on the right).  Both of them are under Other Applications.
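For readers unfamiliar with the idea, a parallel KWIC search can be sketched in Python like this.  This is a simplified illustration and not how CasualMultiPConc works internally: it assumes the corpora are already aligned segment by segment, that the search is run on the first corpus only, and that words are separated by spaces.  The function name and the data are made up.

```python
def parallel_kwic(corpora, search_word, span=3):
    """Search the first corpus and return each hit with the aligned segments of the others.

    corpora: a list of lists of segments, where segment n of every corpus corresponds.
    """
    results = []
    for index, segment in enumerate(corpora[0]):
        tokens = segment.split()
        for i, token in enumerate(tokens):
            if token.lower() != search_word.lower():
                continue
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            aligned = [corpus[index] for corpus in corpora[1:]]
            results.append((left, token, right, aligned))
    return results

english = ["I like cats", "Dogs are nice"]
french = ["J'aime les chats", "Les chiens sont gentils"]
for left, keyword, right, aligned in parallel_kwic([english, french], "cats"):
    print(left, "[" + keyword + "]", right, "|", aligned)
```

Adding a third, fourth, or fifth corpus only means adding more lists to `corpora`; each hit then carries one aligned segment per additional corpus.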