Thursday, November 20, 2008

New Utility

I started another experiment: a new utility program named CasualTranscriber. It's a simple utility to assist with transcribing movie/sound files into text. I googled and found some good free ones, but I wrote my own because I wanted to learn a bit about QTKit (QuickTime Kit) with RubyCocoa. It's very simple at the moment and at a very early stage of development. But if you are interested, please check it out and let me know what you think. If enough people are interested, I will develop it further. Here's the direct link.

As I wrote in the previous post, I started a new CasualConc site on Google Sites. I want to introduce it here, though only the English site is available at the moment (no Japanese site yet). All the program files are still on the old site, but the new site has much more information, especially about the utility programs (with screenshots). You can access the new site from this link. The CasualTranscriber page has more details, so please check it out there.

As always, I'd appreciate any feedback on any of the programs and also on the sites. If you have any suggestions, comments, or bug reports, please leave a comment on this blog, add a new topic to the Discussion Board, or send email to casualconc (at) gmail.com.

Wednesday, November 12, 2008

Bug fixes

As you might have noticed, the current CasualConc site is hosted on Google Page Creator. But Google decided to stop this service and focus on Google Sites, so I've been transferring the site contents to a new site on Google Sites. In the process, I've been updating the content and adding a little more information to some pages. So far, I've created the English site, but I haven't started the Japanese site. I'm not sure how many people would prefer a Japanese page, but I will eventually create a Japanese site (I personally prefer reading in Japanese).

While updating the content, I used the basic functions of all the current programs and found many bugs. Most of them were minor; some were major, but none were in the main features. I fixed so many things in the last couple of weeks (as well as adding new functions) that I can't track all the fixes/changes, but here's a list of a few of them.

CasualConc:
- The Context words function in Concord was broken, but now it should be working.
- The Keyword grouping function was fixed.
- Keyword grouping only worked when the search was for a group of words. Now keyword groups can be used in a phrase search.
- Lemmas in the search word should now work with wildcard/phrase searches.
- You can now import a word list not created by CasualConc. It accepts a CSV or tab-delimited file with words in one column and frequencies in the other. This allows you to import a word list created by another program or script.
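
To illustrate the kind of word list described in the last item, here is a minimal Python sketch that reads a two-column CSV or tab-delimited list. This is not CasualConc's actual import code. The function name `read_word_list`, the column order (word first, frequency second), and the rule of skipping rows whose frequency is not a whole number (such as a header row) are all assumptions made for the example.

```python
import csv
import io


def read_word_list(text, word_column=0, freq_column=1):
    """Parse a two-column word list into (word, frequency) pairs.

    Assumption: the delimiter is a tab if the first non-empty line
    contains one, otherwise a comma.
    """
    first_line = next((line for line in text.splitlines() if line.strip()), "")
    delimiter = "\t" if "\t" in first_line else ","

    entries = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Skip rows that do not have both columns.
        if len(row) <= max(word_column, freq_column):
            continue
        word = row[word_column].strip()
        freq = row[freq_column].strip()
        # Skip header rows and rows with an empty word or non-numeric frequency.
        if not word or not freq.isdigit():
            continue
        entries.append((word, int(freq)))
    return entries


tab_list = "the\t120\nof\t75\n"
csv_list = "word,freq\nthe,120\nof,75\n"
print(read_word_list(tab_list))
print(read_word_list(csv_list))
```

If your list has the frequency in the first column, the same sketch can be called with `word_column=1, freq_column=0`.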

CasualTextractor:
- In PDF/Web/Document, most changes should be undoable. I changed the function that draws text in the text area.
- Batch processing was not working, but it should work now.

I also made minor changes to CasualTagger and CasualMecab, but I forgot what I did. Most of them are bug fixes.

If you downloaded any of the programs and found bugs, they might have been fixed by now. If not, please report them to me. You can add a comment to a post on this blog, email me, or post on the Discussion Board (Google Groups). If you have any good ideas/suggestions for any of the programs, I'd appreciate your feedback.

Tuesday, November 11, 2008

New experiment (UPDATE)

This is an update to the post "New experiment". I implemented a function to automatically update the CasualConcData.ccdb file, so I dropped the beta-beta version. The official beta version is 0.9.9.1. Please check the previous post for the details of the last update.

If you try this new function, please let me know what you think.

Sunday, November 9, 2008

Link Grammar

I found this syntactic parser (site) the other day, and I also found that there is a Ruby binding called Ruby LinkParser (site). The instructions for Ruby LinkParser looked pretty simple, and I thought I would be able to install it and start using Link Grammar from Ruby in no time.

Well, I was wrong. Maybe because the instructions were written for Unix/Linux users, they didn't work on Mac OS X Leopard. I spent half a day before I somehow managed to install it on my Mac. But the process is ugly. I don't recommend it to anyone who is not very familiar with the Mac OS X system. I hope the original authors can fix the problem and/or write a patch for the latest version of Link Grammar.

Anyway, I put step-by-step instructions with a lot of screen captures on the CasualConc site. If you are interested, please check this page. If you know or figure out a better way to install them, please, please let me know. In fact, I'm not sure if it's working as it should.

In the future, I want to add a function to CasualTagger to do some simple syntactic parsing using Link Grammar/Ruby LinkParser. But first I need to figure out which functions to add and how. Yoichiro Hasebe wrote a program (port of phpSyntaxTree), RSyntaxTree, that draws a tree diagram from a syntactically parsed sentence (check this page), such as [S [NP RSyntaxTree][VP [V generates][NP multilingual syntax trees]]]. So I want to add a function to output this format (I'm not even sure if Ruby LinkParser does it or not).
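
As a rough illustration of the bracket format mentioned above, here is a minimal Python sketch that turns a nested tree into that notation. It does not use Link Grammar or Ruby LinkParser at all. The input structure (a `(label, content)` tuple, where `content` is either the phrase text or a list of child nodes) and the function name `to_brackets` are hypothetical and only serve the example.

```python
def to_brackets(node):
    """Render a (label, content) tree as labeled bracket notation.

    Assumption: content is either a string (the words of a leaf phrase)
    or a list of child nodes of the same (label, content) shape.
    """
    label, content = node
    if isinstance(content, str):
        return f"[{label} {content}]"
    # Child constituents are written back to back, as in the sample string.
    return f"[{label} " + "".join(to_brackets(child) for child in content) + "]"


tree = (
    "S",
    [
        ("NP", "RSyntaxTree"),
        ("VP", [("V", "generates"), ("NP", "multilingual syntax trees")]),
    ],
)
print(to_brackets(tree))
# prints: [S [NP RSyntaxTree][VP [V generates][NP multilingual syntax trees]]]
```

The hard part, getting such a tree out of the parser in the first place, is not covered here.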

By the way, for those who are not sure what a syntactic parser is, here's a sample.

The original sentence is:

Ruby is a dynamic, open source programming language.



If you try to install this, please let me know if it works for you.

Thursday, November 6, 2008

New experiment

I added a new experimental feature to CasualConc: very limited handling of XML information tags. What I mean is that if you have XML files or XML-formatted plain text files that carry information as tag attributes or elements, you can filter the files by that information.

The current implementation can handle two types:

<header attr1="~" attr2="~"></header>

or

<header><attr1>~</attr1><attr2>~</attr2></header>

So if your files have:

<info date="11052008" title="Presidential Election">

or

<info>
<date>11052008</date>
<title>Presidential Election</title>
</info>

CasualConc can preselect the files based on your query. Check this page for more information.
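
To make the idea concrete, here is a minimal Python sketch of this kind of filtering, using the standard library's `xml.etree.ElementTree`. It is not how CasualConc does it internally. The function names `header_info` and `matches`, the wrapping `<doc>` and `<body>` elements, and the requirement that each file is well-formed XML are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def header_info(xml_text, header_tag="info"):
    """Collect header information from attributes and from child elements."""
    root = ET.fromstring(xml_text)
    header = root if root.tag == header_tag else root.find(f".//{header_tag}")
    if header is None:
        return {}
    # Type 1: information stored as attributes of the header tag.
    info = dict(header.attrib)
    # Type 2: information stored as child elements of the header tag.
    for child in header:
        if child.text and child.text.strip():
            info.setdefault(child.tag, child.text.strip())
    return info


def matches(xml_text, query, header_tag="info"):
    """Return True if every key/value pair in query is found in the header."""
    info = header_info(xml_text, header_tag)
    return all(info.get(key) == value for key, value in query.items())


attr_doc = (
    '<doc><info date="11052008" title="Presidential Election"></info>'
    "<body>Some text</body></doc>"
)
elem_doc = (
    "<doc><info>\n<date>11052008</date>\n"
    "<title>Presidential Election</title>\n</info>"
    "<body>Some text</body></doc>"
)

files = {"a.xml": attr_doc, "b.xml": elem_doc}
selected = [name for name, text in files.items() if matches(text, {"date": "11052008"})]
print(selected)
```

Both header styles end up in the same dictionary, so one query works for either type of file.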

Because of the other changes I made to SQLite database handling, CasualConc is incompatible with the corpora/databases you created in the Advanced File Handling Mode (it shuts down when it tries to read the CasualConcData.ccdb file in ~/Library/Application Support/CasualConc). If you have used a prior version, you need to delete, move, or rename that file. So this beta of a beta is not linked from any pages on the site. If you are interested in testing this highly experimental beta-beta version, please go to this download page directly. (I implemented a function to automatically update the CasualConcData.ccdb file. Now this version is downloadable from the regular download page.) I would really appreciate it if you could give me feedback, especially on this new function. If you have any suggestions, please make them detailed. I personally haven't used XML files, so I'm not sure if this is useful. The more detailed your suggestion is, the more likely it is to be added to CasualConc (no guarantee, though).

Apart from this beta-beta, I fixed one bug in collocation coloring in the regular beta version of CasualConc (sounds weird...).