Sunday, December 28, 2008

A few minor changes

This will probably be the last CasualConc update this year.

I made a few minor cosmetic changes and fixed a few minor bugs. The latest version is If you aren't having any problems with the version you have, you don't need to update. I just wanted to make these changes before I forgot.

Happy New Year!

Wednesday, December 10, 2008

Bug fix and a few final touches to CasualTranscriber

I found a bug when I was showing it to someone today. When I added a function to dynamically change menu items based on the Preferences settings (App Mode), I set some of them to turn off when they should have been on. I think I've fixed this. I also changed how time stamps are inserted. I finally figured out how to override the link-clicking behavior (by default, links open in a browser), so now you can click a time stamp in the Editor to jump to that point in the movie/sound clip. You can still select the time code and use a shortcut (so you don't need to use the mouse). The latest version is
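Clicking a time stamp means turning its text into a playback position. The post doesn't say which time stamp format CasualTranscriber uses, so the following Ruby sketch assumes a hypothetical [hh:mm:ss.s] format; timestamp_to_seconds and seconds_to_timestamp are illustrative names, not the program's actual code.

```ruby
# Hypothetical time stamp format: [hh:mm:ss] or [hh:mm:ss.s]
TIMESTAMP = /\[(\d{1,2}):(\d{2}):(\d{2}(?:\.\d+)?)\]/

# Convert a clicked time stamp to seconds (Float), or nil if it doesn't match.
def timestamp_to_seconds(stamp)
  m = TIMESTAMP.match(stamp)
  return nil unless m
  m[1].to_i * 3600 + m[2].to_i * 60 + m[3].to_f
end

# Inverse: build the text inserted into the Editor for a playback position.
def seconds_to_timestamp(seconds)
  h, rem = seconds.divmod(3600)
  m, s = rem.divmod(60)
  format("[%02d:%02d:%04.1f]", h, m, s)
end
```

Seeking the movie itself would go through QTKit, which this sketch leaves out.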

But today, my friend showed me that there IS software just like CasualTranscriber... It's called InqScribe. What was shocking is that it looks almost the same as CasualTranscriber: the layout, the functions, etc. I haven't tried it (it's $99 with a 30-day free trial, $39 for students), so I don't know how good it is, but it's probably better, considering its price and how little time I've spent on CasualTranscriber (less than three weeks). So if you are mostly interested in adding subtitles, you might want to take a look at it. Probably the only advantage of CasualTranscriber is the cost...

Anyway, if you try CasualTranscriber, I'd really like to hear from you. Because there aren't many users (probably only a few), your feature request will likely be taken seriously, unless it's too complicated for me to handle.

Sunday, December 7, 2008

CasualTranscriber development is wrapping up

I worked on CasualTranscriber a little more and finally found a major source of crashes. It crashes less now (I hope). I also added a separate player in case you want to use a controller for movie playback (for example, in class). I also found that OS X can read MS Word documents (text only), so I added a function to open MS Word documents.

I personally think of this as a program to help with transcription, with adding subtitles as a secondary feature, but it looks like my friend uses it primarily for adding subtitles. So I spent a little more time adding subtitles in a different format (not as a text track).

I hope this program is useful for more people. Intended users are language teachers and conversation analysts.

Friday, December 5, 2008

Bug fixes to CasualTranscriber

Thanks to my friend who is testing CasualTranscriber, I was able to fix many bugs. The program is still not totally stable, but it sounds like it is usable. I have now implemented almost all the features I can think of (or can manage), so from now on I'll fix bugs and might add some features if I get feedback. The current version is 0.6, but that just means I've made significant changes six times since I started working on it.

In any case, if you use it and find any bugs or think of something very cool, please let me know. I'm not sure what I'll do next. I don't get much feedback, so I don't know what I should do with CasualConc. I want to make it handle 2-byte languages better and possibly add parallel concordancing, but it takes time to think about how to implement them. I have a few more ideas for small programs, but maybe I have to get more serious about my own work...

Monday, December 1, 2008


I know not many people (probably only a few, mostly people I know personally) have tried CasualTranscriber. It's still very early in development, but I've basically rewritten the program from a window-based to a document-based application. Now it handles rich text or plain text files just like a text editor or word processor does. Closing a window means closing the document in it.

In addition to the original features (shortcut control of movie/sound clips and extraction of a selected part), you can add chapters, extract a frame image, add subtitles, use the application just as a player, etc. I can't list all the features here, but they are all on the new CasualConc site under Utility Programs. Some of the features (esp. adding subtitles) are still experimental, and the application itself is not as stable as I want it to be (so now it has an autosave function). This is partly because I'm still new to QTKit (QuickTime Kit) and not all QuickTime functions are available via QTKit. I'm also using RubyCocoa, a bridge between Ruby, a scripting language, and Cocoa, Mac's application framework, so there is another layer of issues there.

I want to make this program easy to use for teachers and researchers (to-be) who need to transcribe conversations, speech, movie clips, etc. for teaching/researching. And if you ever use this, I'd really appreciate your feedback.

Thursday, November 20, 2008

New Utility

I started another experiment: a new utility program named CasualTranscriber. It's a simple utility program to assist with transcribing movie/sound files (to text). I googled and found some good free ones, but because I wanted to learn a bit about QTKit (QuickTime Kit) with RubyCocoa, I wrote my own. It's very simple at the moment and at a very early stage of development. But if you are interested, please check it out and let me know what you think. If enough people are interested, I will develop it further. Here's the direct link.

As I wrote in the previous post, I started a new CasualConc site on Google Sites, and I want to introduce it here, though only the English site is available at the moment (no Japanese site yet). All the program files are still on the old site, but the new site has much more information, especially about the utility programs (with screenshots). You can access the new site from this link. The CasualTranscriber page has much more information, so please check it out there.

As always, I'd appreciate any feedback on any of the programs and also the sites. If you have any suggestions/comments/bug reports, please leave a comment on this blog, add a new topic to the Discussion Board, or send an email to casualconc (at)

Wednesday, November 12, 2008

Bug fixes

As you might have noticed, the current CasualConc site is hosted on Google Page Creator. But Google decided to stop this service and focus on Google Sites, so I've been transferring the site contents to a new site on Google Sites. In the process, I've been updating the content and adding a little more information to some pages. So far, I've created the English site, but haven't started the Japanese site. I'm not sure how many people prefer a Japanese page, but I will eventually create a Japanese site (I personally prefer reading in Japanese).

While updating the content, I used the basic functions of all the current programs and found many bugs. Most of them were minor, and the major ones were not in the main features. I fixed so many things in the last couple of weeks (as well as adding new functions) that I can't track all the fixes/changes, but here's a list of a few of them.

- The context words function in Concord was broken, but now it should be working.
- The keyword grouping function was fixed.
- Keyword grouping only worked when the search was for a group of words. Now keyword groups can be used in a phrase search.
- Lemmas in search words should now work with wildcard/phrase searches.
- You can now import a word list not created by CasualConc. It accepts a CSV or tab-delimited file with words in one column and frequencies in the other. This allows you to import a word list created by another program/script.
- In PDF/Web/Document, most changes should be undoable. I changed how text is drawn in the text area.
- Batch processing was not working, but it should work now.
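The word list import accepts CSV or tab-delimited files with a word column and a frequency column. A rough Ruby sketch of that kind of parser follows; parse_word_list is a hypothetical helper, and detecting the column order per line is an assumption, not necessarily how CasualConc does it.

```ruby
# Parse an external word list: one word column and one frequency column,
# separated by a tab or a comma. Column order is detected per line, and
# lines without a numeric column (e.g. a header row) are skipped.
# Simple split only: quoted CSV fields containing commas are not handled.
def parse_word_list(text)
  counts = Hash.new(0)
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    separator = line.include?("\t") ? "\t" : ","
    fields = line.split(separator).map(&:strip)
    next unless fields.size == 2
    first, second = fields
    if second.match?(/\A\d+\z/)
      counts[first] += second.to_i
    elsif first.match?(/\A\d+\z/)
      counts[second] += first.to_i
    end
  end
  counts
end
```

Repeated words are summed, so a list that splits one word across several lines still imports as a single entry.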

I also made minor changes to CasualTagger and CasualMecab, but I've forgotten exactly what I did. Most of them are bug fixes.

If you downloaded any of the programs and found bugs, they might have been fixed now. If not, please report them to me. You can comment on this blog post, email me, or post on the Discussion Board (Google Groups). If you have any good ideas/suggestions for any of the programs, I'd appreciate your feedback.

Tuesday, November 11, 2008

New experiment (UPDATE)

This is an update to the post "New experiment." I implemented a function to automatically update the CasualConcData.ccdb file, so I dropped the beta-beta version. The official beta version is Please check the previous post for the details of the last update.

If you try this new function, please let me know what you think.

Sunday, November 9, 2008

Link Grammar

I found this syntactic parser (site) the other day, and I also found there is a Ruby binding called Ruby LinkParser (site). The instructions for Ruby LinkParser looked pretty simple, and I thought I would be able to install it and start using Link Grammar from Ruby in no time.

Well, I was wrong. Maybe because the instructions were written for Unix/Linux users, they didn't work on Mac OS X Leopard. I spent half a day and somehow managed to install it on my Mac, but the process is ugly. I don't recommend it to anyone who is not very familiar with the Mac OS X system. I hope the original authors can fix the problem and/or write a patch for the latest version of Link Grammar.

Anyway, I put step-by-step instructions with a lot of screen captures on the CasualConc site. If you are interested, please check this page. If you know or figure out a better way to install them, please, please let me know. In fact, I'm not sure it's working as it should.

In the future, I want to add a function to CasualTagger to do some simple syntactic parsing using Link Grammar/Ruby LinkParser. But first I need to figure out which functions to add and how. Yoichiro Hasebe wrote RSyntaxTree (a port of phpSyntaxTree), a program that draws a tree diagram from a syntactically parsed sentence (check this page), such as [S [NP RSyntaxTree][VP [V generates][NP multilingual syntax trees]]]. So I want to add a function to output this format (I'm not even sure whether Ruby LinkParser does this or not).
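To show what producing the RSyntaxTree bracket format could involve, here is a small Ruby sketch that serializes a tree stored as nested arrays. The nested-array representation and the to_brackets name are assumptions for illustration; converting actual Link Grammar/Ruby LinkParser output into this shape is not shown.

```ruby
# A parse tree as nested arrays: [label, child, ...]; leaves are strings.
# (Hypothetical representation, not Ruby LinkParser's output.)
def to_brackets(node)
  return node if node.is_a?(String)
  label, *children = node
  parts = children.map { |child| to_brackets(child) }
  # Sibling subtrees are written back to back, as in "[NP ...][VP ...]";
  # words are separated by spaces.
  joiner = children.all? { |child| child.is_a?(Array) } ? "" : " "
  "[#{label} #{parts.join(joiner)}]"
end

tree = ["S", ["NP", "RSyntaxTree"],
             ["VP", ["V", "generates"], ["NP", "multilingual syntax trees"]]]
puts to_brackets(tree)
```

For the tree above this prints the same bracketed string as the RSyntaxTree example in the post.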

By the way, for those who are not sure what a syntactic parser is, here's a sample.

The original sentence is:

Ruby is a dynamic, open source programming language.

If you try to install this, please let me know if it works for you.

Thursday, November 6, 2008

New experiment

I added a new experimental feature to CasualConc: XML information tag handling, which is very limited. What I mean is that if you have XML files or XML-formatted plain text files that store information as tag attributes or elements, you can filter the files by that information.

The current implementation can handle two types:

<header attr1="~" attr2="~"></header>



So if your files have:

<info date="11052008" title="Presidential Election">

or

<title>Presidential Election</title>

CasualConc can preselect the files based on your query. Check this page for more information.
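As a rough illustration of filtering files by information stored as tag attributes or as element text (the two shapes in the examples above), here is a regex-based Ruby sketch. It is not a real XML parser and not CasualConc's implementation, and the helper names (attribute_value, element_text, filter_files) are hypothetical.

```ruby
# Attribute shape: <tag attr="value" ...>
def attribute_value(text, tag, attr)
  tag_match = text.match(/<#{Regexp.escape(tag)}\b[^>]*>/)
  return nil unless tag_match
  attr_match = tag_match[0].match(/\b#{Regexp.escape(attr)}="([^"]*)"/)
  attr_match && attr_match[1]
end

# Element shape: <tag>value</tag>
def element_text(text, tag)
  t = Regexp.escape(tag)
  m = text.match(%r{<#{t}\b[^>]*>(.*?)</#{t}>}m)
  m && m[1].strip
end

# Keep the names of files whose information matches every condition.
# files: { name => contents }; attributes: { [tag, attr] => value }; elements: { tag => value }
def filter_files(files, attributes: {}, elements: {})
  files.select do |_name, text|
    attributes.all? { |(tag, attr), value| attribute_value(text, tag, attr) == value } &&
      elements.all? { |tag, value| element_text(text, tag) == value }
  end.keys
end
```

A production version would more likely use an XML library; regexes are only enough for simple, well-formed headers like the ones shown.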

Because of other changes I made to SQLite database handling, CasualConc is incompatible with the corpora/databases you created in the Advanced File Handling Mode (it shuts down when it tries to read the CasualConcData.ccdb file in ~/Library/Application Support/CasualConc). If you have used a prior version, you need to delete/move/rename that file. So this beta-beta is not linked from any pages on the site. If you are interested in testing this highly experimental beta-beta version, please go to this download page directly. (I implemented a function to automatically update the CasualConcData.ccdb file. Now this version is downloadable from the regular download page.) I would really appreciate it if you could give me feedback, especially on this new function. If you have any suggestions, please make them detailed. I personally haven't used XML files, so I'm not sure if this is useful. The more detailed your suggestion is, the more likely it will be added to CasualConc (no guarantee, though).

Apart from this beta-beta, I fixed one bug in collocation coloring in the regular CasualConc beta version (sounds weird...).

Wednesday, October 22, 2008

CasualConc Update

I've been working on CasualConc and uploaded 0.9.9 to the site. Well, I'm not sure when to call it 1.0, so I might go with

Most of the changes I implemented are internal, along with many bug fixes. I didn't really touch the core tools; most of the work was on file handling.

Here are some of the changes you might (or not) notice:

Plain Text File encoding
You can now set a default text encoding in the File view (no need to open Preferences). The default encoding you specify before opening files applies to all the files you add to the file list table, but you can now also change it for each file in the table, so different files can use different text encodings. I also added ISO Latin 1 and ISO Latin 2 to the encoding list.
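Per-file encodings boil down to decoding each file's bytes with its own encoding before treating everything as UTF-8. A minimal Ruby sketch, assuming the labels in ENCODINGS (hypothetical table labels mapped to Ruby's built-in encodings); read_as_utf8 is an illustrative name:

```ruby
# Per-file encodings a user might pick in the file table (labels are examples).
ENCODINGS = {
  "UTF-8" => Encoding::UTF_8,
  "ISO Latin 1" => Encoding::ISO_8859_1,
  "ISO Latin 2" => Encoding::ISO_8859_2
}.freeze

# Read raw bytes, label them with the encoding chosen for this file, and
# transcode to UTF-8 so every file looks the same to the rest of the program.
def read_as_utf8(path, encoding_label)
  source = ENCODINGS.fetch(encoding_label)
  File.binread(path).force_encoding(source).encode(Encoding::UTF_8)
end
```

The same byte can mean different letters in Latin 1 and Latin 2, which is why the choice has to be stored per file rather than once for the whole list.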

Open/Save panel
This change probably won't make a difference to most people. I just wanted to change it to a Genie panel(?) because I learned how to.

Most frequent position for each context word is now colored in red.

Bug fixes
- Exporting/Saving results should work fine now, though I'm not sure how many people have ever tried to use this function. I also made changes to accommodate the changes in Collocation.
- Fixed crashes when you hit the space bar (or maybe some other keys) on a blank table. You may never have done such a stupid thing, but I found this when I accidentally hit the space bar in Advanced Corpus File Handling mode.

Also, as I noted on the site, your preference settings will be lost if you have used previous versions. If you want to keep them, change the name of the preference file "" in your home -> Library -> Preferences folder to "CasualConcApp.plist". Except for tag ignore settings, your preference settings should be safe.

Along with this change, I also added ISO Latin 1 and ISO Latin 2 to the list of encodings (open/save) in CasualTextractor.

If you find any of these attractive or have been bothered by bugs, please try the latest version and let me know what you think. Bug reports are also welcome.

By the way, I haven't updated all the documentation yet. Some of it is quite old. I guess I have to find time to update it (or rearrange it). I read somewhere that Google is moving the content of Google Page Creator to Google Sites. That might be a good time to update the documentation.

Saturday, October 18, 2008

Utility Programs updates

I've been experimenting with some Cocoa UIs and bindings, and adding features I learned to the utility programs. I made minor changes to CasualTextractor and CasualMecab. I also made a few more changes and added some new functions to CasualTagger. I now need to manually tag a lot of texts, so I'm trying to make it a tool to help with manual tagging. I copied regular expression replace from CasualConc (a hidden feature) and added simple tag coloring. Now it also has a simple word/tag count and a kwic concordance of a single file.
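A single-file KWIC concordance can be very small. The following Ruby sketch is a hypothetical helper (not CasualTagger's code) that right-aligns a fixed number of characters of left context so the keywords line up in one column.

```ruby
# Minimal single-file KWIC: each match of the search word with a fixed
# number of characters of context on each side.
def kwic(text, word, width: 30)
  flat = text.gsub(/\s+/, " ")             # one line per hit, so collapse newlines
  regex = /\b#{Regexp.escape(word)}\b/i
  lines = []
  flat.scan(regex) do
    m = Regexp.last_match
    left = flat[[m.begin(0) - width, 0].max...m.begin(0)]
    right = flat[m.end(0), width]
    # Right-justify the left context so every keyword starts in the same column.
    lines << format("%#{width}s [%s] %s", left, m[0], right)
  end
  lines
end

puts kwic("The cat sat.\nA cat ran.", "cat", width: 5)
```

Character-based padding like this assumes one character per display column, which is exactly where non-ASCII text causes trouble.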

I'm also working on CasualConc. As I wrote in the last post, I will probably clean up some old code. Tag handling might take some time to implement because I have to think about how to handle tags in CasualConc. Any ideas? What I'll probably do first is change/fix file handling elements. In the utility programs, you can now change the character encoding of plain text files after you add them to the table. This allows you to use text files with different encodings.

Another minor change is coloring in Collocation. Now the most frequent position for each context word will be colored red. This seems to be working fine, so you will see this feature in the next update.
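The red coloring needs, for each context word, the position (L5 to R5) where it occurs most often around the keyword. A minimal Ruby sketch over a token array, assuming a five-word window on each side; the function names are hypothetical:

```ruby
# Context positions L5..L1 (-5..-1) and R1..R5 (1..5) around the keyword.
POSITIONS = ((-5..-1).to_a + (1..5).to_a).freeze

# For each context word, count how often it occurs at each position.
def position_counts(tokens, keyword)
  counts = Hash.new { |hash, word| hash[word] = Hash.new(0) }
  tokens.each_with_index do |token, i|
    next unless token.casecmp?(keyword)
    POSITIONS.each do |offset|
      j = i + offset
      next if j.negative? || j >= tokens.size
      counts[tokens[j].downcase][offset] += 1
    end
  end
  counts
end

# The position with the highest count for each word (ties go to the position
# seen first). This is the position that would be highlighted.
def most_frequent_positions(counts)
  counts.transform_values { |by_position| by_position.max_by { |_offset, n| n }.first }
end

def position_label(offset)
  offset.negative? ? "L#{-offset}" : "R#{offset}"
end
```

For the tokens "the cat sat on the mat the cat ran" with keyword "cat", "the" occurs twice at L1, so L1 would be its highlighted position.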

In addition to RubyCocoa programs, I wrote a simple javascript-based parallel concordancer, which I was asked to write. I based it on my old javascript-based concordancer, so not much scripting was involved. I did this because I'm thinking about writing a parallel concordancer for Mac, as I wrote on this blog before, so I wanted to know what the most fundamental features of a parallel concordancer are. I googled and wrote it based on some parallel concordancers out there. It only creates a table of matched texts based on the search. It also creates kwic results, and you can select one to show the matching text. But what else is necessary for a parallel concordancer for Mac? If you have any suggestions, please leave a comment here, send me an email, or post on the CasualConc Google Discussion Board. If you give me enough information to figure out how to implement your requests, you will have a better chance of seeing them, though it also depends on my scripting skills.
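The core of that kind of parallel concordancer is a search over aligned sentence pairs that keeps the translation next to each hit. A small Ruby sketch, assuming the two texts are already aligned line by line (this post doesn't say how the Javascript version aligns texts); Pair, load_pairs, and parallel_search are hypothetical names:

```ruby
# An aligned sentence pair. This sketch assumes two files whose lines
# correspond one-to-one (line n of the source = line n of the target).
Pair = Struct.new(:source, :target)

def load_pairs(source_text, target_text)
  sources = source_text.lines.map(&:chomp)
  targets = target_text.lines.map(&:chomp)
  # zip pads missing target lines with nil; to_s turns them into "".
  sources.zip(targets).map { |s, t| Pair.new(s, t.to_s) }
end

# Return the pairs whose source side contains the query as a whole word,
# with the match bracketed for display in a results table.
def parallel_search(pairs, query)
  regex = /\b#{Regexp.escape(query)}\b/i
  pairs.select { |pair| pair.source.match?(regex) }
       .map { |pair| Pair.new(pair.source.gsub(regex) { |m| "[#{m}]" }, pair.target) }
end
```

Real parallel corpora are often aligned at the paragraph or sentence level with explicit IDs rather than by line number, so the loading step is the part most likely to change.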

Anyway, please check out the utility programs and let me hear what you think.

Monday, October 13, 2008

A few bug fixes

I made a few more bug fixes and some internal changes. The somewhat major bug I fixed was the alignment and coloring of text in the Concord table and text view. If you use CasualConc only with English, you probably didn't see any problem, but if you deal with text containing a lot of characters outside the basic Latin alphabet, the display was ugly. Now it's better (not perfect). There is still a problem with languages that combine more than one character to display a single character. Other than that, text display is less problematic.
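One way to handle characters built from more than one code point when padding KWIC lines is to count grapheme clusters instead of characters. A Ruby sketch of the idea (display_length and pad_left are hypothetical helpers, not the fix used in CasualConc):

```ruby
# String#length counts code points, so "e" + COMBINING ACUTE ACCENT counts
# as two even though it displays as one letter. Counting extended grapheme
# clusters (\X) instead keeps padded KWIC columns closer to aligned.
# This does not account for double-width East Asian characters or font metrics.
def display_length(str)
  str.scan(/\X/).length
end

def pad_left(str, width)
  " " * [width - display_length(str), 0].max + str
end
```

In a Cocoa text view the font still decides the final widths, so this only removes the code-point miscount.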

The major internal change I made is using the Shared User Defaults Controller to save Preferences settings. This saved a lot of code, but at the same time, it is not perfect. Somehow it doesn't remember changes made by scripts, so for some text data, I have to use a script to save the data properly. But I might have done something wrong, so if you find any bugs related to Preferences, please let me know.

I also made a major change to CasualTagger. CasualConc had hidden functions to help with manual tagging, which I never turned on officially because I didn't have time to finalize them and fix bugs. Now I've taken some of those features and added them to CasualTagger. I haven't documented them, but I included simple instructions in the application as a help file. If you are interested, please take a look at it. CasualTagger is on the main site under Utility Programs.

I also made more changes to IPATypist, which not a lot of people use. And I guess the people who have downloaded it might not read this, so they don't know it's been updated (though I'm not sure they keep using it).

That's about it for now. I'm also thinking about adding tag handling features to CasualConc, but it doesn't look promising. I once wrote experimental scripts to handle some types of tags, but they don't work very well. If I want to seriously add this feature, I have to get rid of the old ones first. It's not very easy... Also, there are a lot of weird scripts in CasualConc because it includes some code I wrote when I was just starting to learn Ruby. I guess I have to clean up old messes before I add any significant new features.

In any case, if you use CasualConc and/or the other utility programs, please let me know what you think. The current priority is adding tag handling features. East Asian language support might be dropped from CasualConc and moved to a separate program. Some people asked about a parallel concordancer, so I might write a separate one, but I still don't have enough information to go ahead. If you'd like to see a parallel concordancer for Mac, please give me information. You can email me directly, comment on this blog, or post on the CasualConc Discussion Board. I need to know what the most fundamental features are and how they should be implemented.

Friday, October 10, 2008

IPATypist update

This is not related to corpus analysis, so I guess almost no one is interested, but as a memo to myself, I'm writing down what I did.

IPATypist is a very simple utility program for typing IPA phonetic symbols. With this update, I added a database function to it. Now phonetic transcriptions can be stored, so you don't have to type them again. But this is mainly my experiment with CoreData, an OS X framework for easily building database-type programs. I just started looking at it, so I'm still not sure if I did this right, but it looks like it's working.

If you happen to be one of the rare people who are interested in this program, please check it out and let me know what you think.

Thursday, October 9, 2008

Utility Programs

I updated a couple of utility programs, including changing names, and added a new utility program to the main site.

Two of the utilities are TextExtractor and jparser. These two names are already used by other programs/modules, so I decided to change them. I used the same Casual~ naming for them: TextExtractor is now CasualTextractor and jparser is now CasualMecab. Well, it's obvious I didn't spend much time on this... Both utilities had a few bug fixes and minor feature changes, which are not obvious.

And the new addition is a POS tagger. I finally found an English POS tagging module for Ruby: EngTagger by Yoichiro Hasebe, a Ruby port of Perl's Lingua::EN::Tagger module. I simply added a GUI and a function to process multiple files. You can also select a tag type (the default of EngTagger is XML format). I named it CasualTagger, and you can download it from the CasualConc main site (here's the direct link to the page: English/Japanese). To use CasualTagger, you need to install EngTagger, but it's very easy from (single line of command). For more information about EngTagger (tag sets, etc.), check this page. As always, any feedback is welcome.
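Selecting a tag type can be done by rewriting the XML-style output after tagging. The sketch below assumes EngTagger-style output of the form <nnp>Alice</nnp> and converts it to a word_TAG style. It works on plain strings, so EngTagger itself is not needed to run it, and convert_tags is a hypothetical name rather than part of EngTagger or CasualTagger.

```ruby
# Convert XML-style tagged text such as "<nnp>Alice</nnp>" into word + separator
# + upper-case tag (e.g. "Alice_NNP"). The input shape is an assumption about
# EngTagger-style output; only simple tag names (letters/digits) are handled.
def convert_tags(tagged, separator: "_")
  tagged.gsub(%r{<([a-z0-9]+)>([^<]*)</\1>}i) { "#{$2}#{separator}#{$1.upcase}" }
end

puts convert_tags("<nnp>Alice</nnp> <vbz>chases</vbz>", separator: "/")
```

The backreference \1 makes sure an opening tag is only paired with its own closing tag.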

Now I need to seriously think about adding a tag-handling feature to CasualConc. But how?? If you have any suggestions, leave a comment on this blog or email me (the email address is on the main site).

Also, a couple of people asked for a parallel concordancer (for Mac and in Javascript). But what are the most fundamental functions? Any suggestions/comments about a parallel concordancer are also welcome.

Friday, September 19, 2008


I haven't had time to write any scripts at all for a while, but I've found some time to experiment with RubyCocoa. The stuff I'm working on won't change anything on the surface of CasualConc. The changes will be mostly internal and slow.

The few feature requests I've gotten are XML file handling and parallel concordancing. But I don't have any experience with these, so I need more information. I added two threads about them to the Google Groups discussion board. If people ever read this and give me information about them, I might add them or write a separate program (for parallel concordancing).

Another thing I'm considering for CasualConc is dropping East Asian language support. I don't hear from anybody who uses CasualConc for Japanese/Korean/Chinese, so I don't know if I really want to keep trying to accommodate these functions in CasualConc. It would probably be easier for me to maintain if I made an East Asian language concordancer (esp. kwic) a separate program. I'll think about this more, but if you have any suggestions, let me know.

Anyway, as part of my experiment with RubyCocoa, I updated TextExtractor, a utility program to extract text data from various document files and to convert the text encoding of plain text files to UTF-8. I'm not sure if you've looked at the utility program section of the CasualConc site, but I have a few utility programs that deal with text files. I combined two of them (the PDF to Text and HTML to Text converters) and added a few extra functions. The first version (0.1) of TextExtractor included jparser's function (a Japanese parsing program using MeCab), but it didn't run without MeCab and MeCab-Ruby, so I dropped this function.

Instead, I made that part simply convert non-UTF-8 text files (.txt) to UTF-8 text files, and MS Word, PDF, HTML, and OpenOffice documents to UTF-8 text files or Rich Text Format files. All the other parts (PDF to Text, Web file to Text, and batch process) can save files as RTF files. When you convert files to RTF, you can either keep the text/font information of the original files (fonts/font style/etc.) or throw away this information and save plain text in an RTF file.
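Saving plain text as RTF mostly means escaping a few characters and wrapping the text in an RTF header. A minimal Ruby sketch, assuming a single font (Helvetica here, an arbitrary choice) and no styling; plain_text_to_rtf is a hypothetical helper, not CasualTextractor's code:

```ruby
# Wrap plain text in a minimal RTF document, discarding all font/style info.
# Backslashes and braces are escaped, newlines become \par, tabs \tab, and
# non-ASCII characters become \uN? escapes (N = signed 16-bit UTF-16 code
# unit; "?" is the fallback character for readers without Unicode support).
def plain_text_to_rtf(text, font: "Helvetica")
  units = text.encode(Encoding::UTF_16BE).unpack("n*")
  body = units.map do |unit|
    case unit
    when 0x5C, 0x7B, 0x7D then "\\" + unit.chr   # backslash, {, }
    when 0x0A then "\\par\n"
    when 0x0D then ""                            # drop CR from CRLF line endings
    when 0x09 then "\\tab "
    when 0x20..0x7E then unit.chr
    else "\\u#{unit > 0x7FFF ? unit - 0x10000 : unit}?"
    end
  end.join
  "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 #{font};}}\\f0 #{body}}"
end
```

Working on UTF-16 code units means characters outside the Basic Multilingual Plane come out as two \u escapes (a surrogate pair), which is how RTF expects them.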

I also added basic instructions in English (not translated into Japanese yet). So if you are interested, please try it and let me know what you think.

EDIT: This program has been renamed CasualTextractor.

Sunday, September 7, 2008

A few very minor changes

Well, obviously, I haven't done anything with CasualConc for the last three months. I finally announced it on the Corpora List, and some people got interested and started testing CasualConc. But I've heard from only a few people. Still, it's good to know someone uses it and likes it.

I made a few minor changes/bug fixes to CasualConc. The only one that's worth mentioning here is that now CasualConc remembers the files you selected in File Mode when you quit the program. The next time you start CasualConc, the files you selected last time should be on the file list. Now the version number is 0.9.8 beta.

As always, I'd like to know what you think about the program.

Sunday, June 8, 2008

Bug fixes

I finally got a bug report. Now I know at least a few more people are using CasualConc.

The bugs are related to the recent changes I made to Lemmatization and Collocation.

The bug related to lemmatization was that when lemmatization was activated without specifying a lemma file, CasualConc crashed. This was because CasualConc looked for a lemma file when it started or returned from the preferences and if the file was not found, it crashed.

The two bugs related to collocation were 1) it didn't run in file mode, and 2) search in Concord didn't work when the 'Treat Keywords as One Word' option was activated in Preferences. These should be fixed and work properly now.

I would appreciate any report of bugs. And I'd like to know how you like CasualConc.

Friday, May 30, 2008

A minor change

I found a bug (sort of) in Concord a couple of days ago. It's a minor bug, and it happens only when you use the database mode in Concord. Well, it's more of a memory leak. I implemented a forced garbage collection when full text is displayed in the context view of Concord, but somehow memory is not released. So I changed the way the text is read from a database file. Now it should not keep using additional memory when you select a different concordance line to show the full text.

I use the same technique to read data from a database file when CasualConc searches for a string, but when I applied the same change to the search function, it used more memory because a search returns more hits. What this means is that if you search for word(s)/phrase(s) in any of the tools many times, CasualConc keeps using more memory. I haven't tested whether it uses up all the available memory and starts using virtual memory, or whether Ruby starts GC when it uses up all the available physical memory. In any case, until I find a way to solve this problem, you might want to quit CasualConc after a while and restart it.

Tuesday, May 20, 2008

A few more fixes again

I fixed a few more bugs this weekend, mainly related to lemmatization and collocation statistics. I also added some more documentation to the main site (some in Japanese). The latest version is still 0.9.7, but the date is 05192008.

Now most of the features I wanted to include in CasualConc are there and mostly functioning. I don't have time to improve the Japanese kwic feature now, so that will have to wait until sometime in summer or fall. And unless I find, or someone reports, any major bugs, I will try not to spend too much time on this for a while. I don't know how many people have actually downloaded CasualConc and are using it, but I guess there aren't many. If you happen to be one of them, I'd like to hear what you think about it.

Well, I might need to publicize this a bit more, so I might start trying to get more beta testers somewhere.

Sunday, May 18, 2008

Another bug fix and minor update

I highly doubt anyone has downloaded CasualConc recently, but anyway, I found a few bugs and also made some changes I'd wanted to make for a while. Now the latest version is 0.9.7 beta.

First, I found a bug in Japanese Concordance, which I'm sure nobody has ever used. When I dropped the text-only mode, I forgot to change it in Japanese concordance mode. Now it should be fixed. I also fixed some other bugs that relate to the recent feature changes.

The changes I made are mainly in Collocation. Now, if you search for multiple words, or use a wildcard search and multiple words are found, collocation information will be displayed for each keyword. This change affected the statistics calculation, so I made what I think are the necessary changes to it.
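Displaying collocation information for each keyword implies grouping hits by the word that actually matched. A Ruby sketch, assuming a simple wildcard syntax where * matches any run of letters and ? matches one letter (hypothetical; CasualConc's wildcard rules aren't described in this post):

```ruby
# Turn a wildcard query into a whole-word regex: "*" = any run of word
# characters, "?" = exactly one. (Hypothetical syntax for illustration.)
def wildcard_to_regex(query)
  pattern = Regexp.escape(query).gsub("\\*") { "\\w*" }.gsub("\\?") { "\\w" }
  /\A#{pattern}\z/i
end

# Group hit positions by the actual word that matched, so that
# collocation statistics can then be computed separately for each keyword.
def hits_by_keyword(tokens, query)
  regex = wildcard_to_regex(query)
  hits = Hash.new { |hash, key| hash[key] = [] }
  tokens.each_with_index do |token, i|
    hits[token.downcase] << i if token.match?(regex)
  end
  hits
end
```

Each key's position list is then the input for one keyword's collocation table, so the statistics for "walked" are not mixed with those for "walks".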

I also made a minor change to the Export result function of Concord. Originally, a CSV file exported from Concord only included kwic results and file paths. Now it has an option to include context words (L5 - R5). To include them, go to Preferences -> Concord and check the box Include context words (L5 - R5) in CSV output.
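The exact column layout of the exported CSV isn't given in this post. As a hypothetical sketch, one row might hold the KWIC line, the file path, and then L5 to R5 as ten extra columns, padded with empty fields near the start or end of a text:

```ruby
# Quote a CSV field only when needed (comma, quote, or newline inside).
def csv_field(value)
  s = value.to_s
  s.match?(/[",\n]/) ? "\"#{s.gsub('"', '""')}\"" : s
end

# One CSV row for a hit at tokens[index]: left context, keyword, right context,
# file path, and optionally L5..L1 and R1..R5 as separate columns.
def concord_csv_row(tokens, index, path, include_context: true, span: 5)
  left = tokens[[index - span, 0].max...index]
  right = tokens[index + 1, span] || []
  fields = [left.join(" "), tokens[index], right.join(" "), path]
  if include_context
    fields += Array.new(span - left.size, "") + left     # L5..L1, padded at the start
    fields += right + Array.new(span - right.size, "")   # R1..R5, padded at the end
  end
  fields.map { |f| csv_field(f) }.join(",")
end
```

Padding keeps L1 and R1 in fixed columns, so the exported file can be sorted or pivoted by position in a spreadsheet.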

As always, if you happen to find this blog or the main page and try CasualConc, any feedback (including bug reports) is welcome. Especially if you find it useful, I'd like to know.

Thursday, May 15, 2008

Another quick fix

I found a minor (maybe major to someone if anyone ever uses CasualConc) bug and fixed it today. This only affects you if you use Concord with non-plain text files as your corpus files. And this only happens in the paragraph mode (the default mode). Now you should be able to use other file types as your corpus files with Concord and in the paragraph mode.

I found this bug when I was testing .odt files. After the fix, I was able to use .odt files as corpus files, so this confirms CasualConc can read .odt files!

I don't know how many people are affected (I know not many people), but if you downloaded CasualConc in the last couple of weeks, please go to the site and download the latest version. It has the same version number (0.9.6), but different date (05142008).

And if you find any other bugs, please let me know. The email address is on the main site.

Tuesday, May 13, 2008

Some details of last update

As I mentioned in the last post, I added/activated a couple of new features on CasualConc. One is based on the lemmatizing function and the other is something with Concord.

The first, which is based on the lemmatizing function, is keyword grouping (or whatever name I settle on; it has a tentative label). First, you prepare a text file (UTF-8) in the same format the lemmatizer accepts. The default is:

keyword -> word,word,word,...

The keyword is a grouping label, so if you want to group days of a week, it looks like:

week -> Monday,Tuesday,Wednesday,Thursday,Friday,Saturday,Sunday

Once you have prepared as many groups as you want, save the file as plain text with UTF-8 encoding. Then go to Preferences -> Lemma in CasualConc, and check Grouped Keywords. Next, select the file you just saved by clicking the Select Grouping File button. Now everything is set.

If this works as intended, you should be able to use this function in Concord, Cluster, and Collocation/Cooccurrence. Just add @@ at the beginning of your search word(s). So if you want to search for all the days of the week, as specified above, type @@week, then search. This should find all the words in the group. Technically, you should be able to search multiple groups, but this is not fully tested and might not work, and I don't know what will happen if you combine this feature with wildcard search. I might change the behavior of this feature if I ever get any feedback.
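A minimal Ruby sketch of how a grouping file in the "keyword -> word,word,..." format could be read and how an @@ query could be expanded into a search pattern. load_groups and expand_query are hypothetical helpers, not CasualConc's code, and multiple groups and wildcards are not handled:

```ruby
# Parse a grouping file in the "keyword -> word,word,..." format.
# Blank or malformed lines are skipped.
def load_groups(text)
  text.each_line.with_object({}) do |line, groups|
    label, words = line.split("->", 2).map(&:strip)
    next if label.nil? || label.empty? || words.nil?
    groups[label] = words.split(",").map(&:strip).reject(&:empty?)
  end
end

# "@@label" expands to a regex matching any word in that group;
# any other query is searched as a single literal word.
def expand_query(query, groups)
  words = query.start_with?("@@") ? groups.fetch(query.delete_prefix("@@"), []) : [query]
  return nil if words.empty?
  /\b(?:#{words.map { |w| Regexp.escape(w) }.join("|")})\b/i
end
```

With the week group loaded, expand_query("@@week", groups) matches any of the seven day names, and an unknown group returns nil instead of matching everything.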

Another somewhat major addition, which is not documented at this time, is a function for Concord. You can now open a concordance result in a new window. This might be useful if you want to compare several concordance results. To use this function, search for any word(s) in Concord, and then go to Menu -> Misc -> Open Concordance Result in New Window. This is experimental. I added it because I found a way to add a multiple-window function to a program (I just wanted to have something so that I remember how to do it). You should be able to re-sort the results even in a new window, just like in the main window. But beware: if the concordance result is huge (like 10000 hits), using this might eat a lot of memory because CasualConc keeps all the info in memory. If you have at least 2GB of memory, this should be less of a problem, though.

Finally, I have something that is not related to CasualConc. A couple of weeks ago, I posted that I wrote a simple utility program that helps with typing IPA characters. I wrote a similar program(?) in JavaScript and added it to the IPATypist page. I highly doubt many people read this post, especially people who don't use Leopard, but this is written for them. It should run on Tiger with Firefox, Safari, and Camino. I haven't tested it with IE on Windows and have no intention of supporting it, but it might work. It is also available for download, so if you are interested, you can download it and use it on your computer, put it on your course site, or use it wherever you want, though I can't guarantee it will work.

As always, if you ever use any of the programs, I'd appreciate your feedback. That will motivate me to improve them.

Tuesday, May 6, 2008

Minor update

Over the weekend, I fixed a few bugs and made a few changes to some of the existing functions of CasualConc. But these changes might have introduced new bugs... Now CasualConc is 0.9.6 (beta).

The bugs, or more precisely, legacy features, that were fixed or updated were mostly in file handling. When I first started writing the program, I didn't know anything about RubyCocoa (or Cocoa). So when selecting corpus files or a database file, only one file/folder could be selected. This was simply because of the original Ruby script, in which I simply specified a directory or a file to analyze; I then simply added a Cocoa interface to it. Eventually, I learned how to receive multiple file names as an array from the open panel, and I made this available to some of the new features. Now you should be able to select multiple files/folders when you choose your corpus files/folders in File Mode. In Database Mode, only one file can be selected. If you need to select multiple database files, please use the advanced file handling mode.
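Turning a multiple selection of files and folders into one list of corpus files can be sketched in Ruby as follows. The helper name collect_corpus_files and the choices to search folders recursively and keep only .txt files by default are assumptions for illustration; the Cocoa open panel itself is not shown.

```ruby
# Hypothetical: expands a mixed selection of files and folders into a sorted
# list of files whose extension is in `extensions` (folders are searched recursively).
def collect_corpus_files(paths, extensions = [".txt"])
  files = paths.flat_map do |path|
    if File.directory?(path)
      # "**/*" walks every subfolder; keep regular files only.
      Dir.glob(File.join(path, "**", "*")).select { |f| File.file?(f) }
    else
      [path]
    end
  end
  # Case-insensitive extension check; uniq drops a file selected both
  # directly and through its folder.
  files.select { |f| extensions.include?(File.extname(f).downcase) }.uniq.sort
end
```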

Another bug fix was drag and drop of files. I'm not sure if I mentioned this feature in any of the documentation, but you can actually drag and drop files to the table in File View. So if you have a Finder window open with files you want to add, you can drag and drop them (or the folder that contains the files) to the table. This should work with files to analyze in File Mode and files to add to a database in Database Mode. If these don't work, please let me know.

The other changes are very minor, and I'm almost certain no one has ever used the features involved. But anyway, I dropped Text Only mode. So now all you need to do is check/uncheck the file types you want to use. You still need to specify the text encoding if you use text files, because the auto-detection of text encoding in Objective-C is not usable. Related to this change is the addition of OpenDocument Text (.odt) support. But because I've never used OpenOffice, I haven't tested it. I implemented this a while ago when I added the other formats but didn't activate it because I don't use it. Now I have decided to activate it. Because I simply use a built-in Objective-C function, it should work as other files do (no guarantee).

Oh, one, kind of, major fix is the lemma function. I implemented the lemmatization function at a very early stage, but I've made a lot of changes to most of the tools since then, and it seems I broke it. Now it's fixed, and I also added a function to use lemma grouping in kwic search. That is, if you turn on this feature and search for a word that is in the lemma file you provide, you can search all the words grouped under the same lemma, though I'm not sure if this works as intended.
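Lemma grouping in a kwic search is essentially a reverse lookup: from the searched word to its lemma, then to every form listed under that lemma. A minimal Ruby sketch, assuming the lemma file uses a "lemma -> form,form,..." format like the grouping file and that matching is case-insensitive (both are assumptions; build_lemma_index and lemma_family are hypothetical names):

```ruby
# Hypothetical: builds a form -> lemma index from "lemma -> form,form,..." lines.
# The lemma itself is indexed too, so searching the base form also works.
def build_lemma_index(text)
  index = {}
  text.each_line do |line|
    lemma, forms = line.chomp.split("->", 2)
    next if forms.nil?
    lemma = lemma.strip
    ([lemma] + forms.split(",").map(&:strip)).reject(&:empty?).each do |form|
      index[form.downcase] = lemma
    end
  end
  index
end

# Returns every form that shares the search word's lemma,
# or just the word itself when it is not in the lemma file.
def lemma_family(word, index)
  lemma = index[word.downcase]
  return [word] if lemma.nil?
  index.select { |_form, l| l == lemma }.keys
end
```

For example, with a line go -> goes,went,gone,going, searching for went would expand to go, goes, going, gone, and went.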

In addition to these fixes, I added a couple of new features. One is based on the lemmatization function and the other is something very experimental. But this post is getting long, so I'll post about them in the next few days when I have time.

As always, if you happen to find this blog or the CasualConc site, please let me know what you think. You can leave your comment on this blog or email me (the address is on the main site).

Saturday, May 3, 2008


This is a small utility program I wrote for an ESL instructor at my school (yes, this is just written for you, Janet!), but I made a few changes to it so that this can be also useful for other people.

Originally, she told me she was having trouble typing IPA phonetic alphabets in Unicode. There is a keyboard mapping to type phonetic alphabets, but it is cumbersome. So I simply put a lot of buttons to enter IPA characters. Because this was written specifically to serve her purpose, the characters are the ones used for English and some special ones that are used for the book she and her colleagues are working on.

What you can do with this utility is type phonetic symbols by simply clicking buttons. Once you are done, copy/paste them into whatever program you are working on. You can either go to the Menu or press Command + C to copy the string, which keeps all the font information (font type/size). Or you can click the Copy button, which only keeps the character information, so when you paste the string, the font settings (type/size) of your document will be applied.

The latest version (0.3) supports key (button) mapping (if it functions as intended). Now, any character can be assigned to any of the buttons, so users/teachers of languages other than English could use it (I hope).

The system requirements are the same as for all the programs/utilities I have written: Mac OS X 10.5.2 (Leopard) or later. I think this works on 10.5, but all my machines are now running 10.5.2, so I can't check the versions prior to it (but at least I'm sure this won't run on Tiger). You also need the Doulos SIL font, which can be downloaded freely from the SIL site. The link to their site is on the download page of this utility.

If you find any bugs or have any feature requests, I will try to fix/add them as much as I can (if they are minor). I don't have much time to spend on this now (or I should say I should not spend time on this). But any feedback is welcome. In particular, I'd like to know if this helps someone.

Tuesday, April 29, 2008


Over the last couple of weeks, I've worked on the documentation of CasualConc. Now it covers most of the basic functions. I also started a more step-by-step guide with a lot of images and named it Getting Started with CasualConc. So far, I have only finished basic file management and database creation along with the kwic concordance function, which I think will be the most frequently used function (it is for me).

Now, my only hope is that someone will find the site or this blog and start using it. Somehow, the CasualConc main site doesn't show up in Google search results. When I add a post to this blog, it shows up within 20 hours or so and then disappears. Well, maybe I should add one post per day until more people find this blog and CasualConc.

If you happen to find this blog, please try CasualConc (if you use Leopard) or tell your friends who use Leopard to try it. I know it still has bugs and a lot of limitations, but I really want other people's opinions to improve it (it serves most of my current uses, so I don't have much motivation to make a lot of changes). Well, even if I hear from people, I might not be able to work on it for a while, but at least it's good to hear from them, esp. if they like it.

Monday, April 28, 2008

A quick fix

I'm almost certain nobody has downloaded CasualConc since my last post. But anyway, I accidentally introduced a bug into the database creation function. It was caused by a new tag-deletion code, which I forgot to apply to the database creation part. So if you find the database creation function does not work properly (it crashes CasualConc), please go to the site and download the latest beta.

If you find any other bugs, please report them to me. The email address is on the main site, or you can leave a comment on this blog.


Sunday, April 27, 2008

A few more fixes

I found a few minor bugs that I had introduced with the last changes, so I fixed them. I also found that the default font of CasualConc, Courier, is not monospace in Greek, which is the language my very first user (I guess) tests CasualConc with, so I added a function to select Courier or Courier New, which is monospace in Greek.

I didn't mention this in the last post, but I also made some changes to the code of Concord, which improved speed by only about 2-3%.

Now I hope more people find this blog or the main site and test CasualConc. So if you happen to find this blog or the main site and you know someone who uses Leopard and is interested in corpus analysis, please tell him/her to test CasualConc, even if you don't use Mac OS X Leopard. If you do, please try it!!

Saturday, April 26, 2008

A minor update

As I reported a couple of days ago, at least a few people around the world are testing CasualConc, and I've already got a bug report... This very minor update is partly based on that report (whose source I don't really know) and partly a minor change to a setting.

What I found was an inconsistency in handling special characters, such as curly quotes or curly apostrophes. These look like single-byte characters on web pages or in Word documents, but they are in fact multi-byte characters in Unicode (three bytes each in UTF-8), so CasualConc replaces them with a single-byte quote or apostrophe. Recently I added .doc and .pdf support, and these documents often contain their own special characters (arrows, etc.). I only replaced these in some parts and not others, which caused the inconsistency. Now I think I have applied the same rule to all the tools, but I'm not sure.
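Applying one normalization function everywhere is what keeps the tools consistent. A Ruby sketch of such a function, assuming only the four curly quotation marks are mapped (the actual replacement table in CasualConc is not described here and may cover more characters, such as arrows); QUOTE_MAP and normalize_quotes are hypothetical names:

```ruby
# Assumed mapping: typographic (curly) quotes to their ASCII counterparts.
QUOTE_MAP = {
  "\u2018" => "'",  # left single quotation mark
  "\u2019" => "'",  # right single quotation mark (also used as an apostrophe)
  "\u201C" => "\"", # left double quotation mark
  "\u201D" => "\""  # right double quotation mark
}.freeze

# One function called by every tool, so all of them see the same characters.
# gsub with a Hash replaces each matched character with its mapped value.
def normalize_quotes(text)
  text.gsub(/[\u2018\u2019\u201C\u201D]/, QUOTE_MAP)
end
```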

Another change is dropping ASCII mode in concordance. In the Concordance Preferences, CasualConc has 4 ways to handle texts. Originally, the two modes for European languages were ASCII and With Accented Characters. The former assumed the corpus files did not contain any multi-byte characters (in UTF-8). The latter assumed only a few accented (multi-byte) characters in a context (the context words shown in the concordance result table). But after the very first person downloaded CasualConc, I realized that he uses Greek, which, I think, is written almost entirely in multi-byte characters in UTF-8. So the two new modes for European languages are A and B. A is the same as the previous With Accented Characters mode, and B is for languages written entirely in multi-byte characters, but it still assumes there are not many of the 3-byte characters used in East Asian languages. If the text contains many 3-byte characters, such as Japanese characters (which are double-width on screen but processed as 3-byte characters in UTF-8), CasualConc might not be able to display the concordance result or the full context view properly. If there are any languages that are full of 2- and 3-byte characters, let me know. I'll see what I can do.
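The byte counts behind the A and B modes can be checked directly in Ruby. This sketch (helper names are hypothetical) shows the UTF-8 width of an ASCII letter, an accented letter, a Greek letter, and a kanji, and why cutting a context window by bytes rather than by characters can break multi-byte text:

```ruby
# UTF-8 byte length of each character in a string.
def utf8_widths(text)
  text.each_char.map(&:bytesize)
end

# Cutting a context window by byte count can split a multi-byte character...
def context_by_bytes(text, nbytes)
  text.byteslice(0, nbytes)
end

# ...whereas cutting by character count never does.
def context_by_chars(text, nchars)
  text[0, nchars]
end

# utf8_widths("aéα日")                           # => [1, 2, 2, 3]
# context_by_bytes("日本語", 4).valid_encoding?  # => false (4 bytes = 1.33 characters)
# context_by_chars("日本語", 2)                   # => "日本"
```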

By the way, I decided to add a 'Getting Started' section to the CasualConc site. The current 'How to use' is more like a manual or a list of the functions CasualConc has, so it's not really a how-to. So far, the new section only has a basic file handling ('how to select files for your analysis') entry. I'll try to add more when I can find time.

Anyway, the current version is 0.9.4. If it gets to 0.9.9 and is still not ready for version 1, I might go for, but if enough people use it and it does not have any major problems, I might release it as Version 1.

Thursday, April 24, 2008

Old stuff

Now that I have learned I can add an HTML page with JavaScript to a Google Page Creator site by simply uploading it as a file and linking to it, I added an old JavaScript-based concordancer/word counter to the CasualConc site. This is probably useless for most people, and I'm not sure if I need it on the site, but I just wanted to keep it somewhere, and because this old script is the basis of CasualConc, I think it's the right place (for me).

I wrote this script about two years ago when I was playing with JavaScript. At that time, I wanted to learn JavaScript, which I had just started learning a few months before; until then, I only knew MS-BASIC. When I started playing with JavaScript, I figured the best way to learn it was to write something with it. I first wrote a few scripts for my colleagues at work to save them a repetitive task. Then I wanted to do something for myself. I always wanted to do something with corpus linguistics. I found a few sites that did it with JavaScript and many scripts in Perl. With trial and error, and a lot of revisions, this JavaScript page was written. The page says it's version 6, but the script version is 61 (it's in the file name of the script file). But then I learned the limitations of JavaScript as a tool for corpus analysis. Then I tried Perl because that seemed to be what everyone used (and a lot of people are using it for text analysis), but somehow it didn't appeal to me (or I wasn't, and still am not, smart enough to learn it). Then, a year later, I used Ruby for something at work and somehow I liked it (I still like it). I didn't know about Python, which I only learned of while I was learning Ruby. Another big plus was that because Ruby was originally, and still is, developed mainly by Japanese people, I found a lot of documentation in Japanese. This and the inclusion of RubyCocoa in Leopard are why CasualConc exists now. I think I wrote something like this in the very first post on this blog, but anyway, it's fun to use Ruby, though my scripts are still primitive. I hope I can learn more about Ruby and improve CasualConc. What I want is time, but now I need to spend more time on other, more important stuff...

Wednesday, April 23, 2008

CasualConc launched!!

I finally found someone who got interested in using CasualConc!! I was just surfing the web looking for info on concordancing on Mac. Though I'm developing CasualConc, if I can find a better, more flexible concordancer for Mac, I'm happy to use it. The only problem will be that I can't make the changes I want. Anyway, I found a blog that was describing the poor concordance software situation on Mac, so I posted a comment, and he replied to it and wrote that he downloaded CasualConc. He wrote he would post his impressions of CasualConc on his blog, so I'm really looking forward to it, and at the same time I'm a bit nervous. I think CasualConc in its beta state works OK for my casual use right now (mostly searching for collocations of words I want to use in my paper). With Database Mode, it's fast enough to use regularly. And because I wrote the program, I know how to use it, but I'm wondering how easy or difficult CasualConc is for others. I've been adding content to the documentation, but at some point, I might need to work on step-by-step instructions on how to use it. Well, this will only happen if more people are interested and start using CasualConc.

If you ever find CasualConc and use it, any comments are welcome!

Monday, April 21, 2008

Japanese Support

This will probably be the last update for a while. While I was stuck on the idea for my dissertation, I spent some time here and there over the last few days making minor fixes and feature enhancements to CasualConc. As I posted in the last couple of days, I finally made the download page available to the public, though I'm not sure how many people will find it, and added support for several different character encodings and file types. Finally, I added very limited Japanese support.

Now CasualConc can read Japanese (and possibly other East Asian languages) files in two formats. One is the plain format without any spaces between words. The other is wakachi-gaki, which has a 1-byte space between words. Wakachi-gaki files can be created with the jparser utility program. To analyze Japanese texts, the proper mode should be selected in the Preferences: select Japanese (plain) under Concord options for the former and Japanese (wakachi) for the latter. If the proper mode is not selected, CasualConc cannot search words/characters. Wildcard search is implemented but not tested thoroughly. Because of the way wakachi-gaki is written, a 1-byte space should be inserted between words in a phrase search. Because this is also experimental, CasualConc might crash when you try to analyze Japanese text. Japanese is only available in Text Mode. Once the features are settled, I will add database file support.

If you happen to find this blog or CasualConc page and are willing to try, please do so and let me hear what you think.

Sunday, April 20, 2008

CasualConc open to public

Well, I finally decided to make CasualConc public. This just means I added a link to the download page, which was already active, to the home page. I highly doubt anyone is visiting the CasualConc site, so this doesn't make much difference, but I'm hoping someone might somehow find the page and try it. When I googled it, it didn't come up, so the only ways to find the page are from a bbs (or usergroup?) post I wrote a while ago (which I mistakenly posted multiple times because I thought I could edit my post, but it turned out I was posting it again each time...) or from a link on my personal schedule page at work. Or possibly from this blog, if it is googlable.

Anyway, because I don't get any feedback on existing features, I decided to work on something not currently implemented: Japanese (or Asian language) support. This is going to be highly experimental, and I don't have much time now, so I can't tell when I will release it. So far, I can display kwic results of Japanese text in plain format (no spaces) and wakachi-gaki format (space-separated). The former can be sorted by L5-R5 context characters and the latter by L5-R5 context words (or whatever the separated units are). In the future (only for Japanese), I want to include MeCab (which you need to install following the instructions on the CasualConc page) to process plain texts, but this won't happen in the near future.
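A kwic display with L5-R5 context works the same way for both formats once the text is split into units: characters for plain text, space-separated tokens for wakachi-gaki. A minimal Ruby sketch under that assumption (kwic, sort_by_right, and the :plain/:wakachi modes are hypothetical names, not CasualConc's):

```ruby
# Hypothetical kwic sketch: units are characters (:plain) or
# space-separated tokens (:wakachi).
KwicLine = Struct.new(:left, :node, :right)

def units(text, mode)
  mode == :wakachi ? text.split(" ") : text.chars
end

# Returns one KwicLine per occurrence of node, with up to `span` units on each side.
def kwic(text, node, mode, span = 5)
  toks = units(text, mode)
  target = units(node, mode)
  n = target.size
  return [] if n.zero?
  hits = []
  (0..toks.size - n).each do |i|
    next unless toks[i, n] == target
    left = toks[[i - span, 0].max...i]   # L5..L1
    right = toks[i + n, span]            # R1..R5
    hits << KwicLine.new(left, target, right)
  end
  hits
end

# Sort by R1, then R2, and so on (arrays compare element by element).
def sort_by_right(hits)
  hits.sort_by(&:right)
end
```

Sorting by left context would compare the reversed left arrays instead, so that L1 is compared first.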

If you ever find this blog and use Mac OS X Leopard and are interested in corpus analysis, check CasualConc and let me know what you think. The link to the CasualConc site is on the right or click this link.

Saturday, April 19, 2008

Another file format support update in CasualConc

After I updated CasualConc last night, I realized I could add HTML and WebKit Webarchive support, so I added these two to the supported file formats. Now it can read various kinds of files that contain text, but processing will be slower than with plain text files. So if speed is important, convert the files to plain text. If you want even faster searches, create a database file from plain text files. I will try to write a utility program to convert CasualConc-supported files to UTF-8 plain text files.

Well, it's kind of sad to keep writing blogs knowing nobody is reading...

Friday, April 18, 2008

CasualConc update

With recent discoveries about file handling on the Objective-C side (or RubyCocoa side?), I decided to add a new feature to CasualConc. Originally, CasualConc was able to handle only UTF-8 or ASCII encoded files because I used Ruby's file handling methods. Now I have switched to Objective-C methods, so it can handle a few more encodings. The added encodings are UTF-16, Windows Latin 1, Windows Latin 2, Mac Roman, Shift-JIS, EUC-JP, and ISO-2022-JP. The last three encodings are all for Japanese. These are limited to the ones Objective-C can handle by default. I wasn't able to find Chinese or Korean encoding settings, so they are not included. But CasualConc cannot handle 2-byte characters properly (in Concordance) anyway, so this shouldn't be an issue. I haven't really tested all the encodings, so if someone happens to find this blog and would like to try, let me know. I might add a link to the download page on the CasualConc site soon (hopefully).
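For comparison, this is how the same set of encodings can be read in plain Ruby (1.9 or later) by transcoding to UTF-8 on read. CasualConc itself switched to Objective-C methods, so this is not its implementation; the mapping from the labels above to Ruby encoding names and the helper read_as_utf8 are assumptions for illustration.

```ruby
# Assumed mapping from the encoding labels in the post to Ruby encoding names.
ENCODING_NAMES = {
  "UTF-8"           => "UTF-8",
  "UTF-16"          => "UTF-16",
  "Windows Latin 1" => "Windows-1252",
  "Windows Latin 2" => "Windows-1250",
  "Mac Roman"       => "macRoman",
  "Shift-JIS"       => "Shift_JIS",
  "EUC-JP"          => "EUC-JP",
  "ISO-2022-JP"     => "ISO-2022-JP"
}.freeze

# Reads the whole file, decoding it from the given source encoding
# ("external:internal" syntax) and transcoding the result to UTF-8.
def read_as_utf8(path, label)
  source = ENCODING_NAMES.fetch(label)
  File.read(path, encoding: "#{source}:UTF-8")
end
```

Writing the result back out with File.write would give the UTF-8 plain text copy mentioned in the previous post.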

Also experimentally implemented is support for other file formats. This still returns errors from time to time. Personally, all my corpus files are in text, so this is not for me, but I just thought someone might be interested. The problem is that no one is checking this blog or the CasualConc site, so I highly doubt anyone even uses this function. By the way, the added file formats are .doc, .docx, .rtf, .rtfd, and .pdf.

Well, I really hope someone would try to use CasualConc, though...

Sunday, April 13, 2008


This is another utility program I wrote in Ruby + RubyCocoa. What it does is very simple: extract text from a web page. In fact, I'm not sure how useful this is, but I just wanted to experiment. It is based on PtoTconvUtil, a PDF-to-text converter utility program. After I wrote that, I wondered how I could extract text from a web page, so I experimented for a while but couldn't figure it out. But last night, when I was not able to think about my paper, I found a clue on a web site and then spent 15-20 min. figuring out how.

This program still has many issues because the browser part is simply made with Cocoa bindings, which means no scripting. I simply wrote scripts to extract text and save it as a text file. But thanks to built-in Cocoa functions, it recognizes a web address in a text box (though you always need to type "http://"), the reload, forward, and back buttons work, and it accepts Safari Webarchive files and HTML files by drag & drop. I also found that this program can extract text from a PDF file that is displayed in the browser with a plug-in. So if you know the web address of a text-embedded PDF file, you can show it in the browser box and extract its text. Yes, this is very good, but because I just use built-in functions, it's not flexible. I want to add a function to read bookmarks from other browsers, so you don't have to type an address every time. It might be easier to read Safari bookmarks, so I might try that first, though I'm not sure when that will happen.

Anyway, this program helps you compare the original and the extracted text on one window. So if you build a corpus from web pages, you can either extract the entire text or simply copy and paste a part of it.

So again, if you somehow found this page and are reading this, AND if you use Leopard, try it and let me know what you think.

EDIT: This program is discontinued and integrated into CasualTextractor, which is available on CasualConc site under Utility Programs.

Saturday, April 12, 2008


is the name I gave to a utility program that is based on MeCab. This program performs POS/morphological analysis of Japanese text; at this moment, it simply produces MeCab output. The choices are MeCab output, ChaSen-like output, wakachi-gaki (words with spaces in between), and yomi (in katakana). The output can be saved as a text file. I want to add other output formats, but probably not in the near future. This program can also handle batch processing, although I haven't tested it extensively. The output file is encoded in UTF-8, mainly because that's what CasualConc can handle. I want to add a Japanese concordancing feature to CasualConc in the future. If anyone ever finds this blog and is interested, please go to the CasualConc site and download it. By the way, this program requires MeCab and MeCab-Ruby. The instructions for installing these are also on the CasualConc site. The installation is not simple (you need to use Terminal and the command line), but the instructions are step-by-step. I hope anyone can follow them. As always, this is a Leopard-only program and free.
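Producing wakachi-gaki and yomi from MeCab's output can be sketched as post-processing of its default text output. This assumes the IPADIC format, where each line is the surface form, a tab, and comma-separated features with the reading at index 7, and each sentence ends with an EOS line. It does not use the MeCab-Ruby API, and the helper names are hypothetical.

```ruby
# One token from MeCab's default output: surface form plus feature list.
MecabToken = Struct.new(:surface, :features)

# Parses lines of the form "surface<TAB>feature,feature,..." and skips EOS lines.
def parse_mecab_output(output)
  output.each_line.map(&:chomp).reject { |l| l.empty? || l == "EOS" }.map do |line|
    surface, feature = line.split("\t", 2)
    MecabToken.new(surface, feature.to_s.split(","))
  end
end

# Wakachi-gaki: surface forms joined with single-byte spaces.
def to_wakachi(tokens)
  tokens.map(&:surface).join(" ")
end

# Yomi: the reading field (index 7 in IPADIC output, an assumption);
# falls back to the surface form for unknown words without a reading.
def to_yomi(tokens)
  tokens.map do |t|
    reading = t.features[7]
    (reading.nil? || reading == "*") ? t.surface : reading
  end.join
end
```

Given illustrative lines such as 今日, は, and 晴れ with the readings キョウ, ハ, and ハレ, to_wakachi would return "今日 は 晴れ" and to_yomi "キョウハハレ".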

Friday, April 11, 2008


I finally found a way to successfully install MeCab (a Japanese parser) and MeCab-Ruby, the Ruby binding for MeCab, on Leopard. I added this page to the CasualConc web site. It's only in Japanese at this moment because I'm not sure how many people actually check the site and how many of the very few visitors are interested in installing MeCab-Ruby on their Leopard machines. If anyone is interested, I can translate the page into English, but there are probably many better sites somewhere.

But now that I've installed it, I might add Japanese concordancing features to CasualConc, if I ever have time. At least I can try it now. Also, if people can figure out how to install MeCab-Ruby on their computers, I might add a (Japanese) parsing feature to CasualConc, assuming people are willing to install it on their own. But I'll probably first work on a GUI for MeCab-Ruby to create wakachi-gaki files or syntactically parsed files. But when will I have time???

Friday, March 14, 2008

Garbage Collection again

After some experiments, some of my efforts paid off, but not all. Then I realized it was not just Ruby that used memory. Because CasualConc is written in RubyCocoa, the Cocoa/Objective-C part must use some memory too. I believe Objective-C 2.0 has GC, but OS X uses as much memory as it has and manages it. I might be wrong, but using more memory might not be that bad in itself.

Concord, Cluster, and Collocation might be usable with a modest amount of memory, but Word Count (n-gram) requires a lot of memory. This is because it creates a huge array (all the tools create arrays, though). I know my current implementation is not ideal, but maybe I have to improve Word Count first. When I was testing the original Ruby scripts, I only used smaller corpora (far less than 100 mil.). Now I need to figure out a way to reduce memory usage, but how? Does anyone have a good idea? My implementation uses a hash to count, just as any basic Ruby book shows, but I tweaked it a bit to increase processing speed.
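The basic hash-counting approach, adapted to read one paragraph (line) at a time instead of building one huge token array first, might look like this in Ruby. It is a sketch, not CasualConc's Word Count: count_ngrams is a hypothetical name, the tokenizer is a simple letters-and-apostrophes pattern, and n-grams are not allowed to cross paragraph boundaries.

```ruby
# Counts word n-grams with a Hash, reading one paragraph (line) at a time
# so that only one paragraph's tokens are held in an array at once.
# `source` can be anything that responds to each_line (a File, a String, ...).
def count_ngrams(source, n)
  counts = Hash.new(0)
  source.each_line do |line|
    words = line.downcase.scan(/[[:alpha:]']+/)
    # each_cons yields every run of n consecutive words in this paragraph.
    words.each_cons(n) { |gram| counts[gram.join(" ")] += 1 }
  end
  counts
end
```

This keeps the token array small, but the hash itself still grows with the number of distinct n-grams, so for large corpora it remains the main memory cost.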

Anyway, this is partly why I wrote that CasualConc can handle a 1-million-word corpus at reasonable speed. Well, I need time.

Wednesday, March 12, 2008

Garbage Collection

I use CasualConc regularly to look up how certain words are used in context. While using it, I realized CasualConc is a memory hog. I knew Word List, especially when used for n-gram lists, needs a lot of memory because it keeps counting new items while storing the counted ones (not exactly, though). I knew Ruby has garbage collection built in, but it seemed like it wasn't working when I wanted it to (maybe because there was still a lot of unused memory). So I decided to force GC to start at some points (GC.start).

But when?

I've been trying several different points for each tool and associated method and monitoring the differences. But because I've never seriously studied programming (I'm not and have never been in computer science), I don't think I understand how GC works (in fact, I'm still not sure what exactly an OO language entails). If you are brave enough to take a look at the Ruby/RubyCocoa source code of CasualConc, you can see my scripts are not written the Ruby way. I hope I have some time to learn to program a little more seriously someday, but for now, CasualConc works OK (at least for me).
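One common place to call GC.start is between units of work, after the data for one file is no longer referenced. A Ruby sketch under that assumption (process_files is a hypothetical name, and the word count inside is only a stand-in for any per-file analysis):

```ruby
# Hypothetical: processes corpus files one at a time and requests a full
# garbage collection after each file, once that file's lines are no longer referenced.
def process_files(paths)
  totals = Hash.new(0)
  paths.each do |path|
    # File.foreach reads line by line instead of loading the whole file.
    File.foreach(path) do |line|
      line.split.each { |word| totals[word] += 1 }
    end
    GC.start # ask Ruby to collect the garbage left over from this file
  end
  totals
end
```

GC.start frees Ruby objects, but the interpreter does not necessarily return that memory to the operating system, so the process size shown in Activity Monitor may not shrink.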

Anyway, I'm not sure if someone ever reads this entry or any entry on this blog, but I'll try to keep my record on this. I want to add some memos on Ruby/RubyCocoa codes on this blog if I can.

Sunday, March 9, 2008

PDF to Text converter

In the last post, I wrote that I found a way to extract embedded text from PDF files. I wanted to do something with it before I forgot, so I wrote a simple utility program in Ruby+RubyCocoa and posted it to the CasualConc site. The system requirement is a Mac with Leopard. I named it simply PDFtoTextConverter. It opens a PDF and shows its embedded text in the text box on the same window. The extracted text can be saved as a .txt file. It also has a batch process mode: you can add PDF files to the list and either select a folder to save the .txt files in or save each .txt file in the same folder as the original PDF file. If you are interested, please try it. You can go to the CasualConc site by following the link on the right.

EDIT: This program is discontinued and integrated into CasualTextractor which is available on the CasualConc Main site under Utility Programs.

Friday, March 7, 2008


I finally found a way to extract text from text-embedded PDF files in RubyCocoa. I personally don't care about this much, but I guess it might be useful for some people. The problem with handling PDF text is not the extracting part. I mean, the real issues with implementing it in CasualConc are:

1. each line of text is separated by a line break character, LF (\n), or possibly CR+LF (\r\n)
2. page headers/footers, etc. that are not the main text are also included
3. embedded text often includes extra spaces, garbled characters (often with ligatures), etc.

1 is probably the main issue. Currently, the basic unit of analysis in CasualConc is the paragraph, which means text separated by LF characters. So it cannot handle text files that end every line with an LF character, such as the Brown Corpus files. This requires some real coding (not just adding a few lines), and I can't find time to do it now. I'll try to implement this feature in the future, but I don't know when.
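Point 1 amounts to rebuilding paragraphs from lines. A minimal Ruby sketch, assuming a blank line marks a paragraph break (true for many extracted texts but not guaranteed for PDFs, where paragraph breaks may be lost entirely); the helper names are hypothetical, and clean_pdf_text also addresses the ligature part of point 3 with Unicode NFKC normalization (Ruby 2.2 or later).

```ruby
# Hypothetical: joins hard-wrapped lines into paragraphs, assuming a blank
# line (possibly containing spaces) separates paragraphs.
def lines_to_paragraphs(text)
  text.gsub("\r\n", "\n")
      .split(/\n\s*\n/)
      .map { |para| para.split("\n").map(&:strip).reject(&:empty?).join(" ") }
      .reject(&:empty?)
end

# NFKC folds compatibility characters such as the "fi" ligature (U+FB01)
# into plain letters; runs of spaces and tabs collapse to one space.
def clean_pdf_text(text)
  text.unicode_normalize(:nfkc).gsub(/[ \t]+/, " ")
end
```

NFKC also converts full-width Latin letters and digits to ASCII, which may not be wanted for Japanese text.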

2 and 3 cannot be avoided, I guess. So I might try to add a feature to extract text from PDF files within CasualConc, but this also requires a certain amount of time.

But at least I know how to extract text from PDF files, so the feature will be included in a future version of CasualConc. If many people are interested, I might prioritize this (but it probably won't happen at least until summer).

Thursday, March 6, 2008

Google Search

I've been adding documentation to CasualConc site, although I haven't yet added a download page. Now it has a page for Concordance and Word Cluster along with Basic File Handling.

But now I'm wondering how Google works. I mean the Google Page Creator Help says the page created by it "can be crawled by Google within a few hours of publication". Well, it says "can be", so the actual time might be longer than a few hours. In fact, the CasualConc site was searchable on Google a couple of days after I published it. BUT now it's not on the search result. It disappeared!!

Maybe I should tell my friends to check this first...

Monday, March 3, 2008


I started this blog to keep track of what I do for CasualConc, experimental concordancing software for Mac OS X 10.5 Leopard (and possibly later version of OS X).

I started to learn a scripting language called Ruby, which is similar to Perl or Python, last summer. The main reason I chose Ruby was that there is a lot of documentation in Japanese. I don't know if I made the right decision, but I tried both Perl and Ruby, and I like Ruby better for no particular reason (Perl simply didn't appeal to me when I tried). Another reason was that I read somewhere that Apple decided to include software (?) in Leopard that bridges Ruby and Cocoa, Mac OS X's GUI framework (?). It's called RubyCocoa, and it allows users to add a Mac GUI to Ruby scripts (btw, there's a similar one for Python). Isn't this cool?

At first, I used Ruby for my work (I'm working as Instructional Technology Consultant at my school), but later decided to learn it more seriously. I'm interested in corpus linguistics and want to do some corpus-based/driven research, so I decided to write some scripts for basic corpus analyses. When OS X 10.5 Leopard came out, I had a few simple scripts for kwic, word count, etc., so I tried to add GUI to them. It wasn't very easy because there isn't much documentation for RubyCocoa. So I had to learn both Ruby and Cocoa and combine them to make GUI work.

Now, I have added some more features to kwic and word count and named it CasualConc. It is Mac GUI-based software written in Ruby+RubyCocoa. Because the development environment is OS X 10.5 Leopard, it only runs on Leopard. There might be a way to make it run on Tiger, but I don't want to spend time on it simply because I don't have time (and I don't have the expertise). The current version is 0.9 and still in beta (well, beta simply means I call it so). I don't have much time to make a lot of changes now. From now on, I will try to fix major bugs and write some documentation. And now I want to have someone test it.

There is no guarantee that this works for you, but if you are interested, I'm happy to have you as a beta tester. Here's basic info:

System requirement: a Mac with a lot of memory (at least 1GB) that runs Mac OS X 10.5 Leopard (Universal, well, this is mostly written in Ruby...), optimized for screens at least 1280px wide (13.3 inch or larger on a notebook or 17 inch or larger on a desktop LCD)
Acceptable file format: text files (.txt) encoded in ASCII or UTF-8 (Ruby is not good at handling character encodings)
Acceptable languages: any single-byte character language (double-byte character languages (East Asian languages) can be analyzed except for kwic concordancing as long as words are separated by single-byte space)
Target User: Mac users who don't want to start up a Windows machine, switch to Boot Camp, or run Virtual PC/Parallels/VMware for simple concordancing for preliminary analysis, preparing teaching materials, learning, etc. (CasualConc is probably not good enough as your primary research tool)

I use CasualConc on my Mac mini (1.86GHz Core 2 Duo) and have used it on a G4 (1.5GHz) machine. It works fine for me, but with a faster CPU and more memory, performance is better. With a 1-million-word corpus, it works at a reasonable speed (not as fast as WordSmith Tools). With a corpus larger than that, well, you can try.

If you are interested, check out CasualConc site. Documentation is not complete (far from it), so if you have never used any concordancer, you might find it difficult to use. But if you have, you can probably use most of the basic features.

By the way, this is freeware.