After some experiments, some of my efforts paid off, but not all. Then I realized it was not just Ruby that used memory. Because CasualConc is written in RubyCocoa, the Cocoa or Objective-C part should use some memory as well. I believe Objective-C 2.0 has garbage collection, but OS X tends to use as much memory as it has available and manages it on its own. I might be wrong, but using more memory itself might not be that bad.
Concord, Cluster, and Collocation might be usable with a modest amount of memory, but Word Count (n-gram) requires a lot of memory. This is because it creates a huge array (all of them create arrays, though). I know my current implementation is not ideal, so maybe I have to improve Word Count first. When I was testing the original Ruby scripts, I only used smaller corpora (far less than 100 mil. words). Now I need to figure out a way to reduce memory usage, but how? Does anyone have a good idea? My implementation uses a hash to count, just as any basic Ruby book shows, but I tweaked it a bit to increase processing speed.
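Just to show the basic idea (this is a simplified sketch, not the actual CasualConc code, and "corpus.txt" is just a placeholder file name), the hash-based counting looks something like this:

    # read the whole corpus into memory (this alone can be costly for large files)
    text = File.read("corpus.txt")

    # Hash.new(0) gives a default count of 0, so no nil check is needed per word
    counts = Hash.new(0)
    text.scan(/\w+/) { |w| counts[w.downcase] += 1 }

    # sort by descending frequency and print the top 10
    counts.sort_by { |word, freq| -freq }.first(10).each do |word, freq|
      puts "#{word}\t#{freq}"
    end

The hash itself grows with the number of distinct word types, and reading the whole file plus building the sorted array on top of it is probably where a lot of the memory goes.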
Anyway, this is partly why I wrote that CasualConc can handle a 1 mil. word corpus at reasonable speed. Well, I need time.