Efficient use of Java’s Scanner

So I’ve been debugging some code that I wrote quickly to parse a couple of hundred megabytes of log files. I wasn’t surprised to notice that the most expensive method call was nextLine() in java.util.Scanner, but I was surprised to see that hasNextLine() was the second most expensive. It appears that the line found when hasNextLine() searches the stream is not buffered for the nextLine() call that follows, so the search is done twice, which just about doubles the processing time if you’re reading millions of lines. I can only assume the same problem is encountered with the other return types (hasNextLong()/nextLong(), etc).

Luckily it’s an easy fix, changing this code:


Scanner s = new Scanner(inputFile);
while(s.hasNextLine()){
    System.out.println(s.nextLine());
}
s.close();

to:


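Scanner s = new Scanner(inputFile);
try{
    while(true){
        // nextLine() throws NoSuchElementException once the input runs out
        System.out.println(s.nextLine());
    }
}catch(NoSuchElementException e){
    // expected: this is how the loop ends
}finally{
    // close the scanner even if something else goes wrong
    s.close();
}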

For those of you used to exception-based programming this will seem obvious, but I’ve posted it here for everyone else!
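If you want to check this on your own machine, here is a minimal sketch that runs both loop styles over the same in-memory input and prints a rough wall-clock time for each. The class name ScannerLoops, the two method names and the made-up log lines are mine, purely for illustration, and a single timed run like this is only a quick sanity check, not a proper benchmark. The size of the gap (if any) will depend on your JDK version and your input.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

public class ScannerLoops {

    // Conventional loop: one lookahead call plus one read call per line.
    public static int countWithHasNextLine(String input) {
        Scanner s = new Scanner(input);
        int count = 0;
        try {
            while (s.hasNextLine()) {
                s.nextLine();
                count++;
            }
        } finally {
            s.close();
        }
        return count;
    }

    // Exception-terminated loop: only nextLine() is called per line.
    public static int countWithException(String input) {
        Scanner s = new Scanner(input);
        int count = 0;
        try {
            while (true) {
                s.nextLine(); // throws NoSuchElementException at end of input
                count++;
            }
        } catch (NoSuchElementException e) {
            // Expected: this is how the loop ends.
        } finally {
            s.close();
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical test data: 200,000 short, made-up log lines.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200000; i++) {
            sb.append("log entry ").append(i).append('\n');
        }
        String input = sb.toString();

        long t0 = System.nanoTime();
        int a = countWithHasNextLine(input);
        long t1 = System.nanoTime();
        int b = countWithException(input);
        long t2 = System.nanoTime();

        System.out.println("hasNextLine loop: " + a + " lines, " + (t1 - t0) / 1000000 + " ms");
        System.out.println("exception loop:   " + b + " lines, " + (t2 - t1) / 1000000 + " ms");
    }
}
```

Both methods should report the same number of lines; only the timings are expected to differ. Whichever loop runs first also pays the JIT warm-up cost, so swap the order or repeat the runs before trusting the numbers.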

Adding packages to Perl on OS X Leopard

So I was having a bit of a problem. I’ve got Perl installed with MacPorts - probably a little superfluous now, seeing as Leopard ships with 5.8.8, but still. I’ve been wondering for a while about getting extra packages installed, and expected it to be as painful as many other Unix tools are on OS X. How I was proved wrong!

Basically, all you need to do is download the package from CPAN, extract it and drop the folder into /System/Library/Perl/Extras/5.8.8 (to make the package available globally). The only caveat I found is that the folder layout has to match the name you use to reference the package in Perl. For example, I wanted to use Statistics::Distributions, so I renamed the downloaded, extracted folder from Statistics-Distributions-1.01 to “Statistics”, with Distributions.pm inside it, and it worked like a charm.

I’m guessing there will be some packages that need compiling and could prove to be a complete pain, but for now I’m impressed.