Efficient use of Java’s Scanner

So I’ve been debugging some code that I wrote quickly to parse a couple of hundred megabytes of log files. I wasn’t surprised to notice that the most expensive method call was to nextLine() in java.util.Scanner, but I was surprised to see that hasNextLine() was the second most expensive. It appears that the line found by hasNextLine() is not cached, so nextLine() has to search the stream for it all over again, which just about doubles the processing time if you’re reading millions of lines. I can only assume the same problem is encountered with the other return types (nextLong(), etc.).

Luckily it’s an easy fix. Change this code:


Scanner s = new Scanner(inputFile);
// Each iteration finds the next line twice: once in hasNextLine(), once in nextLine().
while(s.hasNextLine()){
    System.out.println(s.nextLine());
}
s.close();

to:


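// Requires java.util.Scanner and java.util.NoSuchElementException.
Scanner s = new Scanner(inputFile);
try{
    while(true){
        System.out.println(s.nextLine());
    }
}catch(NoSuchElementException e){
    // nextLine() throws this when no line is left, so it marks the end of the input.
}finally{
    // Close the scanner whether the loop ended normally or with another exception.
    s.close();
}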

For those of you used to exception-based programming this will seem obvious, but I’ve posted it here for everyone else!
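
If you want to check whether the difference shows up on your own setup, here is a rough sketch that runs both loops over the same input and times them. The class name, method names and the generated test input are all made up for illustration, and a single System.nanoTime() measurement like this is only a crude indication (no JIT warm-up, no repeated runs), so treat the numbers as a hint rather than a benchmark.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

// Hypothetical helper class, not part of the original code.
class ScannerLoopComparison {

    // Conventional loop: hasNextLine() looks for the next line, then nextLine() looks again.
    static long countWithHasNextLine(String input) {
        long lines = 0;
        try (Scanner s = new Scanner(input)) {
            while (s.hasNextLine()) {
                s.nextLine();
                lines++;
            }
        }
        return lines;
    }

    // Exception-terminated loop: only nextLine() is called, and the
    // NoSuchElementException it throws at the end of the input stops the loop.
    static long countWithException(String input) {
        long lines = 0;
        try (Scanner s = new Scanner(input)) {
            while (true) {
                s.nextLine();
                lines++;
            }
        } catch (NoSuchElementException e) {
            // Expected: no more lines to read.
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumed test input: 200,000 short generated lines held in memory.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200_000; i++) {
            sb.append("line ").append(i).append('\n');
        }
        String input = sb.toString();

        long t0 = System.nanoTime();
        long a = countWithHasNextLine(input);
        long t1 = System.nanoTime();
        long b = countWithException(input);
        long t2 = System.nanoTime();

        System.out.printf("hasNextLine loop: %d lines, %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("exception loop:   %d lines, %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

Both methods should report the same number of lines; only the timings should differ. The sketch reads from a String to keep it self-contained, so swap in your own log file to see what happens with real data.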
