biglog-parser-scala / Home


This is a simple project to show how Scala actors work. The idea is to parse an Apache log file, tokenize each line, and store the resulting records somewhere.
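As a rough illustration of what "tokenize" could mean here, the sketch below splits one line into fields, assuming the log uses the Apache Common Log Format. The `LogRecord` type, its field names, and the `LogTokenizer` object are hypothetical and not taken from the project.

```scala
// Hypothetical record type; the field names are assumptions for illustration.
final case class LogRecord(host: String, timestamp: String, request: String, status: Int, bytes: Long)

object LogTokenizer {
  // Assumed layout (Common Log Format):
  //   host ident authuser [timestamp] "request" status bytes
  // The pattern is unanchored at the end, so extra trailing fields are ignored.
  private val Line =
    """^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)""".r.unanchored

  /** Returns the tokenized record, or None if the line does not match the assumed format. */
  def tokenize(line: String): Option[LogRecord] = line match {
    case Line(host, ts, request, status, bytes) =>
      // A "-" in the bytes field means no body was sent; it is stored as 0 here.
      Some(LogRecord(host, ts, request, status.toInt, if (bytes == "-") 0L else bytes.toLong))
    case _ => None
  }
}
```

Returning an `Option` lets later stages skip malformed lines instead of failing on them. Requests that contain escaped quotes are not handled by this pattern.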

We do not know the size of the records (lines) in advance, so we need to scan the file to find the line boundaries and split it into lines.
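One possible way to do this scan is to read the file in fixed-size chunks and look for newline bytes, carrying an unfinished line over to the next chunk. The sketch below assumes UTF-8 text with `\n` or `\r\n` line endings; `LineScanner`, `scanLines`, and the `chunkSize` parameter are illustrative names, not part of the project.

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets

object LineScanner {
  private val Newline: Byte = '\n'.toByte

  /** Reads `in` chunk by chunk and calls `emit` once per line found. */
  def scanLines(in: InputStream, chunkSize: Int = 8192)(emit: String => Unit): Unit = {
    val chunk = new Array[Byte](chunkSize)
    val pending = new ByteArrayOutputStream() // bytes of the line currently being assembled
    var n = in.read(chunk)
    while (n != -1) {
      var i = 0
      while (i < n) {
        val b = chunk(i)
        if (b == Newline) {
          emit(decode(pending)) // a line boundary: hand the line on and start a new one
          pending.reset()
        } else {
          pending.write(b)
        }
        i += 1
      }
      n = in.read(chunk)
    }
    // The last line may not end with a newline.
    if (pending.size > 0) emit(decode(pending))
  }

  // Decodes the buffered bytes and drops a trailing '\r' left by "\r\n" endings.
  private def decode(buf: ByteArrayOutputStream): String = {
    val s = new String(buf.toByteArray, StandardCharsets.UTF_8)
    if (s.endsWith("\r")) s.dropRight(1) else s
  }
}
```

Only one chunk and the current line are held in memory at a time, so a line may span any number of chunks. For a real file, `in` could be a `java.io.FileInputStream`.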

We need to do this concurrently because the file will be big and the tokenized data has to leave memory quickly, before we run out of it. So scanning the file, splitting it into lines, tokenizing them, and storing the records for further processing should all run at the same time (CPU cores permitting).

File -> scan -> split lines -> tokenize lines -> store
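The stages above can be sketched as a small concurrent pipeline. To stay runnable without any actor library, this sketch uses plain threads and bounded blocking queues instead of actors; each thread plays the role an actor would, and the queues stand in for mailboxes. The names (`Pipeline`, `process`, `capacity`) and the whitespace-based tokenizing are assumptions made for illustration only.

```scala
import java.util.concurrent.ArrayBlockingQueue

object Pipeline {
  /**
   * Runs scan -> tokenize -> store on three threads.
   * Queues carry Some(item) for data and None as the end-of-stream marker.
   * The bounded capacity makes a fast stage wait for a slow one, which keeps
   * the amount of data held in memory limited.
   */
  def process(lines: Iterator[String], capacity: Int = 1024)(store: Vector[String] => Unit): Unit = {
    val lineQ = new ArrayBlockingQueue[Option[String]](capacity)
    val recordQ = new ArrayBlockingQueue[Option[Vector[String]]](capacity)

    // Stage 1: scan the input and pass each line on.
    val scanner = startThread {
      lines.foreach(l => lineQ.put(Some(l)))
      lineQ.put(None)
    }

    // Stage 2: tokenize each line (placeholder: split on whitespace).
    val tokenizer = startThread {
      var next = lineQ.take()
      while (next.isDefined) {
        recordQ.put(Some(next.get.split("\\s+").toVector))
        next = lineQ.take()
      }
      recordQ.put(None)
    }

    // Stage 3: hand each record to the store callback.
    val storer = startThread {
      var next = recordQ.take()
      while (next.isDefined) {
        store(next.get)
        next = recordQ.take()
      }
    }

    Seq(scanner, tokenizer, storer).foreach(_.join())
  }

  private def startThread(body: => Unit): Thread = {
    val t = new Thread(new Runnable { def run(): Unit = body })
    t.start()
    t
  }
}
```

With one thread per stage, records reach `store` in input order. Running several tokenizer threads would need one end marker per thread and would no longer preserve order. Error handling is left out: if a stage throws, the other stages will block.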