Commits

Miki Tebeka committed 3ab6002

Parsing log

Files changed (2)

 (defproject clj2010 "1.0.0-SNAPSHOT"
   :description "FIXME: write"
+  :main clj2010
   :dependencies [[org.clojure/clojure "1.2.0"]
-                 [org.clojure/clojure-contrib "1.2.0"]])
+                 [org.clojure/clojure-contrib "1.2.0"]
+                 [clj-time "0.3.0-SNAPSHOT"]
+                 [apricot-soup "0.0.2-SNAPSHOT"]
+                 [incanter "1.2.3"]])
-(ns clj2010)
+(ns clj2010
+  (:require [apricot-soup :as soup])
+  (:use [clojure.contrib.string :only (trim lower-case)]))
+
+; " chouser:" -> "chouser"
+(defn fix-user
+  "Drop the trailing colon, then trim whitespace: \" chouser:\" -> \"chouser\""
+  [user]
+  (trim (apply str (butlast user))))
+
+(defn tokenize
+  "poor man's tokenizer"
+  [sentence]
+  (map lower-case (re-seq #"[a-zA-Z0-9'_-]+" sentence)))
+
+(defn parse-p [previous-log p]
+  (let [[match time user text] (re-find #"^(\d+:\d+)(.+:)?(.*)" (soup/text p))]
+    { :time time
+      :tokens (tokenize text)
+      :user (if user (fix-user user) (:user previous-log))}))
+
+(defn process-logfile [logfile]
+  (rest (reductions parse-p (cons nil (soup/$ (slurp logfile) p)))))
+
+(defn -main []
+  (dorun (map println (take 10 (process-logfile "logs/2010-01-01.html")))))
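
To see how parse-p carries the user over from one line to the next, here is a minimal sketch that can be pasted into a REPL without apricot-soup. It reuses the regex from the diff on plain strings; parse-line, line-re, and the sample lines are hypothetical names and data made up for illustration, and clojure.string stands in for clojure.contrib.string.

```clojure
(require '[clojure.string :as str])

;; Same pattern as in parse-p: time, optional "user:" part, remaining text.
(def line-re #"^(\d+:\d+)(.+:)?(.*)")

(defn parse-line
  "Hypothetical stand-in for parse-p that takes a plain string
  instead of an apricot-soup node."
  [previous line]
  (let [[_ time user text] (re-find line-re line)]
    {:time time
     :tokens (map str/lower-case (re-seq #"[a-zA-Z0-9'_-]+" text))
     ;; No "user:" part on this line: reuse the user of the previous entry.
     :user (if user
             (str/trim (apply str (butlast user)))
             (:user previous))}))

;; Made-up sample lines; the second one has no user prefix.
(def sample ["00:01 chouser: Hello there"
             "00:02 and again"])

;; reductions threads each parsed entry into the next call;
;; rest drops the initial nil, as process-logfile does.
(doseq [entry (rest (reductions parse-line nil sample))]
  (println entry))
```

Both printed entries carry "chouser" as :user, the second one inherited from the first. One caveat: the (.+:)? group is greedy, so it runs to the last colon on the line, and a message that itself contains a colon (a URL, for instance) would be partly folded into the user field.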