Monday, 20 April 2015

Apache weblog parsing using regex: which is better, a case class or a Java-like class?

I am writing a general Scala class that can parse Apache weblog files. So far, my solution is to use regex capture groups to match the different parts of the log string. To illustrate, each line of the incoming logs looks something like the string below:

25.198.250.35 - - [2014-07-19T16:05:33Z] "GET / HTTP/1.1" 404 1081 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"

class HttpLogStringParser(logLine: String) {
  // Regex Pattern matching the logLine
  val pattern = """^([\d.]+) (\S+) (\S+) \[(.*)\] \"(.+?)\" (\d{3}) (\d+) \"(\S+)\" \"([^\"]+)\"$""".r
  val matched = pattern.findFirstMatchIn(logLine)

  def getIP: String = {
    val IP = matched match {
      case Some(m) => m.group(1)
      case _ => None
    }
    IP.toString
  }

  def getTimeStamp: String = {
    val timeStamp = matched match {
      case Some(m) => m.group(4)
      case _ => None
    }
    timeStamp.toString
  }

  def getRequestPage: String = {
    val requestPage = matched match {
      case Some(m) => m.group(5)
      case _ => None
    }
    requestPage.toString
  }

  def getStatusCode: String = {
    val statusCode = matched match {
      case Some(m) => m.group(6)
      case _ => None
    }
    statusCode.toString
  }
}

Calling these methods should give me the IP, date, timestamp, or status code. Is this the best way to do it? I have also tried pattern matching with a case class, but that only tells me whether the line matched (a boolean). Am I getting it completely wrong? What would be the best way to get the values I need from the input log string?
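
For comparison, here is a minimal sketch of the case class variant, assuming the same regex as above. It relies on the fact that a Scala `Regex` can be used as an extractor in a `match`, which binds the capture groups directly instead of returning a boolean. The names `LogRecord` and `LogRecord.parse` are hypothetical and chosen only for illustration; returning `Option[LogRecord]` is one possible design, not the only one.

```scala
// Hypothetical container for the fields the question asks about.
final case class LogRecord(
  ip: String,
  timestamp: String,
  request: String,
  statusCode: Int
)

object LogRecord {
  // Same pattern as in the class above, compiled once in the companion object.
  private val LogLine =
    """^([\d.]+) (\S+) (\S+) \[(.*)\] \"(.+?)\" (\d{3}) (\d+) \"(\S+)\" \"([^\"]+)\"$""".r

  // Using the Regex as an extractor binds all nine capture groups;
  // groups that are not needed are ignored with `_`.
  def parse(line: String): Option[LogRecord] = line match {
    case LogLine(ip, _, _, timestamp, request, status, _, _, _) =>
      // `status` matched (\d{3}), so toInt cannot fail here.
      Some(LogRecord(ip, timestamp, request, status.toInt))
    case _ =>
      None // the line does not look like a log entry
  }
}
```

With this shape, a line that does not match yields `None` rather than the string "None", and the caller can pattern match on the result or use `map`/`getOrElse` to reach fields such as `ip` or `statusCode`.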
