- Install sbt (the Scala build tool).
- Download Apache Spark and unzip it.
- Go to the unzipped Apache Spark directory and run the following on the command line:
sbt assembly
(this takes a while; you may need to increase the memory allocated to sbt in its config file, as sketched below)
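For example (a sketch only; the exact mechanism depends on your sbt launcher version), the heap available to sbt can usually be raised through the SBT_OPTS environment variable before starting the build:

set SBT_OPTS=-Xmx4G
sbt assembly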
- Clone a git project, for example:
git clone https://github.com/apache/groovy
- Save the log into a text file:
git log > C:\temp\log.txt
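Each commit in the log appears as a short block whose second line starts with "Author:", for example (the hash, name, and email below are made up):

commit 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
Author: Jane Doe <jane@example.com>
Date:   Mon Mar 17 21:52:11 2014 +0100

Those "Author:" lines are what the Spark job below counts.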
- Launch the Spark shell (spark-shell) and execute:
scala> val file = sc.textFile("C:\\temp\\log.txt")
file: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[21] at textFile at <console>:27
scala> val authorLines = file.filter(line => line.contains("Author"))
authorLines: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[22] at filter at <console>:29
scala> val maxAuthorTuple = authorLines.countByValue().maxBy(_._2)
maxAuthorTuple: (String, Long) = (Author: Paul King <paulk@asert.com.au>,2991)
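A note on that last step: countByValue is an action, so it returns the per-line counts to the driver as a plain Map[String, Long], and the final maxBy(_._2) runs as ordinary local Scala. That is fine for a git log, but for input too large for driver memory a fully distributed variant (a sketch using only standard RDD operations) would be:

val maxAuthorDistributed = authorLines
  .map(line => (line, 1L))                       // pair each Author line with a count of 1
  .reduceByKey(_ + _)                            // sum the counts per distinct Author line
  .reduce((a, b) => if (a._2 >= b._2) a else b)  // keep the pair with the larger count

This way only the single winning (author, count) pair ever reaches the driver.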
- Verify that maxAuthorTuple holds the author who made the most commits on that branch, together with the commit count.
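To cross-check the result without Spark, git can aggregate commits per author directly from inside the cloned repository:

git shortlog -sn

This prints a commit count next to each author, sorted descending, so the top entry should match maxAuthorTuple. (Counts can differ slightly if someone committed under several email addresses, since the Spark job counts distinct "Author:" lines while shortlog groups by author name.)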