(WIP) An example of testing applications which use DataFrame (HiveContext)

dobachi/SparkDataFrameTestExample

Example of testing DataFrame API and SQL queries

Highlights

  • Helper method creates test data
  • Separated utilities to calculate contents of DataFrame
  • Separated tests for each utility
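The three points above could look roughly like the following sketch. It is not the repository's actual code: `SalesUtil`, `totalByCustomer`, `createTestData`, and the column names are hypothetical, and a plain `SQLContext` stands in for `HiveContext` so the example does not need the Hive module. It assumes the Spark 1.6 and ScalaTest 2.2 APIs listed under "Libraries included".

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.sum
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Hypothetical utility, kept separate from the application's main code:
// computes the total amount per customer.
object SalesUtil {
  def totalByCustomer(df: DataFrame): DataFrame =
    df.groupBy("customer").agg(sum("amount").as("total"))
}

// One test class per utility.
class SalesUtilSpec extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _
  private var sqlContext: SQLContext = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("SalesUtilSpec")
    sc = new SparkContext(conf)
    sqlContext = new SQLContext(sc) // assumption: SQLContext instead of HiveContext
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
  }

  // Hypothetical helper method that creates a small, fixed test DataFrame.
  private def createTestData(): DataFrame =
    sqlContext.createDataFrame(Seq(
      ("alice", 10L), ("alice", 5L), ("bob", 7L)
    )).toDF("customer", "amount")

  test("totalByCustomer sums amounts per customer") {
    val result = SalesUtil.totalByCustomer(createTestData())
      .collect()
      .map(row => row.getString(0) -> row.getLong(1))
      .toMap
    assert(result === Map("alice" -> 15L, "bob" -> 7L))
  }
}
```

Collecting the result into a `Map` keeps the assertion independent of row order, which Spark does not guarantee after a `groupBy`.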

Requirement

  • SBT or Activator must be installed. This example uses the activator command.

Libraries included

  • Spark 1.6.2
  • Scala 2.11.6
  • ScalaTest 2.2.4
  • ScalaCheck 1.12.2

Getting started

Run application

$ activator run

Test application

$ activator test

Generate Uber jar

This example includes an sbt-assembly configuration, so you can run the assembly task to build an uber jar:

$ activator assembly

Static Analyzer

The following static analyzers are included in build.sbt

Usage: runs automatically during compilation and during evaluation in the console

Usage: runs automatically during compilation

Open target/scala-2.11/scapegoat.xml or target/scala-2.11/scapegoat.html to view the report.

Coding Style Checker

ScalaStyle

Usage: sbt scalastyle

Open target/scalastyle-result.xml

All check levels are set to "warn". Change them to "error" if you want CI tools to reject code changes that violate the rules.

Issues

fork limitation in sbt console

This issue was originally described in Sample Project for Spark 1.3.0 with Scala 2.11.6, a sample Spark application template.

Currently, test and run each fork a JVM. This is necessary because SBT's classloader does not work well with Spark and the Spark shell.

However, sbt console does not currently honor the fork key, so it may throw a ScalaReflectionException or similar errors.
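
For reference, forking for run and test is typically enabled in build.sbt with settings like the ones below. This is a minimal sketch in sbt 0.13 syntax, not a copy of this repository's build file, so the actual settings here may differ.

```scala
// build.sbt (sbt 0.13 syntax); assumed settings, not copied from this repository

// Run the application in a separate JVM instead of inside sbt's own JVM.
fork in run := true

// Run the tests in a separate JVM as well.
fork in Test := true
```

These settings affect only the run and test tasks, so sbt console still evaluates code inside sbt's own JVM.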

Author

Application template
