It was a sunny day when I started to set up my Spark environment. After reading many online tutorials on configuring Spark 1.x in IntelliJ IDEA, I expected it to be very easy. But the unexpected can happen at any time, and I still ran into trouble because Spark 2.x differs from Spark 1.x in a few ways. That is why I am writing this post: to describe how to set up Spark 2.x in IntelliJ IDEA. Five components are required:
- Ubuntu OS
- Java (JDK)
- Spark 2.x (in my case, spark-2.0.2-bin-hadoop2.7)
- Scala
- IntelliJ IDEA
Ok, let us go!
Install Java
The first step is to make sure Ubuntu has a Java JDK. After downloading the JDK, decompress the “.tar.gz” archive into the directory where you want to install it, for example /usr/local/lib/jvm/. Then add JAVA_HOME to the system environment. Open Bash and type:
sudo gedit /etc/profile
Add the following lines to the end of the profile:
#JAVA path
export JAVA_HOME=/usr/local/lib/jvm/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
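Save the file, return to Bash, and reload the profile (for example with “source /etc/profile”, or by logging out and back in) so that the new variables take effect. Then check whether Java is installed: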
java -version
Install Scala
Although Spark has Java and Python APIs, its core language is Scala, so we also need to install Scala. It can be downloaded from http://www.scala-lang.org/download. It is very important to choose the right version, because Spark does not support every Scala version. In my case, I started with Scala 2.12.x and my Spark code failed with an error, so I switched to Scala 2.11.x, which is the version the prebuilt Spark 2.0.2 packages are built against. As with the JDK, decompress scala-2.11.X.tar to wherever you want to install it.
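Since the Scala version matters so much here, it can help to confirm which version is actually being picked up. The following is only a minimal sketch: the object name VersionCheck and the helper binaryVersion are hypothetical names introduced for illustration, and the warning text assumes the spark-2.0.2-bin-hadoop2.7 package used in this post.

```scala
// Hypothetical helper for illustration; not part of Spark or Scala itself.
object VersionCheck {
  // Reduce a full version such as "2.11.8" to its binary version "2.11".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Version of the Scala library this program is running against.
    val scalaFull = scala.util.Properties.versionNumberString
    println(s"Scala version: $scalaFull")
    println(s"Java version:  ${System.getProperty("java.version")}")

    // Assumption: the prebuilt Spark 2.0.2 package expects Scala 2.11.x.
    if (binaryVersion(scalaFull) != "2.11")
      println("Warning: this Scala version may not match spark-2.0.2-bin-hadoop2.7")
  }
}
```

Running it with the Scala installation you just unpacked (or later from inside the IDE project) should show whether the 2.11.x version is really the one in use.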
Set up Spark 2.x in IntelliJ IDEA
Next, download Spark from http://spark.apache.org/. Decompress it as well and remember its path. Open IntelliJ IDEA and select “File-Settings-Plugins”. Search for “scala” in “Browse Repositories” to find the Scala plugin and install it.
Now we can select “File-New-Project” to create a Scala project. After that, there are three steps in “File-Project Structure”:
- Set the JDK path in the Project tab.
- Set the Scala path in the Global Libraries tab.
- Set the Spark path in the Libraries tab.
It is worth noting that, compared with Spark 1.x, Spark 2.x does not have “/lib/spark-assembly-XXXXX.jar”. Instead, there is a “jars” directory. So in the final step, what we need to do is add the path of the “jars” directory in the “Libraries” tab.
Now run a simple Spark program to test the setup:
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by BIGBAI on 18/11/2016.
  */
object TestScala {
  def main(args: Array[String]): Unit = {
    // Run Spark locally with two worker threads.
    val conf = new SparkConf().setAppName("first spark")
    conf.setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Print the context while it is still active, then shut it down.
    println(sc)
    sc.stop()
  }
}
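The test above only creates and stops a context. A slightly fuller check, sketched below, also runs a small job so that task execution is exercised too. The object name SmokeTest and the helper sumTo are hypothetical names chosen for this example; the Spark calls used (parallelize, reduce, version, stop) are standard SparkContext and RDD methods.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name; a sketch of a smoke test that runs a real job.
object SmokeTest {
  // Sum the integers 1..n with a distributed reduce (local threads here).
  def sumTo(sc: SparkContext, n: Int): Int =
    sc.parallelize(1 to n).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("smoke test").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"Spark version: ${sc.version}")
      println(s"Sum of 1..100: ${sumTo(sc, 100)}")
    } finally {
      // Always release the context, even if the job fails.
      sc.stop()
    }
  }
}
```

If the libraries are wired up correctly, the reported sum for 1..100 is 5050.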
If everything is OK, Spark 2.x now runs successfully in IntelliJ IDEA. If you still get errors, please feel free to discuss them with me. Alternatively, you can try using SBT to set up Spark 2.x.
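For the SBT route, a minimal build.sbt might look like the sketch below. I did not use this in the steps above, so treat it as a starting point: the project name and version are placeholders, and 2.11.8 is just one 2.11.x release picked for illustration. With SBT the Spark jars are downloaded as dependencies instead of being added from the local “jars” directory.

```scala
// build.sbt: a minimal sketch; name and version are placeholder values.
name := "first-spark"
version := "0.1"

// Assumption: any 2.11.x release should do; 2.11.8 is used for illustration.
scalaVersion := "2.11.8"

// %% appends the Scala binary version, so this resolves spark-core_2.11.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"
```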