Discussion:
Trouble deploying Java Standalone Spark Job
Scott Langevin
2013-05-17 19:01:10 UTC
I've been trying to figure out how to write a standalone Spark job and
deploy it on a Spark cluster we have running on Mesos. Our Spark
installation is functional - we can connect to spark-shell and run jobs
interactively, but now I'm trying to build a standalone job we can deploy.

I have followed the quick start instructions on how to use Maven for Spark
dependencies in a Java Eclipse project:
http://spark-project.org/docs/latest/quick-start.html

My Eclipse project is pretty simple: it's just a Test class with a main(),
which creates a JavaSparkContext and does a few simple map-reduce operations
on a text file. My JavaSparkContext looks like this:

JavaSparkContext sc = new JavaSparkContext("mesos://master:5050", "TEST",
"/opt/spark-0.7.0", "SparkTest-0.0.1-SNAPSHOT.jar");
Where I'm stuck is how to actually deploy this to the cluster. I'm using
Maven to create the jar file (SparkTest-0.0.1-SNAPSHOT.jar), which I tried
copying to the Spark master node. I tried to execute Test.main() using:

java -cp SparkTest-0.0.1-SNAPSHOT.jar sparktest.Test

But I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError:
spark/api/java/function/FlatMapFunction
Caused by: java.lang.ClassNotFoundException:
spark.api.java.function.FlatMapFunction
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: com.oculus.spark.Test. Program will exit.

I also tried building a jar file with all the dependencies included, but
that gave an Akka configuration exception. I found others on this mailing
list who had a similar problem, and they solved it by not bundling all the
dependencies with the jar.

So does anyone know how to actually deploy a Java Spark job? What is the
best practice?

Thanks!

Scott
Josh Rosen
2013-05-17 19:47:34 UTC
It looks like Spark's own classes aren't present on the classpath.

(Disclaimer: I haven't actually tested any of the following, so I could be wrong)

Try running

SPARK_CLASSPATH=SparkTest-0.0.1-SNAPSHOT.jar $SPARK_HOME/run sparktest.Test

This will use the Spark `run` script to add the required Spark classes to the classpath and load the settings from spark-env.sh.

I don't think that you should bundle the Spark library in your JAR. In general, Spark releases are API-compatible with each other but not binary-compatible: for example, you can't connect to a cluster running Spark 0.6.0 from a client application running against the Spark 0.6.1 JAR. What I'd do is to mark Spark as a "provided" dependency in whatever build system you're using, then use Spark's `run` script (or your own custom environment setup script) to add the cluster's Spark JARs to the classpath. This will ensure that your code runs against the version of Spark that's actually installed on your cluster.
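
In Maven, that would mean something like the following in your pom.xml
(untested and from memory - keep whatever groupId/artifactId/version the
quick start gave you for your Spark release; the important part is the
"provided" scope):

<!-- Coordinates are illustrative; match them to the Spark version on your cluster. -->
<dependency>
  <groupId>org.spark-project</groupId>
  <artifactId>spark-core_2.9.3</artifactId>
  <version>0.7.0</version>
  <scope>provided</scope>
</dependency>

With "provided", Maven compiles your code against Spark but leaves the Spark
classes out of the jar it builds, so the copy installed on the cluster is the
one used at runtime.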

I don't think that we have a list of best practices for deploying Spark jobs to clusters; that would be a useful addition to our documentation.
Scott Langevin
2013-05-17 23:11:41 UTC
Thanks Josh!

That seems to have done the trick. I can now run my job against our
cluster.

Scott
Max
2013-12-27 21:33:42 UTC
What is the difference between including my app's dependency jars in
SPARK_CLASSPATH and passing the jars in the "jars" parameter (Seq) of
SparkContext? In my case, SPARK_CLASSPATH works fine, but the latter
approach throws a runtime exception. When I checked the slaves, the jars
are shipped and loaded on the workers, yet the runtime exception still says
something is not found.
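
Concretely, the second approach I'm describing looks roughly like this (using
the Java API's String[] overload; the master URL and jar paths are just
examples, and the imports assume the pre-0.8 package names used earlier in
this thread):

package sparktest;

import spark.api.java.JavaSparkContext;

public class JarsParamTest {
    public static void main(String[] args) {
        // Ship the application jar and its dependencies to the workers via the
        // constructor's "jars" parameter instead of relying on SPARK_CLASSPATH.
        JavaSparkContext sc = new JavaSparkContext(
                "mesos://master:5050",
                "JarsParamTest",
                "/opt/spark-0.7.0",
                new String[] { "target/myapp.jar", "lib/some-dependency.jar" });

        System.out.println(sc.textFile("/tmp/input.txt").count());
    }
}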

Thanks,

Max
Ravi Hemnani
2013-12-12 12:03:21 UTC
@Scott Langevin: Can you tell me what "sparktest.Test" is?

I am trying my hand at Spark and running examples against a cluster I
created, but I am getting the same error. How did you solve the issue?
Where did you modify SPARK_CLASSPATH?