
Until not long ago, the way to go to run Spark on a cluster was either with Spark's own standalone cluster manager, Mesos, or YARN. In the meantime, the Kingdom of Kubernetes has risen and spread widely. And when it comes to running Spark on Kubernetes, you now have two choices:

Use Spark's "native" Kubernetes capabilities: Spark can run on clusters managed by Kubernetes since Spark 2.3.

Use the Spark Operator, proposed and maintained by Google, which is still in beta version (and always will be).

Kubernetes support was flagged as experimental until very recently, but as per SPARK-33005 (Kubernetes GA Preparation), Spark on Kubernetes is now fully supported and production ready! 🎊

This series of 3 articles tells the story of my experiments with both methods, and how I launch Spark applications from Python code.

Remember, Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program, called the driver. Once connected, the SparkContext acquires executors on nodes in the cluster; these are the processes that run computations and store data for your application. Thus, Spark driver pods need a Kubernetes service account in the pod's namespace with permissions to create, get, list, and delete executor pods.
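As a minimal sketch, the service-account requirement mentioned above could be expressed with RBAC manifests along these lines. The namespace `spark-jobs` and the names `spark-driver`, `spark-driver-role`, and `spark-driver-rb` are illustrative, not something Spark mandates; the pod verbs are the ones the driver needs to manage executors.

```yaml
# Illustrative namespace and names; only the pod verbs come from Spark's requirements.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-driver
  namespace: spark-jobs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-driver-role
  namespace: spark-jobs
rules:
  # Let the driver manage its executor pods.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-driver-rb
  namespace: spark-jobs
subjects:
  - kind: ServiceAccount
    name: spark-driver
    namespace: spark-jobs
roleRef:
  kind: Role
  name: spark-driver-role
  apiGroup: rbac.authorization.k8s.io
```

You would then point Spark at this account with the `spark.kubernetes.authenticate.driver.serviceAccountName` configuration property when submitting the application.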

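To give a flavor of launching Spark applications from Python code, here is one hedged sketch: building a `spark-submit` command for Kubernetes cluster mode and handing it to `subprocess`. The flags shown (`--master k8s://...`, `--deploy-mode cluster`, `spark.executor.instances`, `spark.kubernetes.container.image`) are standard Spark options, but the image name, application path, and API server URL below are placeholders, and this is not necessarily how the articles' own launcher works.

```python
import subprocess


def build_spark_submit_cmd(app_path, image, master_url, executors=2):
    """Assemble a spark-submit invocation for Kubernetes cluster mode.

    All argument values are illustrative; adapt them to your cluster.
    """
    return [
        "spark-submit",
        "--master", f"k8s://{master_url}",   # Kubernetes API server
        "--deploy-mode", "cluster",           # driver runs in a pod
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_path,                             # path inside the container image
    ]


cmd = build_spark_submit_cmd(
    "local:///opt/spark/app/main.py",         # hypothetical app path
    "my-registry/spark-py:3.1.1",             # hypothetical image
    "https://kubernetes.default.svc:443",
)
print(" ".join(cmd))
# Launching is then a blocking subprocess call, e.g.:
# subprocess.run(cmd, check=True)
```

Keeping command construction in a small pure function like this makes it easy to unit-test the flags without actually submitting anything to a cluster.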