Hi,
I am configuring a Spark deployment to run GPU workloads using RAPIDS. I tried following the documentation available here.
Versions:
Kubernetes: v1.22.12-eks-ba74326 (Running on AWS EKS)
Spark: 3.1.1
RAPIDS Jar: rapids-4-spark_2.12-22.12.0.jar
Current Process:
- Created a Spark cluster with the Bitnami Helm charts on Kubernetes v1.22.12 on AWS EKS.
- Ran a non-GPU test workload to confirm that the Spark cluster operates as intended. (It runs without issues.)
- Created a Docker image following the RAPIDS documentation for Spark.
- Used spark-submit to submit the test workload from the RAPIDS page, roughly as sketched below. This step failed with the error in the dump that follows.
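For context, the submit command looked roughly like the one below. The master URL matches the service shown in the log; the GPU-related confs and the local:// jar path follow the RAPIDS example, while the main class and workload jar are placeholders standing in for my actual values:

$SPARK_HOME/bin/spark-submit \
  --master spark://katonic-spark-operator-master-svc:7077 \
  --deploy-mode cluster \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh \
  --jars local:///opt/sparkRapidsPlugin/rapids-4-spark_2.12-22.12.0.jar \
  --class <test-workload-main-class> \
  local:///path/to/<test-workload>.jar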
Error Dump for spark-submit:
23/01/10 08:37:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/01/10 08:37:04 INFO SecurityManager: Changing view acls to: spark
23/01/10 08:37:04 INFO SecurityManager: Changing modify acls to: spark
23/01/10 08:37:04 INFO SecurityManager: Changing view acls groups to:
23/01/10 08:37:04 INFO SecurityManager: Changing modify acls groups to:
23/01/10 08:37:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
23/01/10 08:37:04 INFO Utils: Successfully started service 'driverClient' on port 45019.
23/01/10 08:37:04 INFO TransportClientFactory: Successfully created connection to katonic-spark-operator-master-svc/172.20.18.180:7077 after 26 ms (0 ms spent in bootstraps)
23/01/10 08:37:04 INFO ClientEndpoint: ... waiting before polling master for driver state
23/01/10 08:37:04 INFO ClientEndpoint: Driver successfully submitted as driver-20230110083704-0001
23/01/10 08:37:09 INFO ClientEndpoint: State of driver-20230110083704-0001 is ERROR
23/01/10 08:37:09 ERROR ClientEndpoint: Exception from cluster was: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "local"
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "local"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1980)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:817)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:557)
at org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:162)
at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:179)
at org.apache.spark.deploy.worker.DriverRunner$$anon$2.run(DriverRunner.scala:99)
23/01/10 08:37:09 INFO ShutdownHookManager: Shutdown hook called
23/01/10 08:37:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-9f37bad9-c981-4194-a448-2c6936e8f7bd
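If I read the stack trace right, the worker's DriverRunner.downloadUserJar fetches the application jar through the Hadoop FileSystem API, and Hadoop has no filesystem registered for the local scheme (its built-in LocalFileSystem handles file:// instead). As an experiment (I am not sure this is the intended fix), I am considering resubmitting with file:// references, since the jar is already baked into the image at that path:

# hypothetical variant, untested: same command as above with file:// in place of local://
$SPARK_HOME/bin/spark-submit \
  --master spark://katonic-spark-operator-master-svc:7077 \
  --deploy-mode cluster \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh \
  --jars file:///opt/sparkRapidsPlugin/rapids-4-spark_2.12-22.12.0.jar \
  --class <test-workload-main-class> \
  file:///path/to/<test-workload>.jar

Has anyone run the RAPIDS example against a standalone master in cluster mode like this? Any pointers would be appreciated.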