Regarding NVIDIA/spark-rapids-examples-MortgageETL+XGBoost notebook

I was trying to run the MortgageETL+XGBoost.ipynb example, but I get the following error while fitting the XGBoost model.

Code:

import time

def with_benchmark(phrase, action):
    # Run action() and print how long it took in wall-clock seconds.
    start = time.time()
    result = action()
    end = time.time()
    print("{} takes {} seconds".format(phrase, end - start))
    return result

model = with_benchmark("Training", lambda: classifier.fit(train_data_va))

Error Message:
22/10/31 08:46:15 ERROR NativeLibLoader: failed to load xgboost4j library from jar
22/10/31 08:46:15 ERROR DMatrix: Failed to load native library
java.io.FileNotFoundException: File /lib/cuda10/libxgboost4j.so was not found inside JAR.
at ml.dmlc.xgboost4j.java.NativeLibLoader.createTempFileFromResource(NativeLibLoader.java:189)
at ml.dmlc.xgboost4j.java.NativeLibLoader.loadLibraryFromJar(NativeLibLoader.java:132)
at ml.dmlc.xgboost4j.java.NativeLibLoader.initXGBoost(NativeLibLoader.java:66)
at ml.dmlc.xgboost4j.java.XGBoostJNI.<clinit>(XGBoostJNI.java:36)
at ml.dmlc.xgboost4j.java.DMatrix.<init>(DMatrix.java:79)
at ml.dmlc.xgboost4j.scala.DMatrix.<init>(DMatrix.scala:42)
at ml.dmlc.xgboost4j.scala.spark.Watches$.buildWatches(XGBoost.scala:846)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.$anonfun$trainForNonRanking$1(XGBoost.scala:500)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)

Here are the packages I am using to run it:

  • xgboost4j-spark_3.0:jar:1.4.2-0.3.0
  • xgboost4j_3.0:jar:1.4.2-0.3.0
  • rapids-4-spark_2.12:jar:22.10.0
  • pyspark==3.3.1

Below is the system configuration I am using:

  • 4 GPUs: Tesla V100, 16GB each
  • NVIDIA Driver Version: 470.57.02
  • CUDA Version: 11.4
  • Machine: 8 Core 32GB
  • Ubuntu 18.04.6 LTS

This looks strange to me: the error refers to a cuda10 library path, but you report CUDA 11.4. Do you have two CUDA versions on one machine?
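
To see which native libraries the xgboost4j jar actually contains, you could list its .so entries and compare them with the path in the error. Below is a minimal Python sketch. It relies only on the fact that a jar is a zip archive; the function name list_native_libs and the jar path in the usage comment are made up for illustration, so substitute the jar you really put on the classpath.

```python
import zipfile


def list_native_libs(jar_path):
    """Return the sorted paths of all .so entries inside a jar.

    A jar is a zip archive, so the standard zipfile module can read it.
    Entry names carry no leading slash, e.g. "lib/<dir>/libxgboost4j.so".
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(name for name in jar.namelist() if name.endswith(".so"))


# Usage (hypothetical path, adjust to your environment):
# for entry in list_native_libs("/path/to/xgboost4j.jar"):
#     print(entry)
```

If the directory named in the error does not appear in the output, the jar was probably built for a different CUDA version than the one the loader expects.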


No, I am using only one CUDA version. Here is a screenshot of the nvidia-smi output:

Thank you for pointing out the CUDA version issue. After examining it closely, my DevOps team has fixed the problem.
