Failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY: out of memory

Hello,

I am trying to use the TensorFlow C API through the cppflow framework. I can load the model, but inference fails on both GPU and CPU.

Configuration:
PC with one graphics card, accessed through X2Go
NVIDIA Quadro M2000, 4 GB
Ubuntu 18.04
CUDA 11.2
cuDNN 8.1.0
TensorFlow 2.4

(semseg_env) lsm1so@ABTZ0IY7:~/challenge/cppflow/build_load$ sudo --preserve-env=CUDA_VISIBLE_DEVICES ./load_library
2021-02-05 18:23:57.000069: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
CUDA_VISIBLE_DEVICES: 0
NVIDIA_VISIBLE_DEVICES: 0
2021-02-05 18:23:57.260890: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-05 18:23:57.262505: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-05 18:23:57.263199: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-05 18:23:57.303978: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-02-05 18:23:57.304033: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: ABTZ0IY7
2021-02-05 18:23:57.304047: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: ABTZ0IY7
2021-02-05 18:23:57.304097: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.32.3
2021-02-05 18:23:57.304152: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.32.3
2021-02-05 18:23:57.304169: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.32.3
2021-02-05 18:23:57.337479: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-05 18:23:57.343405: I tensorflow/cc/saved_model/reader.cc:32] Reading SavedModel from: /home/lsm1so/challenge/cppflow/
2021-02-05 18:23:57.405478: I tensorflow/cc/saved_model/reader.cc:55] Reading meta graph with tags { serve }
2021-02-05 18:23:57.405545: I tensorflow/cc/saved_model/reader.cc:93] Reading SavedModel debug info (if present) from: /home/lsm1so/challenge/cppflow/
2021-02-05 18:23:57.405625: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-05 18:23:57.632101: I tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2021-02-05 18:23:57.632171: I tensorflow/cc/saved_model/loader.cc:216] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /home/lsm1so/challenge/cppflow/variables/variables.index
2021-02-05 18:23:57.632209: I tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /home/lsm1so/challenge/cppflow/
2021-02-05 18:23:57.674250: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492195000 Hz
2021-02-05 18:23:57.842170: I tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 498768 microseconds.
terminate called after throwing an instance of 'std::runtime_error'
what(): Default MaxPoolingOp only supports NHWC on device type CPU
[[{{node release_schaefer_net/schaefer_net_core_layer/max_pool/MaxPool}}]]
Aborted

The code is quite simple:

#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <string>

#include "cppflow/cppflow.h"
#include <tensorflow/c/eager/c_api.h>

// Helper: read an environment variable, returning "" if it is not set.
static std::string getEnv(const char *name)
{
    const char *value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

int main()
{
    std::string state = getEnv("CUDA_VISIBLE_DEVICES");
    std::cout << "CUDA_VISIBLE_DEVICES: " << state << std::endl;
    std::string state2 = getEnv("NVIDIA_VISIBLE_DEVICES");
    std::cout << "NVIDIA_VISIBLE_DEVICES: " << state2 << std::endl;

    // Default MaxPoolingOp only supports NHWC on device type CPU, hence that shape is used:
    // N: number of images in the batch
    // H: height of the image
    // W: width of the image
    // C: number of channels of the image (e.g. 3 for RGB, 1 for grayscale)
    auto input_1 = cppflow::fill({1, 64, 752, 1}, 1.0f);
    auto input_2 = cppflow::fill({1, 64, 752, 1}, 1.0f);

    // Change the context configuration: the bytes are a serialized ConfigProto with
    // gpu_options.per_process_gpu_memory_fraction = 0.8 and gpu_options.allow_growth = true.
    TF_Status *status = TF_NewStatus();
    TFE_ContextOptions *tfe_opts = TFE_NewContextOptions();
    uint8_t config[13] = {0x32, 0xb, 0x9, 0x9a, 0x99, 0x99, 0x99, 0x99, 0x99, 0xe9, 0x3f, 0x20, 0x1};
    TFE_ContextOptionsSetConfig(tfe_opts, (void *) config, sizeof(config) / sizeof(config[0]), status);
    if (TF_GetCode(status) != TF_OK) {
        std::cerr << "TFE_ContextOptionsSetConfig failed: " << TF_Message(status) << std::endl;
    }
    cppflow::get_global_context() = cppflow::context(tfe_opts);
    TF_DeleteStatus(status);

    // Will look for a saved_model.pb at this location.
    cppflow::model model("/home/lsm1so/challenge/cppflow/");
    auto output = model(
        {{"serving_default_distance:0", input_1}, {"serving_default_intensity:0", input_2}},
        {"StatefulPartitionedCall:0", "StatefulPartitionedCall:1", "StatefulPartitionedCall:2", "StatefulPartitionedCall:3"}
    );
    return 0;
}
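For reference, the 13 bytes in config[] are nothing more than a serialized tensorflow.ConfigProto. A minimal sketch of how such a byte string can be generated, assuming a Python TensorFlow installation with the tf.compat.v1 API is available:

import tensorflow as tf

# Same options as the uint8_t config[13] array above:
# per_process_gpu_memory_fraction = 0.8, allow_growth = True.
gpu_options = tf.compat.v1.GPUOptions(
    per_process_gpu_memory_fraction=0.8, allow_growth=True)
config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)

# Print the serialized proto as a C-style byte list.
print(', '.join(hex(b) for b in config.SerializeToString()))
# Prints: 0x32, 0xb, 0x9, 0x9a, 0x99, 0x99, 0x99, 0x99, 0x99, 0xe9, 0x3f, 0x20, 0x1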

The CPU run might be failing because those ops are only supported on the GPU (see the sketch after the list below). But I don't understand the GPU error. After searching on Google, I found the following possible root causes:

  • Not running as sudo (checked)
  • CUDA_VISIBLE_DEVICES or NVIDIA_VISIBLE_DEVICES not set properly (checked)
  • nvidia-smi not working (checked)
  • Not enough memory on the GPU; set enable_memory_growth (checked)
  • Restart the PC (checked)
  • Another process running in the background, using all the memory (I don't think so)
  • Library compatibility (maybe checked?)
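For context on the CPU error at the end of the log, here is a minimal sketch (Python, any TensorFlow 2.x install) illustrating that the default CPU MaxPool kernel only accepts NHWC, while NCHW fails with this same message:

import tensorflow as tf

x = tf.random.uniform([1, 64, 752, 1])  # NHWC: batch, height, width, channels

with tf.device("/CPU:0"):
    # NHWC is accepted by the CPU kernel.
    tf.nn.max_pool2d(x, ksize=2, strides=2, padding="SAME", data_format="NHWC")

    # NCHW fails with "Default MaxPoolingOp only supports NHWC on device type CPU".
    try:
        tf.nn.max_pool2d(tf.transpose(x, [0, 3, 1, 2]),
                         ksize=2, strides=2, padding="SAME", data_format="NCHW")
    except tf.errors.OpError as err:
        print(err)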

Regarding these last two points: I checked the running processes, and these are the only ones. I tried to kill Xorg/sddm, but they pop up again after both are killed. With nvidia-smi, we can see that they consume almost nothing, so they are not relevant.

(semseg_env) lsm1so@ABTZ0IY7:~/challenge/cppflow/build_load$ sudo fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        root       1028 F...   nvidia-persiste
                     root      10572 F...m  Xorg
                     sddm      10600 F...m  sddm-greeter
/dev/nvidiactl:      root       1028 F...   nvidia-persiste
                     root      10572 F...m  Xorg
                     sddm      10600 F...m  sddm-greeter
/dev/nvidia-modeset: root       1028 F...   nvidia-persiste
                     root      10572 F...   Xorg
                     sddm      10600 F...   sddm-greeter

As for the last point, I am using the TF libraries from here, which are the GPU ones:

Linux GPU support https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-2.4.0.tar.gz

I started the analysis on GitHub, but now I think it is more of an infrastructure/configuration issue. The model itself is tested and works.

Try changing the amount of GPU memory you're allocating when you compile your model:

import tensorflow.compat.v1 as tf  # TF1-style API; on TF 1.x, just "import tensorflow as tf"

tf.app.flags.DEFINE_float(
    'gpu_memory_fraction', 1.0, 'GPU memory fraction to use')
FLAGS = tf.app.flags.FLAGS

gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=FLAGS.gpu_memory_fraction)

config = tf.ConfigProto(
    gpu_options=gpu_options,
    log_device_placement=False,
)
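The config is then applied when the session is created; a minimal sketch, assuming the TF1-style session API:

# Graph mode must be enabled before creating a v1 Session on TF 2.x.
tf.disable_eager_execution()

with tf.Session(config=config) as sess:
    # Anything run in this session uses the GPU options defined above.
    print(sess.run(tf.constant("session created with custom gpu_options")))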