Failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY: out of memory

Hello,

I am trying to use the TensorFlow C API through the cppflow framework. I can load the model, but inference fails on both GPU and CPU.

Configuration:
PC with one graphics card, accessed through X2Go
NVIDIA Quadro M2000, 4 GB
Ubuntu 18.04
CUDA 11.2
cuDNN 8.1.0
TensorFlow 2.4

(semseg_env) lsm1so@ABTZ0IY7:~/challenge/cppflow/build_load$ sudo --preserve-env=CUDA_VISIBLE_DEVICES ./load_library
2021-02-05 18:23:57.000069: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
CUDA_VISIBLE_DEVICES: 0
NVIDIA_VISIBLE_DEVICES: 0
2021-02-05 18:23:57.260890: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-05 18:23:57.262505: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-05 18:23:57.263199: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-05 18:23:57.303978: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-02-05 18:23:57.304033: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: ABTZ0IY7
2021-02-05 18:23:57.304047: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: ABTZ0IY7
2021-02-05 18:23:57.304097: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.32.3
2021-02-05 18:23:57.304152: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.32.3
2021-02-05 18:23:57.304169: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.32.3
2021-02-05 18:23:57.337479: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-05 18:23:57.343405: I tensorflow/cc/saved_model/reader.cc:32] Reading SavedModel from: /home/lsm1so/challenge/cppflow/
2021-02-05 18:23:57.405478: I tensorflow/cc/saved_model/reader.cc:55] Reading meta graph with tags { serve }
2021-02-05 18:23:57.405545: I tensorflow/cc/saved_model/reader.cc:93] Reading SavedModel debug info (if present) from: /home/lsm1so/challenge/cppflow/
2021-02-05 18:23:57.405625: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-05 18:23:57.632101: I tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2021-02-05 18:23:57.632171: I tensorflow/cc/saved_model/loader.cc:216] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /home/lsm1so/challenge/cppflow/variables/variables.index
2021-02-05 18:23:57.632209: I tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /home/lsm1so/challenge/cppflow/
2021-02-05 18:23:57.674250: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492195000 Hz
2021-02-05 18:23:57.842170: I tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 498768 microseconds.
terminate called after throwing an instance of 'std::runtime_error'
what(): Default MaxPoolingOp only supports NHWC on device type CPU
[[{{node release_schaefer_net/schaefer_net_core_layer/max_pool/MaxPool}}]]
Aborted

The code is quite simple:

#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <string>

#include "cppflow/cppflow.h"
#include <tensorflow/c/eager/c_api.h>

// Helper: read an environment variable, returning "" if it is not set.
static std::string getEnv(const char *name)
{
    const char *value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

int main()
{
    std::string state = getEnv("CUDA_VISIBLE_DEVICES");
    std::cout << "CUDA_VISIBLE_DEVICES: " << state << std::endl;
    std::string state2 = getEnv("NVIDIA_VISIBLE_DEVICES");
    std::cout << "NVIDIA_VISIBLE_DEVICES: " << state2 << std::endl;

    // Default MaxPoolingOp only supports NHWC on device type CPU, hence that shape is used:
    // N: number of images in the batch
    // H: height of the image
    // W: width of the image
    // C: number of channels of the image (e.g. 3 for RGB, 1 for grayscale)
    auto input_1 = cppflow::fill({1, 64, 752, 1}, 1.0f);
    auto input_2 = cppflow::fill({1, 64, 752, 1}, 1.0f);

    // Change the context configuration: the bytes are a serialized ConfigProto with
    // gpu_options.per_process_gpu_memory_fraction = 0.8 and gpu_options.allow_growth = true.
    TF_Status *status = TF_NewStatus();
    TFE_ContextOptions *tfe_opts = TFE_NewContextOptions();
    uint8_t config[13] = {0x32, 0xb, 0x9, 0x9a, 0x99, 0x99, 0x99, 0x99, 0x99, 0xe9, 0x3f, 0x20, 0x1};
    TFE_ContextOptionsSetConfig(tfe_opts, (void *) config, sizeof(config) / sizeof(config[0]), status);
    if (TF_GetCode(status) != TF_OK) {
        std::cerr << "TFE_ContextOptionsSetConfig failed: " << TF_Message(status) << std::endl;
    }
    cppflow::get_global_context() = cppflow::context(tfe_opts);
    TF_DeleteStatus(status);

    // Will look for a saved_model.pb at this location.
    cppflow::model model("/home/lsm1so/challenge/cppflow/");
    auto output = model(
        {{"serving_default_distance:0", input_1}, {"serving_default_intensity:0", input_2}},
        {"StatefulPartitionedCall:0", "StatefulPartitionedCall:1", "StatefulPartitionedCall:2", "StatefulPartitionedCall:3"}
    );
    return 0;
}
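For reference, the 13 bytes in config[] are nothing more than a serialized tensorflow.ConfigProto. A minimal sketch of how such a byte string can be generated, assuming a Python TensorFlow installation with the tf.compat.v1 API is available:

import tensorflow as tf

# Same options as the uint8_t config[13] array above:
# per_process_gpu_memory_fraction = 0.8, allow_growth = True.
gpu_options = tf.compat.v1.GPUOptions(
    per_process_gpu_memory_fraction=0.8, allow_growth=True)
config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)

# Print the serialized proto as a C-style byte list.
print(', '.join(hex(b) for b in config.SerializeToString()))
# Prints: 0x32, 0xb, 0x9, 0x9a, 0x99, 0x99, 0x99, 0x99, 0x99, 0xe9, 0x3f, 0x20, 0x1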

The CPU run might be failing because those ops are only supported on the GPU (see the sketch after the list below). But I don't understand the GPU error. After searching on Google, I found the following possible root causes:

  • Not running as sudo (checked)
  • CUDA_VISIBLE_DEVICES or NVIDIA_VISIBLE_DEVICES not set properly (checked)
  • nvidia-smi not working (checked)
  • Not enough memory on the GPU; set enable_memory_growth (checked)
  • Restart the PC (checked)
  • Another process running in the background, using all the memory (I don't think so)
  • Library compatibility (maybe checked?)
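For context on the CPU error at the end of the log, here is a minimal sketch (Python, any TensorFlow 2.x install) illustrating that the default CPU MaxPool kernel only accepts NHWC, while NCHW fails with this same message:

import tensorflow as tf

x = tf.random.uniform([1, 64, 752, 1])  # NHWC: batch, height, width, channels

with tf.device("/CPU:0"):
    # NHWC is accepted by the CPU kernel.
    tf.nn.max_pool2d(x, ksize=2, strides=2, padding="SAME", data_format="NHWC")

    # NCHW fails with "Default MaxPoolingOp only supports NHWC on device type CPU".
    try:
        tf.nn.max_pool2d(tf.transpose(x, [0, 3, 1, 2]),
                         ksize=2, strides=2, padding="SAME", data_format="NCHW")
    except tf.errors.OpError as err:
        print(err)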

Regarding these last two points: I checked the running processes, and these are the only ones. I tried to kill Xorg/sddm, but they pop up again after both are killed. With nvidia-smi, we can see that they consume almost nothing, so they are not relevant.

(semseg_env) lsm1so@ABTZ0IY7:~/challenge/cppflow/build_load$ sudo fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        root       1028 F...   nvidia-persiste
                     root      10572 F...m  Xorg
                     sddm      10600 F...m  sddm-greeter
/dev/nvidiactl:      root       1028 F...   nvidia-persiste
                     root      10572 F...m  Xorg
                     sddm      10600 F...m  sddm-greeter
/dev/nvidia-modeset: root       1028 F...   nvidia-persiste
                     root      10572 F...   Xorg
                     sddm      10600 F...   sddm-greeter

As for the last point, I am using the TF libraries from here, which are the GPU ones:

Linux GPU support https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-2.4.0.tar.gz

I started the analysis on GitHub, but now I think it is more of an infrastructure/configuration issue. The model itself is tested and works.

Try changing the amount of GPU memory you're allocating when you compile your model:

import tensorflow.compat.v1 as tf  # TF1-style API; on TF 1.x, just "import tensorflow as tf"

tf.app.flags.DEFINE_float(
    'gpu_memory_fraction', 1.0, 'GPU memory fraction to use')
FLAGS = tf.app.flags.FLAGS

gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=FLAGS.gpu_memory_fraction)

config = tf.ConfigProto(
    gpu_options=gpu_options,
    log_device_placement=False,
)
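The config is then applied when the session is created; a minimal sketch, assuming the TF1-style session API:

# Graph mode must be enabled before creating a v1 Session on TF 2.x.
tf.disable_eager_execution()

with tf.Session(config=config) as sess:
    # Anything run in this session uses the GPU options defined above.
    print(sess.run(tf.constant("session created with custom gpu_options")))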