Running into cudaErrorMemoryAllocation arbitrarily

Hardware Platform: DRIVE AGX Xavier
Software Version: Drive Software 2.2
Host Machine Version: Ubuntu 18.04.4 LTS (Bionic Beaver)
SDK Manager Version: 1.1.0.6343

Note: I am running everything mentioned below on the Ubuntu 18.04 host machine.

I am running into the error below every once in a while when running the IMUPlugin for my custom IMU sensor.

./sample_imu_logger --driver=imu.custom --params=decoder-path=.../sample_build/src/sensors/plugins/OpenIMU300/libopenimu_plugin.so,can-proto=can.socket,device=slcan0,packetRate=1

[15-06-2020 15:11:13] Platform: Detected Generic x86 Platform

[15-06-2020 15:11:13] TimeSource: monotonic epoch time offset is 1592253860887882

[15-06-2020 15:11:14] Driveworks exception thrown: Platform: cannot acquire CUDA context, error cudaErrorMemoryAllocation: out of memory

[15-06-2020 15:11:14] Driveworks exception thrown: DW_INVALID_HANDLE: Cannot cast to C handle, given instance is a nullptr, type=P15dwContextObject

[15-06-2020 15:11:14] Driveworks exception thrown: DW_INVALID_HANDLE: Cannot cast to C handle, given instance is a nullptr, type=P11dwSALObject

Cannot create sensor imu.custom with decoder-path=.../sample_build/src/sensors/plugins/OpenIMU300/libopenimu_plugin.so,can-proto=can.socket,device=slcan0,packetRate=1

[15-06-2020 15:11:14] Driveworks exception thrown: DW_INVALID_HANDLE: Cannot cast to C handle, given instance is a nullptr, type=P11dwSALObject

[15-06-2020 15:11:14] Driveworks exception thrown: DW_INVALID_HANDLE: Cannot cast to C handle, given instance is a nullptr, type=P15dwContextObject

Am I doing something wrong with memory allocation? I am not using dynamic memory allocation anywhere in my plugin. The problem goes away when I restart the computer, but it comes back about once a day after I run the plugin many times to test my changes.

Let me know if you need more information.

Regards,
Rishit

Dear @rborad,
Which GPU do you have? Could you share the output log of the CUDA deviceQuery sample?
Could you also run nvidia-smi in a loop to check GPU memory utilization and see whether it increases with each run?
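For example, something along these lines should do it (just a rough sketch; watch -n 1 nvidia-smi works as well):

# Print GPU memory usage once per second; stop with Ctrl+C
while true; do
    nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv,noheader
    sleep 1
done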

Thanks for the reply, Siva. I have a Quadro T1000 GPU.
Tue Jun 16 11:15:36 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro T1000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P3    13W /  N/A |   2137MiB /  3914MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2232      G   /usr/lib/xorg/Xorg                           126MiB |
|    0      3012      G   /usr/bin/gnome-shell                          92MiB |
|    0      4759      C   ./sample_imu_logger                          119MiB |
|    0      4798      C   ./sample_imu_logger                          119MiB |
|    0      4831      C   ./sample_imu_logger                          119MiB |
|    0      4865      C   ./sample_imu_logger                          119MiB |
|    0      4897      C   ./sample_imu_logger                          119MiB |
|    0      4935      C   ./sample_imu_logger                          119MiB |
|    0      4967      C   ./sample_imu_logger                          119MiB |
|    0      4999      C   ./sample_imu_logger                          119MiB |
|    0      5031      C   ./sample_imu_logger                          119MiB |
|    0      5063      C   ./sample_imu_logger                          119MiB |
|    0      5095      C   ./sample_imu_logger                          119MiB |
|    0      5127      C   ./sample_imu_logger                          119MiB |
|    0      5159      C   ./sample_imu_logger                          119MiB |
|    0      5191      C   ./sample_imu_logger                          119MiB |
|    0      5223      C   ./sample_imu_logger                          119MiB |
|    0      5255      C   ./sample_imu_logger                          119MiB |
+-----------------------------------------------------------------------------+

It looks like each time I run the sample, the process keeps running and never terminates. How do I terminate them automatically? Is this happening because of something I am doing in my custom plugin?

Thanks in advance.
Rishit

Dear @rborad,
It is clear from nvidia-smi that the memory allocated by each process is not being released. How are you killing/stopping the process? Do you see the same behavior when launching the sample multiple times without your custom plugin?

This is happening because I terminate the plugin with Ctrl+Z on the command line, so it never closes properly. The sample plugins shut down their plugin instance cleanly because they exit once all the input data has been processed (they read from a finite recording file). In my case, I am reading data from real hardware: the device sends data continuously and the plugin keeps parsing that endless stream until you force it to stop, so it never reaches the point where the SAL calls _dwSensorPlugin_stop(). I see the same problem when I use a sample plugin with real hardware (sample_CAN_logger with can.socket and device=slcan0).

How do I fix this?

Thanks in advance.
Rishit

Hi @rborad,

Ctrl+Z only suspends the process and puts it in the background; it does not terminate it. You can still see it with the "jobs" command and bring it back to the foreground with the "fg" command. That is why it still occupies GPU memory. Please terminate it with the "kill" command instead.
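For example (a quick sketch, assuming a bash shell and that the leftover processes are the sample_imu_logger instances shown in your nvidia-smi output):

# List suspended/background jobs in the current shell
jobs
# Bring a job back to the foreground, then stop it with Ctrl+C
fg %1
# Or terminate the suspended job directly
kill %1
# Clean up sample_imu_logger processes left over from earlier runs
pkill -f sample_imu_logger

Stopping the sample with Ctrl+C instead of Ctrl+Z should also let it shut down and release its GPU memory, since the samples handle SIGINT.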
