Jetson TX2: cudaMalloc() failed with error "all CUDA-capable devices are busy or unavailable"

Hello, I'm trying to install tiny-cuda-nn on a Jetson TX2 and am running into errors.
I think there is some memory error, but I can't figure out why it happens or how to fix it.

(python3.6) nvidia@jetson-desktop:~/project/tiny-cuda-nn$ ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json
Loading custom json config ‘data/config_hash.json’.
tiny-cuda-nn warning: FullyFusedMLP is not supported for the selected architecture 62. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
tiny-cuda-nn warning: FullyFusedMLP is not supported for the selected architecture 62. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Beginning optimization with 10000000 training steps.
tiny-cuda-nn warning: GPUMemoryArena: GPU 0 does not support virtual memory. Falling back to regular allocations, which will be larger and can cause occasional stutter.
terminate called after throwing an instance of ‘std::runtime_error’
  what(): /home/nvidia/project/tiny-cuda-nn/include/tiny-cuda-nn/cuda_graph.h:99 cudaStreamEndCapture(stream, &m_graph) failed: operation failed due to a previous error during capture
Aborted

[with cuda-memcheck]
(python3.6) nvidia@jetson-desktop:~/project/tiny-cuda-nn$ cuda-memcheck ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuMemGetAllocationGranularity + 0x16c) [0x1ee1cc]
========= Host Frame:./build/mlp_learning_an_image [0x735bc]
========= Host Frame:./build/mlp_learning_an_image [0x409d0]
========= Host Frame:./build/mlp_learning_an_image [0x40da4]
========= Host Frame:./build/mlp_learning_an_image [0x28cc4]
========= Host Frame:/lib/aarch64-linux-gnu/libc.so.6 (__libc_start_main + 0xe0) [0x207a0]
========= Host Frame:./build/mlp_learning_an_image [0x35830]

========= Program hit cudaErrorStreamCaptureUnsupported (error 900) due to “operation not permitted when stream is capturing” on CUDA API call to cudaDeviceSynchronize.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x32081c]
========= Host Frame:./build/mlp_learning_an_image [0x2560ec]
========= Host Frame:./build/mlp_learning_an_image [0x3fe50]
========= Host Frame:./build/mlp_learning_an_image [0x40758]
========= Host Frame:./build/mlp_learning_an_image [0x40da4]
========= Host Frame:./build/mlp_learning_an_image [0x28cc4]
========= Host Frame:/lib/aarch64-linux-gnu/libc.so.6 (__libc_start_main + 0xe0) [0x207a0]
========= Host Frame:./build/mlp_learning_an_image [0x35830]

========= Program hit cudaErrorStreamCaptureInvalidated (error 901) due to “operation failed due to a previous error during capture” on CUDA API call to cudaStreamEndCapture.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x32081c]
========= Host Frame:./build/mlp_learning_an_image [0x26e374]
========= Host Frame:./build/mlp_learning_an_image [0x3d168]
========= Host Frame:./build/mlp_learning_an_image [0x2ab08]
========= Host Frame:/lib/aarch64-linux-gnu/libc.so.6 (__libc_start_main + 0xe0) [0x207a0]
========= Host Frame:./build/mlp_learning_an_image [0x35830]
terminate called after throwing an instance of 'std::runtime_error'
  what(): /home/nvidia/project/tiny-cuda-nn/include/tiny-cuda-nn/cuda_graph.h:99 cudaStreamEndCapture(stream, &m_graph) failed: operation failed due to a previous error during capture
========= Error: process didn’t terminate successfully
========= No CUDA-MEMCHECK results found
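For reference, the 900/901 pair above can be reproduced with a minimal sketch (my own illustration, not tiny-cuda-nn code): cudaDeviceSynchronize() is not permitted while a stream is being captured, and the failed call invalidates the capture, so the subsequent cudaStreamEndCapture() fails exactly as in the log.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    noop<<<1, 1, 0, stream>>>();

    // Not permitted while a stream is capturing in global mode:
    // returns cudaErrorStreamCaptureUnsupported (900) and
    // invalidates the ongoing capture.
    cudaError_t err = cudaDeviceSynchronize();
    printf("cudaDeviceSynchronize: %s\n", cudaGetErrorString(err));

    // The capture is now broken, so ending it reports
    // cudaErrorStreamCaptureInvalidated (901).
    err = cudaStreamEndCapture(stream, &graph);
    printf("cudaStreamEndCapture: %s\n", cudaGetErrorString(err));

    cudaStreamDestroy(stream);
    return 0;
}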

  • NVIDIA GPU or System: Jetson TX2
  • NVIDIA Software Version: JetPack 4.6.4, Python 3.6.9, CUDA 10.2
  • OS: Ubuntu 18.04.6
  • Other Details:
    I'm trying to run tiny-cuda-nn and got errors in cudaStreamEndCapture etc., ending with a runtime error or the process just being killed.
    Could you give me some idea why the cudaMalloc error is triggered?
    Thank you in advance.

Hi,

Have you checked whether tiny-cuda-nn supports Jetson devices?

"Killed" is usually caused by running out of memory.
You can check this with tegrastats:

$ sudo tegrastats
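If it helps, here is a small sketch (my own illustration, not an NVIDIA tool) that asks the CUDA runtime for free/total memory; on Jetson this reflects the shared system memory, so it is a useful cross-check alongside tegrastats:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_b = 0, total_b = 0;
    // Reports memory visible to CUDA; on Jetson, GPU and CPU share this pool.
    cudaError_t err = cudaMemGetInfo(&free_b, &total_b);
    if (err != cudaSuccess) {
        printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Free: %.1f MiB / Total: %.1f MiB\n",
           free_b / 1048576.0, total_b / 1048576.0);
    return 0;
}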

Thanks.

Thank you for your reply.

tegrastats came up as below.

Does the result mean this program (tiny-cuda-nn) isn't able to allocate memory?

========= Program hit CUDA_ERROR_NOT_SUPPORTED (error 801) due to “operation not supported” on CUDA API call to cuMemGetAllocationGranularity.

Could this error have something to do with the result?
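For context, error 801 comes from probing the CUDA virtual memory management (VMM) API. A minimal sketch of such a probe (my own illustration; the allocation properties are assumptions, not tiny-cuda-nn's exact code), which fails with CUDA_ERROR_NOT_SUPPORTED on devices without VMM support:

#include <cstdio>
#include <cuda.h>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // Minimal pinned-device allocation properties for the granularity query.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t granularity = 0;
    // On devices without VMM (e.g. TX2) this returns
    // CUDA_ERROR_NOT_SUPPORTED (801).
    CUresult r = cuMemGetAllocationGranularity(&granularity, &prop,
                                               CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    printf("cuMemGetAllocationGranularity returned %d\n", (int)r);
    return 0;
}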

Hi @AastaLLL
I am getting a similar error on the Orin AGX.

output:
root@a520461b9c4d:/main/tiny-cuda-nn# ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json
Loading custom json config ‘data/config_hash.json’.
Beginning optimization with 10000000 training steps.
tiny-cuda-nn warning: GPUMemoryArena: GPU 0 does not support virtual memory. Falling back to regular allocations, which will be larger and can cause occasional stutter.
terminate called after throwing an instance of ‘std::runtime_error’
what(): /main/tiny-cuda-nn/include/tiny-cuda-nn/cuda_graph.h:99 cudaStreamEndCapture(stream, &m_graph) failed: operation failed due to a previous error during capture
Aborted (core dumped)

Any solution?

It looks exactly the same as mine. I haven't found any solution yet.

What is your JetPack version? In my case, I'm starting to think I'll have to upgrade to JetPack 5.1.2.

Here is my system.

  • NVIDIA GPU or System: Jetson TX2
  • NVIDIA Software Version: JetPack 4.6.4, Python 3.6.9, CUDA 10.2
  • OS: Ubuntu 18.04.6

Could you please let me know if it works in JetPack 5.1.2?

It doesn't look like out-of-memory on tegrastats. So far I've had no luck installing tinycudann on JetPack 5.1-b147 (Orin AGX, Ubuntu 20.04, aarch64), so I too am wondering whether it is supported on Jetson.


Hi,

Sorry for the late update.

We just confirmed that Virtual Memory Management is not supported on the TX2 device, based on the document below:

We added the query code below to the /usr/local/cuda-10.2/samples/1_Utilities/deviceQueryDrv sample and got the answer "No", which indicates the feature is not supported on the TX2:

int deviceSupportsVmm;
// Query whether the device supports the CUDA virtual memory management API.
CUresult result = cuDeviceGetAttribute(&deviceSupportsVmm, CU_DEVICE_ATTRIBUTE_VIRTUAL_ADDRESS_MANAGEMENT_SUPPORTED, dev);
printf("  Supports Virtual Memory Management:            %s\n", deviceSupportsVmm ? "Yes" : "No");

Thanks.
