Jetson TX2: cudaMalloc() failed with error "all CUDA-capable devices are busy or unavailable"

Hello, I'm trying to install tiny-cuda-nn on a Jetson TX2 and am running into errors.
I think there is some memory error, but I can't figure out why it happens or how to fix it.

(python3.6) nvidia@jetson-desktop:~/project/tiny-cuda-nn$ ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json
Loading custom json config ‘data/config_hash.json’.
tiny-cuda-nn warning: FullyFusedMLP is not supported for the selected architecture 62. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
tiny-cuda-nn warning: FullyFusedMLP is not supported for the selected architecture 62. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Beginning optimization with 10000000 training steps.
tiny-cuda-nn warning: GPUMemoryArena: GPU 0 does not support virtual memory. Falling back to regular allocations, which will be larger and can cause occasional stutter.
terminate called after throwing an instance of ‘std::runtime_error’
  what(): /home/nvidia/project/tiny-cuda-nn/include/tiny-cuda-nn/cuda_graph.h:99 cudaStreamEndCapture(stream, &m_graph) failed: operation failed due to a previous error during capture
Aborted

[with cuda-memcheck]
(python3.6) nvidia@jetson-desktop:~/project/tiny-cuda-nn$ cuda-memcheck ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuMemGetAllocationGranularity + 0x16c) [0x1ee1cc]
========= Host Frame:./build/mlp_learning_an_image [0x735bc]
========= Host Frame:./build/mlp_learning_an_image [0x409d0]
========= Host Frame:./build/mlp_learning_an_image [0x40da4]
========= Host Frame:./build/mlp_learning_an_image [0x28cc4]
========= Host Frame:/lib/aarch64-linux-gnu/libc.so.6 (__libc_start_main + 0xe0) [0x207a0]
========= Host Frame:./build/mlp_learning_an_image [0x35830]

========= Program hit cudaErrorStreamCaptureUnsupported (error 900) due to “operation not permitted when stream is capturing” on CUDA API call to cudaDeviceSynchronize.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x32081c]
========= Host Frame:./build/mlp_learning_an_image [0x2560ec]
========= Host Frame:./build/mlp_learning_an_image [0x3fe50]
========= Host Frame:./build/mlp_learning_an_image [0x40758]
========= Host Frame:./build/mlp_learning_an_image [0x40da4]
========= Host Frame:./build/mlp_learning_an_image [0x28cc4]
========= Host Frame:/lib/aarch64-linux-gnu/libc.so.6 (__libc_start_main + 0xe0) [0x207a0]
========= Host Frame:./build/mlp_learning_an_image [0x35830]

========= Program hit cudaErrorStreamCaptureInvalidated (error 901) due to “operation failed due to a previous error during capture” on CUDA API call to cudaStreamEndCapture.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x32081c]
========= Host Frame:./build/mlp_learning_an_image [0x26e374]
========= Host Frame:./build/mlp_learning_an_image [0x3d168]
========= Host Frame:./build/mlp_learning_an_image [0x2ab08]
========= Host Frame:/lib/aarch64-linux-gnu/libc.so.6 (__libc_start_main + 0xe0) [0x207a0]
========= Host Frame:./build/mlp_learning_an_image [0x35830]
terminate called after throwing an instance of 'std::runtime_error'
  what(): /home/nvidia/project/tiny-cuda-nn/include/tiny-cuda-nn/cuda_graph.h:99 cudaStreamEndCapture(stream, &m_graph) failed: operation failed due to a previous error during capture
========= Error: process didn’t terminate successfully
========= No CUDA-MEMCHECK results found
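For reference, the 900/901 pair above can be reproduced with a minimal sketch (my own illustration, not tiny-cuda-nn code): cudaDeviceSynchronize() is not permitted while a stream is being captured, and the failed call invalidates the capture, so the subsequent cudaStreamEndCapture() fails exactly as in the log.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    noop<<<1, 1, 0, stream>>>();

    // Not permitted while a stream is capturing in global mode:
    // returns cudaErrorStreamCaptureUnsupported (900) and
    // invalidates the ongoing capture.
    cudaError_t err = cudaDeviceSynchronize();
    printf("cudaDeviceSynchronize: %s\n", cudaGetErrorString(err));

    // The capture is now broken, so ending it reports
    // cudaErrorStreamCaptureInvalidated (901).
    err = cudaStreamEndCapture(stream, &graph);
    printf("cudaStreamEndCapture: %s\n", cudaGetErrorString(err));

    cudaStreamDestroy(stream);
    return 0;
}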

  • NVIDIA GPU or System: Jetson TX2
  • NVIDIA Software Version: JetPack 4.6.4, Python 3.6.9, CUDA 10.2
  • OS: Ubuntu 18.04.6
  • Other Details:
    I'm trying to run tiny-cuda-nn and got errors in cudaStreamEndCapture etc., ending with a runtime error or the process just being killed.
    Could you give me some idea why the cudaMalloc error is triggered?
    Thank you in advance.

Hi,

Have you checked whether tiny-cuda-nn supports Jetson devices?

"Killed" is usually caused by running out of memory.
You can check this with tegrastats:

$ sudo tegrastats
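If it helps, here is a small sketch (my own illustration, not an NVIDIA tool) that asks the CUDA runtime for free/total memory; on Jetson this reflects the shared system memory, so it is a useful cross-check alongside tegrastats:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_b = 0, total_b = 0;
    // Reports memory visible to CUDA; on Jetson, GPU and CPU share this pool.
    cudaError_t err = cudaMemGetInfo(&free_b, &total_b);
    if (err != cudaSuccess) {
        printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Free: %.1f MiB / Total: %.1f MiB\n",
           free_b / 1048576.0, total_b / 1048576.0);
    return 0;
}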

Thanks.

Thank you for your reply.

tegrastats came up as below.

Does the result mean this program (tiny-cuda-nn) isn't able to allocate memory?

========= Program hit CUDA_ERROR_NOT_SUPPORTED (error 801) due to “operation not supported” on CUDA API call to cuMemGetAllocationGranularity.

Could this error have something to do with the result?
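For context, error 801 comes from probing the CUDA virtual memory management (VMM) API. A minimal sketch of such a probe (my own illustration; the allocation properties are assumptions, not tiny-cuda-nn's exact code), which fails with CUDA_ERROR_NOT_SUPPORTED on devices without VMM support:

#include <cstdio>
#include <cuda.h>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // Minimal pinned-device allocation properties for the granularity query.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t granularity = 0;
    // On devices without VMM (e.g. TX2) this returns
    // CUDA_ERROR_NOT_SUPPORTED (801).
    CUresult r = cuMemGetAllocationGranularity(&granularity, &prop,
                                               CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    printf("cuMemGetAllocationGranularity returned %d\n", (int)r);
    return 0;
}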

Hi @AastaLLL
I am getting a similar error on the Orin AGX.

output:
root@a520461b9c4d:/main/tiny-cuda-nn# ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json
Loading custom json config ‘data/config_hash.json’.
Beginning optimization with 10000000 training steps.
tiny-cuda-nn warning: GPUMemoryArena: GPU 0 does not support virtual memory. Falling back to regular allocations, which will be larger and can cause occasional stutter.
terminate called after throwing an instance of ‘std::runtime_error’
what(): /main/tiny-cuda-nn/include/tiny-cuda-nn/cuda_graph.h:99 cudaStreamEndCapture(stream, &m_graph) failed: operation failed due to a previous error during capture
Aborted (core dumped)

Any solution?

It looks exactly the same as mine. I haven't found any solution yet.

What is your JetPack version? In my case, I'm starting to think I'll have to upgrade to JetPack 5.1.2.

Here is my system.

  • NVIDIA GPU or System: Jetson TX2
  • NVIDIA Software Version: JetPack 4.6.4, Python 3.6.9, CUDA 10.2
  • OS: Ubuntu 18.04.6

Could you please let me know if it works in JetPack 5.1.2?

It doesn't look like out-of-memory on tegrastats. So far I've had no luck installing tinycudann on JetPack 5.1-b147 (Orin AGX, Ubuntu 20.04, aarch64), so I too am wondering whether it is supported on Jetson.


Hi,

Sorry for the late update.

We just confirmed that Virtual Memory Management is not supported on the TX2 device, based on the document below:

We added the query code below to the /usr/local/cuda-10.2/samples/1_Utilities/deviceQueryDrv sample and got the answer "No", which indicates the feature is not supported on the TX2:

int deviceSupportsVmm;
// Query whether the device supports the CUDA virtual memory management API.
CUresult result = cuDeviceGetAttribute(&deviceSupportsVmm, CU_DEVICE_ATTRIBUTE_VIRTUAL_ADDRESS_MANAGEMENT_SUPPORTED, dev);
printf("  Supports Virtual Memory Management:            %s\n", deviceSupportsVmm ? "Yes" : "No");

Thanks.
