Problems about running tinycudann on Jetson AGX Orin

Platform info:

Model: NVIDIA Orin Jetson-Small Developer Kit

CUDA Arch BIN: 8.7

System: Ubuntu 20.04 focal

Jetpack: 5.0.1 DP

cuDNN: 8.3.2.49

CUDA: release 11.4, V11.4.239

Problem1

I successfully compiled tinycudann (GitHub - NVlabs/tiny-cuda-nn: Lightning fast C++/CUDA neural network framework).

But when I run the tinycudann demo

[./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json],

it raises an out-of-memory error.

So I reduced the batch size in mlp_learning_an_image.cu

(tiny-cuda-nn/mlp_learning_an_image.cu at master · NVlabs/tiny-cuda-nn · GitHub),

but it still raises the same error. This is strange, since the Jetson AGX Orin has 32 GB of GPU memory.

Problem2

Also, when running the tinycudann demo

[./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json],

I find it is extremely slow when loading the image (tiny-cuda-nn/mlp_learning_an_image.cu at master · NVlabs/tiny-cuda-nn · GitHub), and the slowest part is the cudaMalloc call (tiny-cuda-nn/gpu_memory.h at master · NVlabs/tiny-cuda-nn · GitHub).

However, when I try cudaMalloc in a fresh CUDA project, it works fine. Only in the tinycudann project is cudaMalloc extremely slow.
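For reference, the isolation step described above can be sketched with a minimal wall-clock timing wrapper. This is a Python stand-in for the measurement idea only (in the C++ demo one would use std::chrono or cudaEvent timers around the actual cudaMalloc); the bytearray allocation is just a placeholder for the suspect call.

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn and print how long it took, to pinpoint the slow step."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1e3
    print(f"{label}: {elapsed_ms:.2f} ms")
    return result

# Placeholder for the suspect call (cudaMalloc in the real demo):
buf = timed("alloc 64 MiB", bytearray, 64 * 1024 * 1024)
```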

Hi,

Since JetPack 5.0.1 is a Developer Preview (DP) release, would you mind upgrading to JetPack 5.1 first?
Thanks.

Hi, I have tried updating the JetPack version, but the problem still exists.
The versions are now:
Release: 5.10.104-tegra
CUDA: 11.4.315
TensorRT: 8.5.2.2

Hi,

Thanks for the testing.

Problem1

When you hit the OOM error, could you confirm it with the output of tegrastats?

$ sudo tegrastats
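On Jetson the CPU and GPU share the same physical memory, so the RAM field of the tegrastats output covers both. A small sketch of reading that field (the sample line below is illustrative only, not captured from the board; the exact tegrastats format varies between JetPack releases):

```python
import re

# Illustrative tegrastats-style line (format varies by JetPack release):
line = "RAM 4722/30536MB (lfb 6000x4MB) SWAP 0/15268MB GR3D_FREQ 99%"

m = re.search(r"RAM (\d+)/(\d+)MB", line)
used_mb, total_mb = map(int, m.groups())
print(f"RAM used: {used_mb} / {total_mb} MB ({100 * used_mb / total_mb:.0f}%)")
```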

Problem2

Could you try enabling maximum device performance to see if it helps?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks

Hi, I looked at tegrastats and the used memory is not over the limit. I also tried nvpmodel -m 0 and jetson_clocks, but it still doesn't work.

Hi,

Thanks for the testing.

We are checking this issue internally.
Will give you an update later.

Hi,

It looks like the library doesn't include the Orin GPU architecture (87).
Please change it to 87 and build it again.

	elif cuda_version < parse_version("11.8"):
		return 87
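For context, here is a self-contained sketch of the compute-capability selection that snippet belongs to. The other branch boundaries shown are assumptions for illustration, not quoted from the repository, and version parsing is done with plain tuples here so the example runs standalone (the real setup script uses packaging's parse_version):

```python
def parse_version(v):
    """Turn a version string like '11.4' into a tuple (11, 4) for comparison."""
    return tuple(int(part) for part in v.split("."))

def max_supported_compute_capability(cuda_version):
    """Sketch: pick the highest SM architecture to compile for.

    With the one-line change above, the CUDA 11.x branch returns 87,
    so the build also emits native code for Orin's SM 8.7 GPU.
    """
    if cuda_version < parse_version("11.0"):   # assumed boundary
        return 75
    elif cuda_version < parse_version("11.8"):
        return 87  # was 86 before the fix; Orin is SM 8.7
    else:
        return 90

print(max_supported_compute_capability(parse_version("11.4")))  # prints 87
```

This would also explain Problem 2: when a binary lacks native SASS for the running GPU, the CUDA driver JIT-compiles the embedded PTX on first use, which commonly makes the very first CUDA calls (such as the initial cudaMalloc) appear extremely slow.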

Thanks.

Thanks for your reply. I updated CUDA to 11.8, changed some code in tinycudann, and now I can run the tinycudann demo successfully. But I find the C++ demo is much slower than the Python demo.

Hi,

Please share the GPU utilization ratio when running the demo script.

$ sudo tegrastats

Thanks.

These are the results when running the C++ demo:

Hi,

Based on the picture, the GPU utilization is already at 99%.
This indicates that the GPU is fully occupied.

I'm not sure why the C++ demo is slower than the Python sample.
Are they the same use case, or a different scenario?

Thanks.

Thanks for your help. Maybe I need to look into it in more detail.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.