Problems running tiny-cuda-nn on Jetson AGX Orin
Platform info:
Model: NVIDIA Orin Jetson-Small Developer Kit
CUDA Arch BIN: 8.7
System: Ubuntu 20.04 focal
Jetpack: 5.0.1 DP
cuDNN: 8.3.2.49
CUDA: release 11.4, V11.4.239
Problem 1
I successfully compiled tiny-cuda-nn (GitHub - NVlabs/tiny-cuda-nn: Lightning fast C++/CUDA neural network framework).
But when running the demo
[./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json],
it raises an out-of-memory error.
So I reduced the batch size in mlp_learning_an_image.cu
(tiny-cuda-nn/mlp_learning_an_image.cu at master · NVlabs/tiny-cuda-nn · GitHub),
but it still raises the same error. This is strange, since the Jetson AGX Orin has 32 GB of GPU memory.
Problem 2
Also, when running the same demo
[./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json],
I find it is extremely slow when loading the image (tiny-cuda-nn/mlp_learning_an_image.cu at master · NVlabs/tiny-cuda-nn · GitHub), and the slowest part is the cudaMalloc call (tiny-cuda-nn/gpu_memory.h at master · NVlabs/tiny-cuda-nn · GitHub).
However, when I call cudaMalloc in a fresh CUDA project, it works fine. Only in the tiny-cuda-nn project is cudaMalloc extremely slow.
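One thing I considered: the first cudaMalloc in a process also pays one-time costs that a steady-state call does not, namely CUDA context creation and, if the binary was built without native code for the running GPU (sm_87 on Orin), PTX JIT compilation by the driver. A generic timing sketch (pure Python; the slow-first-call function here is a hypothetical stand-in for the allocation, not tiny-cuda-nn code) shows how to separate the one-time cost from the steady-state cost:

```python
import time

def time_first_vs_rest(fn, n_calls=5):
    """Time the first call to fn separately from subsequent calls.
    One-time costs (e.g. CUDA context creation, or the driver
    JIT-compiling PTX when the binary lacks native sm_87 code)
    show up only in the first measurement."""
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start
    rest = []
    for _ in range(n_calls - 1):
        start = time.perf_counter()
        fn()
        rest.append(time.perf_counter() - start)
    return first, rest
```

If the first call dominates while later calls are fast, the bottleneck is initialization rather than allocation itself.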
Hi,
Since JetPack 5.0.1 is a DP (Developer Preview) release, would you mind upgrading to JetPack 5.1 first?
Thanks.
Hi, I have tried updating the JetPack version, but the problem still exists.
Now, the version is:
Release: 5.10.104-tegra
CUDA: 11.4.315
TensorRT: 8.5.2.2
Hi,
Thanks for the testing.
Problem1
When you meet the OOM error, could you confirm it with the output of tegrastats?
$ sudo tegrastats
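If it helps, the RAM field of each tegrastats line can be checked programmatically. A minimal sketch (the sample line is hypothetical and abbreviated; real tegrastats output contains more fields):

```python
import re

def ram_usage_mb(tegrastats_line):
    """Extract (used, total) RAM in MB from one tegrastats line,
    or None if the line has no RAM field."""
    m = re.search(r"RAM (\d+)/(\d+)MB", tegrastats_line)
    return tuple(map(int, m.groups())) if m else None

sample = "RAM 4096/30536MB (lfb 6000x4MB) SWAP 0/15268MB"
print(ram_usage_mb(sample))  # (4096, 30536)
```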
Problem2
Could you try enabling maximum device performance to see if it helps?
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
Thanks
Hi, I looked at the tegrastats output and the used memory is not over the limit. I also tried nvpmodel -m 0 and jetson_clocks, but it still does not work.
Hi,
Thanks for the testing.
We are checking this issue internally.
Will give you an update later.
Hi,
It looks like the library doesn't target the Orin GPU architecture (sm_87).
Please change the compute capability to 87 and build it again.
def min_supported_compute_capability(cuda_version):
    if cuda_version >= parse_version("12.0"):
        return 50
    else:
        return 20

def max_supported_compute_capability(cuda_version):
    if cuda_version < parse_version("11.0"):
        return 75
    elif cuda_version < parse_version("11.1"):
        return 80
    elif cuda_version < parse_version("11.8"):
        return 87
    else:
        return 90
# Find version of tinycudann by scraping CMakeLists.txt
with open(os.path.join(ROOT_DIR, "CMakeLists.txt"), "r") as cmakelists:
    for line in cmakelists.readlines():
        if line.strip().startswith("VERSION"):
            VERSION = line.split("VERSION")[-1].strip()
            break
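The effect of those version gates can be sketched with plain (major, minor) tuples as a simplified stand-in for parse_version: with the unmodified gates, a CUDA 11.4 toolkit (as shipped with JetPack 5.x) clamps the supported compute capability to 86, which excludes Orin's sm_87, so either patching the gate to 87 or upgrading to CUDA 11.8 is needed.

```python
# Simplified mirror of the unmodified setup.py gates, using
# (major, minor) tuples in place of parse_version objects.
def max_supported_compute_capability(cuda_version):
    if cuda_version < (11, 0):
        return 75
    elif cuda_version < (11, 1):
        return 80
    elif cuda_version < (11, 8):
        return 86
    else:
        return 90

ORIN_CC = 87  # Jetson AGX Orin compute capability
print(max_supported_compute_capability((11, 4)))  # 86 -> sm_87 is clamped out
print(max_supported_compute_capability((11, 8)))  # 90 -> sm_87 is covered
```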
Thanks.
Thanks for your reply. I updated CUDA to 11.8 and changed some code in tiny-cuda-nn; now I can run the demo successfully. But I find the C++ demo is much slower than the Python demo.
Hi,
Please share the GPU utilization ratio when running the demo script.
$ sudo tegrastats
Thanks.
These are the results when running the C++ demo:
Hi,
Based on the picture, the GPU utilization is already 99%.
This indicates that the GPU is fully occupied.
We are not sure why the C++ demo is slower than the Python sample.
Are they the same use case, or a different scenario?
Thanks.
Thanks for your help. Maybe I need to check it in detail.
system closed this topic on April 25, 2023, 7:21 AM.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.