ARM64 does not support NUMA - returning NUMA node zero

I realize there are similar posts, but none seem to quite fit. The most similar post:

indicates a “known issue” and suggests reverting from TensorFlow v2.6.0+nv21.9 to v2.5.0+nv21.8. That post is from October 2021, so it is hard to believe the issue was not fixed before our release.
TensorFlow Version: 2.7.0
NVIDIA TensorFlow Container: 22.01
JetPack Version: 4.6.1
Please advise…

Hi,

This should be a harmless warning only.
Do you get any errors or incorrect results from this?

Thanks.

From what I gathered online, this message means it is not using the GPU? The code runs and completes, and the results are correct, but it is quite a bit slower than expected. Running CPU-only on my desktop is 25% faster. Perhaps there is some other configuration issue, if this message does not mean the GPU is being ignored?
So I missed this output later on… it seems to indicate the GPU may be in use, but surprisingly slow:
2022-07-06 10:33:32.139760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 23535 MB memory: -> device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2
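A quick way to confirm whether TensorFlow is actually placing work on the GPU, independent of the NUMA warning, is to list the visible devices and turn on device-placement logging. This is only a minimal sketch using the standard tf.config API; the exact output will vary by setup:

import tensorflow as tf

# An empty list here means TensorFlow will silently fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))

# Log where each op runs so GPU placement can be confirmed in the console output.
tf.debugging.set_log_device_placement(True)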

Hi,

No. The NUMA warning is unrelated to GPU usage; it only means the platform does not report NUMA nodes, so node zero is assumed.

For performance, could you first try maximizing the device clocks?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.

Yes, I did both of those: the first after install, and the second yesterday after my post. Although it improved the performance, it is still pretty “bad” comparatively. I can run a CPU-intensive, ordinary C++ app on both machines, and in that case I get better results on the Xavier (CPU only). But running model predictions with TF/Keras in Python on the Xavier using the GPU, the results are disappointing.

The prediction inputs are fairly large; there is quite a bit of data and many models in a single call, so perhaps GPU memory is constraining performance? I ran trials reducing both, and the performance gets much better, but in no case did it reach the performance of my desktop (Windows, no less) without a GPU. When I reduce the load, it is a bit more than a factor of 2 slower. For larger loads the two systems even out; perhaps both are reaching their limits?

My GPU experience is limited, so I am looking into what else I might do to improve the performance. Are there any other parameters I might tweak related to the GPU? I am also considering how I might break up the data sets and merge the results… but that is work :)
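One detail worth noting for the memory question: on Jetson the CPU and GPU share the same physical memory, and by default TensorFlow reserves most of it up front, which can leave little headroom for the rest of the pipeline. A minimal sketch of restricting that allocation, assuming TF 2.x and a single GPU (the 4096 MB cap is only an illustrative value):

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Grow the allocation on demand instead of claiming nearly all memory at startup.
    tf.config.experimental.set_memory_growth(gpus[0], True)
    # Alternatively, cap it at a fixed size (cannot be combined with memory growth):
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])

Either call has to run before the first model is built; otherwise TensorFlow raises an error.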

Hi,

We recommend users try our TensorRT inference engine for performance.

We are not sure what kind of model you want to use.
Below are some TensorRT benchmark results for your reference:

Thanks.
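For a model that is already a TensorFlow SavedModel, TF-TRT lets you try TensorRT without leaving Python. A rough sketch, assuming a TF 2.x build with TensorRT support (as in the NVIDIA containers); the directory names are placeholders and the exact converter arguments vary slightly between TF versions:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Rewrite supported subgraphs of the SavedModel to run through TensorRT.
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='saved_model_dir',    # placeholder path
    precision_mode=trt.TrtPrecisionMode.FP16)   # FP16 generally suits Xavier well
converter.convert()
converter.save('trt_model_dir')                 # placeholder path

The saved result loads back with tf.saved_model.load() like any other SavedModel.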
