"Couldn't invoke /usr/local/cuda/bin/ptxas --version" on Jetson Xavier NX

Hi, two errors occurred when I used TensorFlow v1.15.5 on a Jetson Xavier NX (CUDA 10.2, cuDNN 8.0, JetPack 4.5): (1) "[local time…]: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory" and (2) "Couldn't invoke /usr/local/cuda/bin/ptxas --version".

Any help regarding these errors or warnings would be appreciated.


Hi @LiX
Based on your question, it looks like you might have better luck in a different forum branch (the question was originally posted in the CUDA-GDB forum branch, which is dedicated to CUDA-GDB tool support).

I have moved your topic to Jetson Xavier NX - NVIDIA Developer Forums

Hi @AKravets
Thanks for your reply!


Which TensorFlow package did you install?
For JetPack 4.5, please install it with the following command:

$ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v45 'tensorflow<2'


Hi, @AastaLLL

Thanks for your comment. TensorFlow was installed successfully on my Xavier NX: running "import tensorflow as tf" followed by "tf.test.is_gpu_available()" in Python returns True. After I created an additional 8 GB swapfile, the errors mentioned above went away, but another error occurred: "Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.33GiB (rounded to xx)". The console also printed "If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation". So I want to know how to set the environment variable TF_GPU_ALLOCATOR=cuda_malloc_async. Should I run $ sudo gedit ~/.bashrc and add "TF_GPU_ALLOCATOR=cuda_malloc_async" at the end of the file?

Any idea how I can solve this issue? Thanks.
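
On the environment-variable question, one common approach is to set the variable from Python before TensorFlow is imported, so that it is already in place when TensorFlow initializes. The sketch below does that. The shell equivalent would be adding the line export TF_GPU_ALLOCATOR=cuda_malloc_async to ~/.bashrc (sudo should not be needed for a file in your own home directory). Whether this TensorFlow 1.15.5 build honors the variable is an assumption. As far as I know, the cuda_malloc_async allocator relies on CUDA 11.2 or later, so it may have no effect on CUDA 10.2. The helper name set_gpu_allocator is hypothetical.

```python
import os

def set_gpu_allocator(name="cuda_malloc_async"):
    """Set TF_GPU_ALLOCATOR for this process (hypothetical helper)."""
    os.environ["TF_GPU_ALLOCATOR"] = name
    return os.environ["TF_GPU_ALLOCATOR"]

# Call this before the first "import tensorflow".
set_gpu_allocator()

# import tensorflow as tf  # import only after the variable has been set
print(os.environ["TF_GPU_ALLOCATOR"])
```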


It’s essential to check that the installed package was built for the same JetPack version.
The GPU driver changes between JetPack versions, so a mismatched package may cause unexpected issues.

Also, please note that swap memory cannot be accessed by the GPU.
Since Jetson’s CPU and GPU share the same physical memory, errors may occur if the implementation doesn’t handle the integrated-memory case.
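
To illustrate the point about swap, here is a minimal sketch, assuming the usual Linux /proc/meminfo layout. It separates available physical RAM (shared by CPU and GPU on Jetson) from free swap (usable only on the CPU side). The function names and the sample values are made up for illustration.

```python
def parse_meminfo(text):
    """Return the numeric /proc/meminfo fields as a dict (values in kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields

def physical_and_swap_gib(text):
    """Return (available RAM, free swap) in GiB."""
    info = parse_meminfo(text)
    kb_per_gib = 1024 ** 2
    ram = info.get("MemAvailable", 0) / kb_per_gib
    swap = info.get("SwapFree", 0) / kb_per_gib
    return ram, swap

# Made-up sample; on the device, read the real file instead.
SAMPLE = """MemTotal:        7772196 kB
MemAvailable:    2097152 kB
SwapFree:        8388608 kB
"""

ram_gib, swap_gib = physical_and_swap_gib(SAMPLE)
print(f"RAM shared by CPU and GPU: {ram_gib:.2f} GiB; swap (CPU only): {swap_gib:.2f} GiB")
```

On the Jetson itself, the text could come from open("/proc/meminfo").read(). A GPU allocation larger than the available RAM figure would not be helped by the swap figure.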



I’m very happy to get your reply and have benefited a lot from your comments!

I used PyTorch and TensorFlow to implement deep learning algorithms for classification and semantic segmentation on the Xavier NX a few weeks ago. Despite the above errors, the code still runs normally, but classifying and segmenting the first image takes a very long time. During segmentation, the code seems to keep trying to obtain memory from the Xavier until it has enough. From the second image onward, however, the processing time is normal for both classification and segmentation. I would really like to know the principle behind this.
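
A slow first inference followed by normal timings is commonly attributed to one-time work on the first call, such as CUDA context creation, kernel loading or compilation, and initial memory allocation. That explanation is an assumption here, not something confirmed in this thread. A minimal sketch for measuring the effect, where run_inference is a hypothetical stand-in for the real model call:

```python
import time

def time_calls(fn, inputs):
    """Return the wall-clock duration of fn(item) for each item."""
    timings = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        timings.append(time.perf_counter() - start)
    return timings

class FakeModel:
    """Stand-in that simulates a one-time setup cost on the first call."""
    def __init__(self):
        self.ready = False

    def __call__(self, image):
        if not self.ready:
            time.sleep(0.05)  # pretend initialization happens here
            self.ready = True
        return image

run_inference = FakeModel()  # replace with the real model call
timings = time_calls(run_inference, ["img0", "img1", "img2"])
print(["%.4f s" % t for t in timings])
```

If the real model shows the same pattern, running one dummy input through it before the actual workload (a warm-up) moves that cost out of the measured path.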



Is PyTorch or TensorFlow essential for you?

For Jetson, we recommend deploying the model with TensorRT.
Since it is our own library, we have optimized its memory usage and performance for the Jetson platform.

You can find a tutorial below:
(You are expected to convert the PyTorch or TensorFlow model into the ONNX format first)
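
As a rough sketch of the ONNX route mentioned above (one commonly used route, not necessarily the one the tutorial describes): export the model with the tf2onnx package, then build an engine with trtexec, which ships with TensorRT. The file names, the opset value, and the trtexec location are assumptions to adapt. The code only assembles the command lines and does not run them.

```python
import shlex

def build_commands(saved_model_dir, onnx_path, engine_path, opset=11,
                   trtexec="/usr/src/tensorrt/bin/trtexec"):
    """Return (tf2onnx command, trtexec command) as argument lists."""
    # Step 1: TensorFlow SavedModel -> ONNX (requires the tf2onnx package).
    to_onnx = ["python3", "-m", "tf2onnx.convert",
               "--saved-model", saved_model_dir,
               "--opset", str(opset),
               "--output", onnx_path]
    # Step 2: ONNX -> serialized TensorRT engine.
    to_engine = [trtexec, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    return to_onnx, to_engine

# Hypothetical paths, for illustration only.
to_onnx, to_engine = build_commands("my_saved_model", "model.onnx", "model.engine")
for cmd in (to_onnx, to_engine):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) once the tools are installed. TensorRT engines are generally tied to the GPU and TensorRT version they were built with, so the second command is typically run on the Jetson itself.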



Must a TensorFlow model be converted to a TensorRT one on the same platform on which the model was trained?

