TensorFlow doesn't use GPU on Jetson Xavier AGX

I have nearly lost hope at this point.
I'm trying to run a Python API on a Jetson Xavier AGX, but TensorFlow runs only on the CPU… I have tried probably every solution from similar topics, yet I cannot install any TensorFlow version that uses the GPU…

I am using Python 3.7 and CUDA 11.8. Every attempt to install TensorFlow ends with tensorflow-cpu-aws being installed instead, and TensorFlow does not detect any GPU in the system.

No version of tensorflow-gpu is available when I try to install it with pip.
I tried to install TensorFlow from https://developer.nvidia.com/embedded/downloads, but 'sess.list_devices()' still prints:

[_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, -7305406428235014482)]
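
For reference, a minimal sketch of the kind of check that produces output like the above, assuming TensorFlow 2.x and the compat.v1 Session API (the post does not show how sess was created):

import tensorflow as tf

# List the devices TensorFlow can see through the old Session API.
sess = tf.compat.v1.Session()
print(sess.list_devices())                     # here: only the CPU device shown above

# TF 2.x equivalent; an empty list means no GPU is visible to TensorFlow.
print(tf.config.list_physical_devices('GPU'))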

My Jetson info:

NVIDIA Jetson-AGX
L4T 32.6.1 [ JetPack 4.6 ]
Ubuntu 18.04.6 LTS
Kernel Version: 4.9.253-tegra
CUDA 11.8.89
CUDA Architecture: NONE
OpenCV version: 4.1.1
OpenCV Cuda: NO
CUDNN: 8.2.1.32
TensorRT: NOT_INSTALLED
Vision Works: NOT_INSTALLED
VPI: NOT_INSTALLED
Vulkan: 1.2.70

Can anyone help me with this issue?

I have actually flashed the Jetson twice…
I installed the newest JetPack 5.0.2 with CUDA 11.8.
Nothing changed: TensorFlow does not see the GPU, and the driver is not visible:

cat: /proc/driver/nvidia/version: No such file or directory

I've exhausted probably every option. Is there anyone who can actually help me, or should I just throw the Xavier in the trash?

Hi,

I’m going to move this over to the Jetson category for better visibility.


I thought that with JetPack 5.0.2 the installed CUDA version should be 11.4, so I wonder how you installed 11.8 on the Jetson.

Did you follow this link?

Hi,

We don't have a prebuilt package for Python 3.7.
Is Python 3.6 (JetPack 4.6.2) or Python 3.8 (JetPack 5.0.2) an option for you?

For JetPack 5.0.2, the CUDA version is 11.4.
Please install TensorFlow by following the document below.

Please note that you will need to specify the package version to avoid the CPU package being downloaded.
For example:

$ sudo pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v502 tensorflow==2.10.0+nv22.11
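
After an install along these lines, a quick check can confirm whether the GPU-enabled build was actually picked up; a minimal sketch, assuming a TensorFlow 2.x wheel (the cuda_version key may simply be absent on a CPU-only build):

import tensorflow as tf

# The NVIDIA wheel reports the CUDA version it was built against;
# an empty GPU list despite a CUDA build points to a runtime problem.
print(tf.__version__)
print(tf.sysconfig.get_build_info().get('cuda_version'))
print(tf.config.list_physical_devices('GPU'))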

Thanks.

Indeed, I have no idea how I installed CUDA 11.8 on JetPack 4.6. However, I flashed the Jetson one more time. Now it is Python 3.8 and CUDA 11.4, and I tried to install TensorFlow from the source you provided, but with no result… It still detects only the CPU :( I also tried version 2.11, although according to the NVIDIA documentation 2.10 is the compatible one; that did not work for me either.

I have tried to install it dozens of times, with different versions, platforms, and sources, and nothing changed.
I also tried exactly the version you mentioned, but again with no result.
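
One hedged way to check which wheel actually ended up installed after such attempts (a sketch; only the package names already mentioned in this thread are queried):

from importlib import metadata

# Check whether the NVIDIA +nv build or the CPU-only fallback got installed.
for name in ('tensorflow', 'tensorflow-cpu-aws'):
    try:
        print(name, metadata.version(name))
    except metadata.PackageNotFoundError:
        print(name, 'not installed')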

Hi,

Thanks for testing this.

Would you mind verifying the GPU functionality and sharing the output log with us?
Here are the detailed steps:

$ cd /usr/local/cuda-11.4/samples/1_Utilities/deviceQuery
$ sudo make
$ ./deviceQuery 

Thanks

Here is the result:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Xavier"
  CUDA Driver Version / Runtime Version          11.4 / 11.4
  CUDA Capability Major/Minor version number:    7.2
  Total amount of global memory:                 14907 MBytes (15631331328 bytes)
  (008) Multiprocessors, (064) CUDA Cores/MP:    512 CUDA Cores
  GPU Max Clock rate:                            1377 MHz (1.38 GHz)
  Memory Clock rate:                             1377 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
