Building Tensorflow on AGX

pola5392 · February 17, 2021, 5:07pm

Software Version
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.4.0.7363
other

Host Machine Version
native Ubuntu 18.04
other

Hello,
First, I should mention that I am not sure whether this issue belongs to TensorFlow or DRIVE AGX, but I believe it is appropriate to seek a solution here. I have seen a few similar reports on various forums, but none of them solved my problem. I am also aware that TensorFlow isn’t officially supported on the AGX.

I am having a problem building TensorFlow from source on my AGX. I am attempting to build TensorFlow 2.4 and have Bazel 3.7.2 installed and working. The configuration script in the TensorFlow source tree completes properly without CUDA support; the problem only appears when I configure the build with CUDA support. This is my output:

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]: 10.2


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.6


Please specify the TensorRT version you want to use. [Leave empty to default to TensorRT 6]: 6.3


Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]: 


Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]: /usr/include/linux/


Traceback (most recent call last):
  File "third_party/gpus/find_cuda_config.py", line 653, in <module>
    main()
  File "third_party/gpus/find_cuda_config.py", line 645, in main
    for key, value in sorted(find_cuda_config().items()):
  File "third_party/gpus/find_cuda_config.py", line 583, in find_cuda_config
    result.update(_find_cuda_config(cuda_paths, cuda_version))
  File "third_party/gpus/find_cuda_config.py", line 257, in _find_cuda_config
    get_header_version)
  File "third_party/gpus/find_cuda_config.py", line 244, in _find_header
    required_version, get_version)
  File "third_party/gpus/find_cuda_config.py", line 233, in _find_versioned_file
    actual_version = get_version(file)
  File "third_party/gpus/find_cuda_config.py", line 250, in get_header_version
    version = int(_get_header_version(path, "CUDA_VERSION"))
ValueError: invalid literal for int() with base 10: ''
Asking for detailed CUDA configuration...

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]:

This might also be an issue with Python, I think. I tried hardcoding the configuration script to force it to accept the cuda.h file it found, but I don’t think that approach was very effective.

Has anyone encountered an issue like this before? Does anyone know how to fix this? Any advice would be greatly appreciated. Thank you,

Zeus

VickNV · February 17, 2021, 7:42pm

Hi @pola5392 ,

I haven’t tried to build tensorflow from source so I’m not familiar with this topic.
But does the cuda.h under the base path you specified (/usr/include/linux/) contain a line like “#define CUDA_VERSION 10020”? Does the '' in the error message mean an empty string?

pola5392 · February 17, 2021, 10:02pm

Hey Vick,

So, the path leads to a cuda.h file, and I believe the configuration script tries to read the version from that file:

version = int(_get_header_version(path, "CUDA_VERSION"))

I’m not sure what the '' in the error message means. It is possibly an empty string, meaning that the script wasn’t able to extract any version information from the cuda.h file. Here is more context:

def _find_cuda_config(base_paths, required_version):

  def get_header_version(path):
    version = int(_get_header_version(path, "CUDA_VERSION"))
    if not version:
      return None
    return "%d.%d" % (version // 1000, version % 1000 // 10)

  cuda_header_path, header_version = _find_header(base_paths, "cuda.h",
                                                  required_version,
                                                  get_header_version)
  cuda_version = header_version  # x.y, see above.

  cuda_library_path = _find_library(base_paths, "cudart", cuda_version)
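
The ValueError in the traceback can be reproduced in isolation. The sketch below is not the TensorFlow script itself: get_define is a hypothetical stand-in for _get_header_version, assumed to return an empty string when the header has no matching #define, and the two header strings are made-up samples. It also shows a guarded variant that returns None instead of raising.

```python
import re


def get_define(header_text, name):
    """Hypothetical stand-in for _get_header_version.

    Returns the digits of `#define <name> <digits>`, or "" if the
    define is absent (assumed behaviour, not copied from TensorFlow).
    """
    pattern = r"^\s*#\s*define\s+%s\s+(\d+)" % re.escape(name)
    match = re.search(pattern, header_text, re.MULTILINE)
    return match.group(1) if match else ""


def cuda_version_or_none(header_text):
    """Guarded variant: check for an empty result before calling int()."""
    raw = get_define(header_text, "CUDA_VERSION")
    if not raw:
        return None
    version = int(raw)
    # Same arithmetic as the excerpt above: 10020 -> "10.2".
    return "%d.%d" % (version // 1000, version % 1000 // 10)


# Made-up sample contents: a toolkit-style header and an unrelated header.
toolkit_header = "#define CUDA_VERSION 10020\n"
unrelated_header = "#ifndef _LINUX_CUDA_H\n#define _LINUX_CUDA_H\n#endif\n"

print(cuda_version_or_none(toolkit_header))    # 10.2
print(cuda_version_or_none(unrelated_header))  # None

try:
    # Calling int() on the empty result raises the error seen in the traceback.
    int(get_define(unrelated_header, "CUDA_VERSION"))
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: ''
```

If this reading is right, the '' in the message is simply the empty string produced when the header has no CUDA_VERSION define, which suggests the script picked up a cuda.h that does not belong to the CUDA toolkit rather than hitting a Python problem.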

Please let me know what you think. Do you think I might be pointing to the wrong file?
Zeus

VickNV · February 17, 2021, 10:26pm

Does cuda.h exist in your base paths, /usr/include/linux/?
Have you searched the internet to see whether other people have encountered and solved the same issue?
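
One way to answer the first question is to list every cuda.h under a few candidate directories and report which of them define CUDA_VERSION. On many Linux systems /usr/include/linux/cuda.h is a kernel header unrelated to the CUDA toolkit, so a match there would likely lack the define. The sketch below is only an illustration: find_cuda_headers is a hypothetical helper, and the directories scanned at the bottom are assumptions that should be adjusted to the actual install.

```python
import os
import re

# Matches a line such as "#define CUDA_VERSION 10020".
DEFINE_RE = re.compile(r"^\s*#\s*define\s+CUDA_VERSION\s+(\d+)", re.MULTILINE)


def find_cuda_headers(base_paths):
    """Return (path, raw_version_or_None) for every cuda.h under base_paths."""
    results = []
    for base in base_paths:
        # os.walk yields nothing for a directory that does not exist.
        for dirpath, _dirnames, filenames in os.walk(base):
            if "cuda.h" not in filenames:
                continue
            path = os.path.join(dirpath, "cuda.h")
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                match = DEFINE_RE.search(handle.read())
            results.append((path, match.group(1) if match else None))
    return results


if __name__ == "__main__":
    # Candidate locations are assumptions, not taken from the thread.
    for path, raw in find_cuda_headers(["/usr/include", "/usr/local"]):
        print(path, "->", raw or "no CUDA_VERSION define")
```

A header that reports a number such as 10020 is a plausible toolkit header, and the directory above its include folder would be the base path to give the configuration script. A header reported without the define is probably not the one the script needs.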