decode_rotate.cu missing during retinanet-examples installation

Hi guys,

I’m trying to install the [NVIDIA/retinanet-examples](https://github.com/NVIDIA/retinanet-examples) repo (fast and accurate object detection with end-to-end GPU optimization) on a Jetson TX2 running JetPack 4.3.

All the requirements installed without problems, but running “python setup.py install” fails with this error:

/usr/local/cuda-10.0/bin/nvcc -I/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/include -I/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/include/TH -I/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/home/nvidia/.virtualenvs/cv/include -I/usr/include/python3.6m -c csrc/cuda/decode_rotate.cu -o build/temp.linux-aarch64-3.6/csrc/cuda/decode_rotate.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -std=c++14 --expt-extended-lambda --use_fast_math -Xcompiler -Wall -gencode=arch=compute_62,code=sm_62 -gencode=arch=compute_62,code=compute_62 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1
gcc: error: csrc/cuda/decode_rotate.cu: No such file or directory
gcc: warning: ‘-x c++’ after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
error: command '/usr/local/cuda-10.0/bin/nvcc' failed with exit status 1

I can’t figure out why the “decode_rotate.cu” file is missing. As far as I can tell, CUDA 10.0 is correctly installed.

Any ideas are welcome; I’ve been struggling with this issue for a week.

Thanks

Hi,

decode_rotate.cu is included in retinanet-examples:
https://github.com/NVIDIA/retinanet-examples/blob/master/csrc/cuda/decode_rotate.cu

Could you first check that you cloned the repository completely?

$ git clone https://github.com/nvidia/retinanet-examples

Thanks.
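Before running “python setup.py install”, one quick way to confirm that a checkout is complete is to check that the source files the build compiles actually exist. The sketch below uses a hypothetical helper, missing_sources, which is not part of retinanet-examples; the list of required files is an assumption that you would adapt to the branch you cloned.

```python
from pathlib import Path
from typing import Iterable, List


def missing_sources(repo_root: str, required: Iterable[str]) -> List[str]:
    """Return the required paths (relative to repo_root) that are not files."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical list: adjust it to the sources your branch's setup.py compiles.
    required = ["setup.py", "csrc/cuda/decode_rotate.cu"]
    missing = missing_sources("retinanet-examples", required)
    if missing:
        print("Incomplete checkout, missing:", ", ".join(missing))
    else:
        print("All listed sources are present.")
```

Run it from the directory that contains the clone. An empty result only means the listed files exist, not that the branch is the right one for your JetPack version.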

Hi,
Actually, since I’m using JetPack 4.3, the recommended branch to clone is 19.10.

The files are indeed missing from the 19.10 branch. Should I use master instead, or is it completely incompatible with JetPack 4.3?

Thanks

Hey,
I finally managed to install the 20.03 branch, but running the “retinanet” command produces this error:

 File "/home/nvidia/.virtualenvs/cv/bin/retinanet", line 33, in <module>
    sys.exit(load_entry_point('retinanet==0.2.3', 'console_scripts', 'retinanet')())
  File "/home/nvidia/.virtualenvs/cv/bin/retinanet", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/importlib_metadata/__init__.py", line 105, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/retinanet-0.2.3-py3.6-linux-aarch64.egg/retinanet/main.py", line 10, in <module>
    from retinanet import infer, train, utils
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/retinanet-0.2.3-py3.6-linux-aarch64.egg/retinanet/infer.py", line 12, in <module>
    from .dali import DaliDataIterator
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/retinanet-0.2.3-py3.6-linux-aarch64.egg/retinanet/dali.py", line 5, in <module>
    from nvidia.dali import pipeline, ops, types
ModuleNotFoundError: No module named 'nvidia'

I found a package, “nvidia-dali-cuda100”, that should provide it, but I can’t install it with pip.
Here is the output of “pip install nvidia-dali-cuda100”:

ERROR: Command errored out with exit status 1:
     command: /home/nvidia/.virtualenvs/cv/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-lit2rmf_/nvidia-dali-cuda100/setup.py'"'"'; __file__='"'"'/tmp/pip-install-lit2rmf_/nvidia-dali-cuda100/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-cbuvqo74
         cwd: /tmp/pip-install-lit2rmf_/nvidia-dali-cuda100/
    Complete output (18 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-lit2rmf_/nvidia-dali-cuda100/setup.py", line 150, in <module>
        raise RuntimeError(open("ERROR.txt", "r").read())
    RuntimeError:
    ###########################################################################################
    The package you are trying to install is only a placeholder project on PyPI.org repository.
    This package is hosted on NVIDIA Python Package Index.

    This package can be installed as:
    ```
    $ pip install nvidia-pyindex
    $ pip install nvidia-dali-cuda100
    ```

    Please refer to NVIDIA DALI installation guide for instructions:
    https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html
    ###########################################################################################
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I have installed nvidia-pyindex, but I don’t understand how to get DALI from it…

Hi,

Could you check whether this comment helps?
https://github.com/NVIDIA/retinanet-examples/issues/37#issuecomment-501833186

Thanks.

Hey,
I have already tried that procedure. According to the author of retinanet-examples, there is no DALI package for JetPack…

So now I’m looking for a way to remove the DALI dependency from the retinanet command… I’m not sure I can make it work.
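A generic way to make a hard dependency optional is to check whether it can be imported and fall back to another code path when it cannot. The traceback above shows that retinanet/infer.py imports DaliDataIterator at module level, so any guard has to keep that import from running when DALI is missing. The sketch below shows only the pattern: module_available and pick_data_iterator are hypothetical helpers, and the two factories stand in for a DALI-based loader and whatever non-DALI loader you provide.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package (e.g. 'nvidia') is missing.
        return False


def pick_data_iterator(dali_factory, fallback_factory, use_dali=None):
    """Build the DALI iterator if DALI is importable, otherwise the fallback.

    Both factories are hypothetical zero-argument callables; DALI should only
    be imported inside dali_factory so the module loads without it.
    """
    if use_dali is None:
        use_dali = module_available("nvidia.dali")
    return dali_factory() if use_dali else fallback_factory()
```

Keeping the DALI import inside the factory is what lets the rest of the package import cleanly on a device where nvidia.dali is not installed.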

Hi,

Based on the comment below, you can try building DALI from source to install it on Jetson:
https://github.com/NVIDIA/DALI/issues/1864

The building steps can be found here:
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/compilation.html

Thanks.

Hey,
I found a way to remove the DALI dependency, but now I’m facing an issue with PyTorch.

I installed PyTorch 1.4 using the prebuilt wheel for JetPack 4.3, but when I run the retinanet command I get this:

 File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 480, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in

Do you have any idea how to solve this?

Thanks

Hi,

NCCL only supports desktop GPUs.
It cannot be used on an integrated GPU such as Jetson’s.
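If the code you are running lets you choose the process-group backend, one option is to request NCCL only when the PyTorch build reports it and otherwise fall back to gloo, PyTorch’s CPU-based backend. This is a minimal sketch: process_group_kwargs is a hypothetical helper, and it assumes the Jetson PyTorch wheel was built with gloo support, which you should verify on the device.

```python
def process_group_kwargs(nccl_available: bool, init_method: str = "env://") -> dict:
    """Keyword arguments for torch.distributed.init_process_group.

    Assumption: gloo is built into the Jetson wheel while NCCL is not,
    so NCCL is requested only when the build reports it.
    """
    backend = "nccl" if nccl_available else "gloo"
    return {"backend": backend, "init_method": init_method}
```

On the device the flag would come from torch.distributed.is_nccl_available(), for example torch.distributed.init_process_group(**process_group_kwargs(torch.distributed.is_nccl_available())). The env:// init method still expects the usual rendezvous environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) to be set.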

It seems you will need to use the 19.10 branch in a Jetson environment.
Could you give it a try?

Thanks.

Hey,

I can give this version another try, but it will take me back to the very beginning of this topic…

I will be missing several files (decode_rotate.cu and others), and I’m not sure that simply copying them into the right directory before compiling will do the job…

What do you think?

Hey,

I compiled the 19.10 branch after modifying the “csrc/” directory (adding the missing CUDA files). The build works, but when I run “retinanet infer” I still get the NCCL error.

I found that it could be related to PyTorch, and this link proposes a modified distributed_c10d.py, which I copied into my torch installation.

But when I retry the infer command, I get this error:

File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/retinanet-0.2.3-py3.6-linux-aarch64.egg/retinanet/main.py", line 6, in <module>
    import torch.cuda
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/__init__.py", line 280, in <module>
    from .functional import *
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/functional.py", line 2, in <module>
    import torch.nn.functional as F
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/nn/__init__.py", line 3, in <module>
    from .parallel import DataParallel
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/nn/parallel/__init__.py", line 5, in <module>
    from .distributed import DistributedDataParallel
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 8, in <module>
    import torch.distributed as dist
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/distributed/__init__.py", line 15, in <module>
    from .distributed_c10d import *
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 11, in <module>
    from . import (
ImportError: cannot import name 'AllToAllOptions'

So I commented out the “AllToAllOptions” line in distributed_c10d.py, but retrying gave me another error:

Traceback (most recent call last):
  File "/home/nvidia/.virtualenvs/cv/bin/retinanet", line 33, in <module>
    sys.exit(load_entry_point('retinanet==0.2.3', 'console_scripts', 'retinanet')())
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/retinanet-0.2.3-py3.6-linux-aarch64.egg/retinanet/main.py", line 181, in main
    worker(0, args, 1, model, state)
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/retinanet-0.2.3-py3.6-linux-aarch64.egg/retinanet/main.py", line 116, in worker
    torch.distributed.init_process_group(backend='nccl', init_method='env://')
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 420, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "/home/nvidia/.virtualenvs/cv/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 74, in rendezvous
    return _rendezvous_handlers[result.scheme](url, **kwargs)
TypeError: _env_rendezvous_handler() got an unexpected keyword argument 'timeout'
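Both errors suggest that the copied distributed_c10d.py comes from a different PyTorch version than the rest of the installation: it imports a name (AllToAllOptions) the installed build does not provide, and it passes a timeout keyword that the installed rendezvous handler does not accept. A small diagnostic sketch for the second point, using only the standard inspect module (accepts_keyword is a hypothetical helper), checks whether a callable accepts a given keyword argument:

```python
import inspect
from typing import Callable


def accepts_keyword(func: Callable, name: str) -> bool:
    """Return True if `func` can be called with `name=...` as a keyword argument."""
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False
```

On the Jetson, passing the handler registered for the env scheme (torch.distributed.rendezvous._rendezvous_handlers["env"], a private dictionary that appears in the traceback and may change between releases) with "timeout" would return False if the installed handler predates the patched file. Mixing single files across PyTorch versions tends to keep producing mismatches like this.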

It seems really complicated to compile this repo on a Jetson…

Thanks for your help

Hi,
DALI doesn’t provide a prebuilt Python wheel for Jetson, but you can build it yourself. Please check this GitHub thread: Not able to install DALI on Jetson NX Cuda 10.2 (Issue #2507, NVIDIA/DALI on GitHub).