Jetson AI Lab - ML DevOps, Containers, Core Inferencing

4/2/24 - Pip wheel server cache

  • Following on from the points above, the number of container combinations increases exponentially, which strains the build system's ability to automatically redistribute binaries. The goal is that not every user spends hours or days recompiling these packages, which are often tricky to build correctly.

  • jetson-containers now caches the pip wheels that it builds on a custom pip server. That server is used not only to install these packages into the deployment containers, but can also be used by any Jetson user to install them natively, even outside of a container.

  • A prototype version of this pip server is running at http://jetson.webredirect.org/ , with wheels available for multiple CUDA versions dating back to JetPack 4.6 and CUDA 10.2. This index is automatically populated by the build farm of Jetsons that I run locally.

  • You can have pip install these CUDA-enabled packages by setting --index-url or $PIP_INDEX_URL to the index for your desired CUDA version (or by setting it persistently in your user's pip.conf file, as sketched after the example below). For now, --trusted-host or $PIP_TRUSTED_HOST also needs to be set, since pip requires it for plain-HTTP indexes:

    export PIP_INDEX_URL=http://jetson.webredirect.org/jp6/cu122
    export PIP_TRUSTED_HOST=jetson.webredirect.org
    
    pip3 install torch torchvision torchaudio  # no more compiling of torchvision/torchaudio needed :)
    pip3 install transformers   # the correct PyTorch (with CUDA) will automatically be installed
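
    To make these settings persistent, you can put them in your user's pip.conf instead. A minimal sketch (the path below is the standard per-user pip config location):

    # ~/.config/pip/pip.conf
    [global]
    index-url = http://jetson.webredirect.org/jp6/cu122
    trusted-host = jetson.webredirect.org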
    
  • This custom pip server mirrors upstream PyPI, so packages that aren't in it are automatically pulled from PyPI. However, it shadows the packages that jetson-containers builds with CUDA, so that when you install a package that depends on one of them (like Transformers, which depends on PyTorch but doesn't itself require CUDA compilation), the correct CUDA-enabled version of that dependency is installed from our Jetson-specific index. A quick way to verify this is shown below.
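
    As a sanity check that the CUDA-enabled wheel is the one that got installed (an illustrative one-liner, not part of the server itself), you can confirm that PyTorch reports CUDA support:

    python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"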

  • Before using anything that depends on PyTorch, run sudo apt-get install libopenblas-dev libopenmpi-dev, because these PyTorch wheels are built with USE_DISTRIBUTED=on (so that Jetson can run upstream PyTorch code that references the torch.distributed module, a common occurrence in open-source AI/ML projects even when running only one Jetson 'node'). A quick sanity check is sketched below.
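
    For example, a minimal check (illustrative, assuming the wheel install above succeeded) that the shared libraries are found and the distributed module loads:

    sudo apt-get install libopenblas-dev libopenmpi-dev
    python3 -c "import torch.distributed; print(torch.distributed.is_available())"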
