Please provide the following information when requesting support.
• Hardware (T4 on AWS Sagemaker Studio)
• Network Type (LPR/LPDNet - )
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I want to trial the pre-trained LPD/LPR models as a first step in trialling NVIDIA software for our client's problems, but I have run into some issues. I am trying to set up TAO Toolkit with Python wheels in an AWS SageMaker Studio environment, and the setup fails specifically when installing nvidia-tao-deploy==4.0.0.1. I am trying AWS because, per other forum posts, there is still no fix for the Google Colab issue regarding the Ubuntu version.
To Reproduce:
Resources I used:
- TAO Toolkit Starter Guide, under the "Python Wheels" option: TAO Toolkit Quick Start Guide - NVIDIA Docs
  - The "Python Wheels" option leads to the documentation for running TAO Toolkit on Google Colab: Running TAO Toolkit on Google Colab - NVIDIA Docs
  - My assumption is that if it works on Colab, it should also work on AWS SageMaker Studio.
- Pre-trained model for inference only: Running TAO Toolkit on Google Colab - NVIDIA Docs
  - Of all the notebooks listed on that page, we focus on this one because it pertains to inference only, with no training.
- Colab notebook for inference only: Google Colab
  - This is the notebook NVIDIA provides to set up TAO Toolkit with Python wheels on Google Colab.
  - This is the one I copied into the AWS SageMaker Studio environment.
- TAO dependencies: Release Notes - NVIDIA Docs
  - My understanding is that we need TAO 4.0.1. Why?
    - In the setup_env_colab.sh file, all the pip install commands are prefixed with "python3.8" (rough sketch below).
    - Also, the tao-deploy version pinned there is 4…
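For illustration only, the pinned install lines I mean are roughly of this form (paraphrased from memory, not a verbatim copy of setup_env_colab.sh), together with a quick check of whether the interpreter they expect even exists on a given image:
# Rough form of the installs in setup_env_colab.sh (assumption, not the exact contents):
python3.8 -m pip install <tao-dependency>==<pinned-version>
# Quick sanity check on a fresh image: is python3.8 present at all?
python3.8 --version || echo "python3.8 not found on this image"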
1. AWS Environment
- In AWS SageMaker Studio, there is a limited selection of images available.
Option 1: ml.g4dn.xlarge instance with the "TensorFlow 2.12.0 Python 3.10 GPU Optimized" image (AWS Deep Learning Container for TensorFlow 2.12.0 with CUDA 11.8, Ubuntu 20.04).
Notes:
- The image has Python 3.10, but the required Python version is 3.8. To cater for this, I created a conda environment with Python 3.8 (step 2 below).
- I ran through the TAO Deploy notebook (Google Colab), copying exactly what is required, including manually downloading and untarring TensorRT.
- However, when it came to installing the dependencies I ran into issues, probably because the image is Python 3.10 while the dependencies and TAO Toolkit require 3.8; the Python 3.8 conda environment was my attempt to remedy this. A few quick checks on the base image are sketched below.
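For reference, a few quick checks that confirm what the base image provides (expected values taken from the image description above; this assumes nvcc is on the image's PATH):
python3 --version                 # Python 3.10 on this image
nvcc --version | grep release     # CUDA 11.8 per the image description
grep VERSION_ID /etc/os-release   # Ubuntu 20.04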
2. Create Conda Environment
Launched a terminal in the current SageMaker image via the launcher button in the notebook interface.
Install miniconda:
https://docs.conda.io/projects/miniconda/en/latest/index.html#quick-command-line-install
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
Initialise newly-installed Miniconda:
- ~/miniconda3/bin/conda init bash
Switch to bash:
- bash
Create and activate conda env:
- conda create -n py38_ry_test python=3.8
- conda activate py38_ry_test
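A quick check, before installing anything, that the activated environment really resolves to Python 3.8 rather than the image's 3.10:
python --version   # expect Python 3.8.x inside the activated env
which python       # should point into ~/miniconda3/envs/py38_ry_test
which pip          # should be the env's pip, not the base image's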
Install dependencies:
- pip install --upgrade pip
- pip install cython
- pip install nvidia-ml-py
- pip install nvidia-pyindex
- pip install --upgrade setuptools
Deactivate conda and install pycuda (this worked one time last week, pip stated "requirement already satisfied", but not this time):
- conda deactivate
- pip install pycuda==2020.1   ← FAIL
Reactivate the environment and try again:
- conda activate py38_ry_test
- pip install pycuda==2020.1   ← FAIL
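One thing I have seen suggested for pycuda source-build failures (my assumption, not something the TAO or notebook docs call out) is making sure nvcc and the CUDA libraries are visible inside the conda environment before running pip; I have not confirmed this is the actual cause on the TensorFlow image, since switching images (step 3) sidestepped it:
# Assumption: pycuda fails to build because nvcc is not visible inside the conda env.
export CUDA_ROOT=/usr/local/cuda-11.8            # guess; adjust to wherever CUDA lives on this image
export PATH=$CUDA_ROOT/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_ROOT/lib64:$LD_LIBRARY_PATH
pip install pycuda==2020.1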
3. Trying another image with CUDA 11.8
I then tried another image, this one with PyTorch 2.0.0 and CUDA 11.8:
Option 2: "PyTorch 2.0.0 Python 3.10 GPU Optimized" image (pytorch-2.0.0-gpu-py310, AWS Deep Learning Container for PyTorch 2.0.0 with CUDA 11.8, Python 3.10).
This fixed it: I could install pycuda==2020.1.
I then installed TensorRT like so:
pip install tensorrt==8.5.1.7
This worked.
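As a quick smoke test (nothing TAO-specific, just confirming both packages import and can see the GPU):
python -c "import tensorrt as trt; print(trt.__version__)"                            # expect 8.5.1.7
python -c "import pycuda.driver as cuda; cuda.init(); print(cuda.Device(0).name())"   # expect Tesla T4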
4. nvidia-tao-deploy==4.0.0.1 Issue
However, I am now met with another issue, the one I am currently stuck on:
pip install nvidia-tao-deploy==4.0.0.1   ← FAIL
I will spare you the details of my debugging attempts… I have tried everything I could find online relating to cython, pycocotools, etc., but I just can't get this to work.
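I can re-run the failing install and attach the full output; this is roughly what I would run to capture it (the log file name is just my own choice):
pip install nvidia-tao-deploy==4.0.0.1 --verbose 2>&1 | tee tao_deploy_install.log
pip --version && python --version   # environment details to include alongside the log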
Can someone please provide guidance on how to fix this?