Unable to load model in TAO Toolkit on GCP

Hello,
I am trying to run BPNet on one T4 GPU on GCP using the provided bpnet.ipynb notebook. I followed the instructions for running TAO on GCP and updated the ~/.tao_mounts.json file with the required paths. When I try to load the model for inference on the COCO validation dataset using the command:

# Single-scale inference
!tao bpnet evaluate --inference_spec $SPECS_DIR/infer_spec.yaml \
                    --dataset_spec $DATA_POSE_SPECS_DIR/coco_spec.json \
                    --results_dir $USER_EXPERIMENT_DIR/results/exp_m1_unpruned/eval_default \
                    -k $KEY

it throws the following error:

2022-04-06 16:23:43,947 [INFO] root: Registry: ['nvcr.io']
2022-04-06 16:23:44,007 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-04-06 16:23:45.192092: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2022-04-06 16:23:48,796 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2022-04-06 16:23:52,798 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

/usr/local/lib/python3.6/dist-packages/driveix/common/utilities/path_processing.py:74: YAMLLoadWarning: calling yaml.load() without Loader=… is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
/workspace/tao-experiments/bpnet/results/exp_m1_unpruned/eval_default
WARNING 2022-04-06 16:23:52,924 | main: No model provided! Using model_path from inference spec file.
INFO 2022-04-06 16:23:52,925 | main: Loading /workspace/tao-experiments/bpnet/models/model.etlt for evaluation
loading annotations into memory…
Done (t=9.93s)
creating index…
index created!
loading annotations into memory…
Done (t=0.25s)
creating index…
index created!
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/evaluate.py", line 153, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/evaluate.py", line 145, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataio/coco_dataset.py", line 456, in infer
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/inferencer/bpnet_inferencer.py", line 64, in __init__
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py", line 185, in model_io
AssertionError: Model not found at /workspace/tao-experiments/bpnet/models/model.etlt
Traceback (most recent call last):
  File "/usr/local/bin/bpnet", line 8, in <module>
    sys.exit(main())
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/entrypoint/bpnet.py", line 12, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.
2022-04-06 16:24:06,802 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

I started a tao-tf container to make sure the model path exists, but the error persists. How can I solve this error?
Thank you for your help!

Usually, this kind of error results from an incorrect path mapping between the host and the container.
Please double-check your ~/.tao_mounts.json.

Refer to TAO Toolkit Launcher — TAO Toolkit 3.22.05 documentation
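
One way to check the mapping from the host side is to translate the container path in the error message back to a host path and confirm the file really exists there. The sketch below is only an illustration and is not part of the TAO launcher: load_mounts and container_to_host are hypothetical helper names, and it assumes the file has a top-level "Mounts" list of "source"/"destination" pairs and that the most specific (longest) destination wins when mounts are nested.

```python
import json
from pathlib import PurePosixPath


def load_mounts(config_path):
    """Read the "Mounts" list from a launcher-style JSON config file."""
    with open(config_path) as f:
        return json.load(f).get("Mounts", [])


def container_to_host(container_path, mounts):
    """Map a path inside the container to the host path that backs it.

    Returns None when no mount destination contains the path.
    Assumption: with nested mounts, the longest matching destination applies.
    """
    target = PurePosixPath(container_path)
    best = None  # (destination, resolved host path)
    for mount in mounts:
        dest = PurePosixPath(mount["destination"])
        try:
            rel = target.relative_to(dest)
        except ValueError:
            continue  # this mount does not contain the path
        if best is None or len(dest.parts) > len(best[0].parts):
            best = (dest, PurePosixPath(mount["source"]) / rel)
    return None if best is None else str(best[1])
```

For example, passing the path from the "Model not found" message together with the list returned by load_mounts for ~/.tao_mounts.json gives the host location that should contain model.etlt; if that file is missing on the host, the container cannot see it either.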

Thank you for the pointer. Based on the link, I edited my ~/.tao_mounts.json file to add the path of the models downloaded for BPNet (deployable_v1.0.1), but I keep getting the same error. Can you point me to how I can edit the file correctly, please? I have attached the contents of the file below:

{
    "Mounts": [
        {
            "source": "/home/user/cv_samples_v1.2.0/",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/user/cv_samples_v1.2.0/bpnet/pretrained_model",
            "destination": "/workspace/tao-experiments/bpnet/models"
        },
        {
            "source": "/home/user/cv_samples_v1.2.0/bpnet/specs",
            "destination": "/workspace/examples/bpnet/specs"
        },
        {
            "source": "/home/user/cv_samples_v1.2.0/bpnet/data_pose_config",
            "destination": "/workspace/examples/bpnet/data_pose_config"
        },
        {
            "source": "/home/user/cv_samples_v1.2.0/bpnet/model_pose_config",
            "destination": "/workspace/examples/bpnet/model_pose_config"
        }
    ],
    "DockerOptions": {
        "user": "0:0"
    }
}

Thanks once again for your help!

Please check the "user": "0:0" parameter. Is it expected?

I added "user": "0:0" because the previous error log said to add "user": "UID:GID", using the values from the commands id -u and id -g. Should the model source and destination paths be included in the file? I have looked at several examples, but none of them add the model paths to the tao_mounts.json file.
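
For reference, the "UID:GID" value can be generated instead of typed by hand. This is a minimal sketch, assuming a Linux host and the "DockerOptions" / "user" layout shown in the file above; docker_user_option is a hypothetical helper name. Note that 0:0 corresponds to root, while id -u and id -g print the IDs of whichever user runs them.

```python
import json
import os


def docker_user_option():
    """Return a DockerOptions fragment built from the current user's IDs."""
    uid = os.getuid()  # same number that `id -u` prints
    gid = os.getgid()  # same number that `id -g` prints
    return {"DockerOptions": {"user": f"{uid}:{gid}"}}


# Print the fragment so it can be compared with ~/.tao_mounts.json.
print(json.dumps(docker_user_option(), indent=4))
```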

To narrow this down, could you log in to the container directly to debug?
You can run the following command in a terminal.

$ tao bpnet run /bin/bash

Then,
# ls /workspace/tao-experiments/

I was installing the DeepStream SDK following the link. While building deepstream-bodypose2d-app, I get the following error:
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiCopy_8u_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiConvert_32f8u_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiDivC_32f_C3IR_Ctx@libnppial.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiCopy_32f_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiCopy_32f_C1MR_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiAddC_32f_C1IR_Ctx@libnppial.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiDup_32f_C1C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSub_32f_C3IR_Ctx@libnppial.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSet_8u_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSet_16u_C1R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiDup_16u_C1C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiResize_32f_C1R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiResizeSqrPixel_16u_C1R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSwapChannels_8u_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiConvert_8u32f_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiCopy_32f_C1R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiColorToGray_8u_C3C1R_Ctx@libnppicc.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiMulC_32f_C1IR_Ctx@libnppial.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiWarpPerspective_8u_C3R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiResizeSqrPixel_32f_C1R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiConvert_16u32f_C1R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiFilterBoxBorder_16u_C1R_Ctx@libnppif.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiWarpPerspective_32f_C3R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiResizeSqrPixel_16u_C3R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiWarpPerspective_8u_C1R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSet_32f_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSet_8u_C1R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiWarpPerspective_16u_C1R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiConvert_32f8u_C1R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSet_16u_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiColorToGray_32f_C3C1R_Ctx@libnppicc.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiDup_8u_C1C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiConvert_16u32f_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiResizeSqrPixel_8u_C1R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiColorToGray_16u_C3C1R_Ctx@libnppicc.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiCopy_32f_C3MR_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSwapChannels_16u_C3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiFilterBoxBorder_16u_C3R_Ctx@libnppif.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiCopy_32f_C3P3R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiConvert_8u32f_C1R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSet_32f_C1R_Ctx@libnppidei.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSub_32f_C1IR_Ctx@libnppial.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiResize_32f_C3R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiFilterBoxBorder_8u_C1R_Ctx@libnppif.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiThreshold_LTValGTVal_32f_C3IR_Ctx@libnppitc.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiWarpPerspective_16u_C3R_Ctx@libnppig.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiFilterBoxBorder_32f_C1R_Ctx@libnppif.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiMulC_32f_C3IR_Ctx@libnppial.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiDivC_32f_C1IR_Ctx@libnppial.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiAddC_32f_C3IR_Ctx@libnppial.so.11'
/usr/bin/ld: /opt/nvidia/deepstream/deepstream/lib/cvcore_libs/libnvcv_tensorops.so: undefined reference to `nppiSwapChannels_32f_C3R_Ctx@libnppidei.so.11'

My LD_LIBRARY_PATH is /usr/local/cuda.

Can you please point me to what I might be missing?
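
The unresolved symbols above all belong to the CUDA NPP libraries (libnppidei, libnppial, libnppig, libnppicc, libnppif, libnppitc), so a reasonable first check is whether those libraries are visible in the directories listed in LD_LIBRARY_PATH. The sketch below is a diagnostic aid only, with find_libs as a hypothetical helper name. It looks in each listed directory itself and not in subdirectories, so an entry such as /usr/local/cuda would not cover libraries stored in a subdirectory like /usr/local/cuda/lib64 (a common location, but an assumption to verify on your system). Finding the files does not rule out a CUDA version mismatch.

```python
import os

# Library name stems taken from the "@lib....so.11" suffixes in the linker errors.
NPP_LIBS = ["libnppidei", "libnppial", "libnppig", "libnppicc", "libnppif", "libnppitc"]


def find_libs(names, search_path):
    """Return {name: [matching files]} for each directory in a path list.

    search_path uses the same separator-delimited format as LD_LIBRARY_PATH.
    Directories are scanned non-recursively.
    """
    found = {name: [] for name in names}
    for directory in filter(None, search_path.split(os.pathsep)):
        if not os.path.isdir(directory):
            continue  # skip entries that do not exist
        for entry in sorted(os.listdir(directory)):
            for name in names:
                if entry.startswith(name + ".so"):
                    found[name].append(os.path.join(directory, entry))
    return found


report = find_libs(NPP_LIBS, os.environ.get("LD_LIBRARY_PATH", ""))
for name, paths in report.items():
    print(name, "->", paths or "not found on LD_LIBRARY_PATH")
```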

Please double check the steps in deepstream_tao_apps/apps/tao_others/deepstream-bodypose2d-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub.
