Hi, I'm trying to evaluate my own model, which was trained with the TAO Toolkit.
However, a numba error occurs.
As far as I know, the TAO Toolkit just runs inside a Docker environment, and I didn't touch any other dependencies.
Has anyone run into the same error?
System information
• Hardware: RTX3090
• Network Type: PointPillar
Here are the results of tao info:
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit']
format_version: 2.0
toolkit_version: 4.0.1
published_date: 03/06/2023
When I evaluate my network with tao pointpillars evaluate -e $SPECS_DIR/pointpillars_cm.yaml -r $USER_EXPERIMENT_DIR -k $KEY, the following error occurs:
(launcher) ailab@3090-4:~/Project/04_HMG_AVC/TAO-PointPillars/pointpillars$ tao pointpillars evaluate -e $SPECS_DIR/pointpillars_cm.yaml -r $USER_EXPERIMENT_DIR -k $KEY
2023-05-31 13:24:01,936 [INFO] root: Registry: ['nvcr.io']
2023-05-31 13:24:01,982 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
2023-05-31 13:24:02,006 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ailab/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
python /opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/evaluate.py --cfg_file /workspace/tao-experiments/specs/pointpillars_cm.yaml --output_dir /workspace/tao-experiments/pointpillars --key tlt_encode
2023-05-31 04:24:06,159 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: Start logging
2023-05-31 04:24:06,159 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: CUDA_VISIBLE_DEVICES=2
2023-05-31 04:24:08,532 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: Loading point cloud dataset
2023-05-31 04:24:08,576 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: Total samples for point cloud dataset: 2549
2023-05-31 04:24:08,576 [WARNING] root: 'decrypt_stream' is deprecated, to be removed in '0.7'. Please use 'eff.codec.decrypt_stream()' instead.
2023-05-31 04:24:10,273 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: *************** EPOCH 36 EVALUATION *****************
eval: 100%|████████████████████████| 2549/2549 [02:25<00:00, 17.56it/s, recall_0.3=(0, 28814) / 43040]
2023-05-31 04:26:35,438 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: *************** Performance of EPOCH 36 *****************
2023-05-31 04:26:35,438 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: Generate label finished(sec_per_example: 0.0569 second).
2023-05-31 04:26:35,439 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: recall_roi_0.3: 0.000000
2023-05-31 04:26:35,439 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: recall_rcnn_0.3: 0.669470
2023-05-31 04:26:35,439 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: recall_roi_0.5: 0.000000
2023-05-31 04:26:35,440 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: recall_rcnn_0.5: 0.661640
2023-05-31 04:26:35,440 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: recall_roi_0.7: 0.000000
2023-05-31 04:26:35,440 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: recall_rcnn_0.7: 0.614986
2023-05-31 04:26:35,449 [INFO] nvidia_tao_pytorch.pointcloud.pointpillars.pcdet.utils.common_utils: Average predicted number of objects(2549 samples): 13.352
2023-05-31 04:26:37,331 [INFO] numba.cuda.cudadrv.driver: init
2023-05-31 04:26:37,618 [ERROR] numba.cuda.cudadrv.driver: Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2160, in add_ptx
driver.cuLinkAddData(self.handle, enums.CU_JIT_INPUT_PTX,
File "/opt/conda/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 300, in safe_cuda_api_call
self._check_error(fname, retcode)
File "/opt/conda/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 335, in _check_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "</opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/evaluate.py>", line 3, in
File "", line 231, in
File "", line 223, in main
File "", line 71, in eval_single_ckpt
File "", line 114, in eval_one_epoch
File "", line 281, in evaluation
File "</opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/pcdet/datasets/kitti/kitti_object_eval_python/eval.py>", line 1, in
File "", line 7, in
File "</opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/pcdet/datasets/kitti/kitti_object_eval_python/rotate_iou.py>", line 1, in
File "", line 304, in
File "/opt/conda/lib/python3.8/site-packages/numba/cuda/decorators.py", line 95, in kernel_jit
return Dispatcher(func, [func_or_sig], targetoptions=targetoptions)
File "/opt/conda/lib/python3.8/site-packages/numba/cuda/compiler.py", line 899, in __init__
self.compile(sigs[0])
File "/opt/conda/lib/python3.8/site-packages/numba/cuda/compiler.py", line 1102, in compile
kernel.bind()
File "/opt/conda/lib/python3.8/site-packages/numba/cuda/compiler.py", line 590, in bind
self._func.get()
File "/opt/conda/lib/python3.8/site-packages/numba/cuda/compiler.py", line 441, in get
linker.add_ptx(ptx)
File "/opt/conda/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2163, in add_ptx
raise LinkerError("%s\n%s" % (e, self.error_log))
numba.cuda.cudadrv.driver.LinkerError: [222] Call to cuLinkAddData results in UNKNOWN_CUDA_ERROR
ptxas application ptx input, line 9; fatal : Unsupported .version 7.8; current version is '7.4'

PyCUDA ERROR: The context stack was not empty upon module cleanup.
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
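If I'm reading the final ptxas error correctly, my host driver is generating PTX ISA 7.8 (a CUDA 11.8-era driver), while the ptxas bundled in the 4.0.x container only accepts ISA up to 7.4 (CUDA 11.4). A minimal sketch of that reading, assuming the ISA-to-toolkit table from NVIDIA's PTX ISA release notes (the helper function is mine, just for illustration):

```python
# Map a PTX ISA ".version" directive to the CUDA toolkit release that
# introduced it (table assumed from NVIDIA's PTX ISA release notes).
PTX_ISA_TO_CUDA = {
    "7.0": "11.0", "7.1": "11.1", "7.2": "11.2", "7.3": "11.3",
    "7.4": "11.4", "7.5": "11.5", "7.6": "11.6", "7.7": "11.7",
    "7.8": "11.8",
}

def explain_mismatch(emitted: str, supported: str) -> str:
    """Describe the driver/toolkit gap behind 'Unsupported .version X'."""
    return (f"Driver emits PTX ISA {emitted} (CUDA {PTX_ISA_TO_CUDA[emitted]}), "
            f"but the container's ptxas only accepts ISA {supported} "
            f"(CUDA {PTX_ISA_TO_CUDA[supported]}).")

# The versions from my error message:
print(explain_mismatch("7.8", "7.4"))
```

If that reading is right, I'd guess either the host driver needs to match a CUDA 11.4-era release or the container needs a newer toolkit, but I'd appreciate confirmation from anyone who has hit this.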