CUDA error running segmentation_clara_ct_lung Demo

Things I’ve done:

  • Pull the trainv2.0 docker image

  • Use the following command to launch the container:

    sudo docker run \
      --runtime=nvidia \
      --shm-size=1G \
      --ulimit memlock=-1 \
      --ulimit stack=67108864 \
      -it --rm \
      -v /home/myuserename:/workspace/home \
      nvcr.io/nvidia/clara-train-sdk:v2.0 /bin/bash

  • Download the MMAR demo, extract it from the archive, and run train.sh

It first failed with an error saying "medical.tlt2" was not found, so I pointed $PYTHONPATH at a Deploy SDK folder that contains the tlt2 package.
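To double-check that the $PYTHONPATH change really makes the package visible, a small check like the one below can be run inside the container. It is only a sketch: `locate_module` is a helper name made up for this post, and it relies on nothing beyond the standard `importlib.util.find_spec` call. The module name `medical.tlt2` is taken from the error message.

```python
import importlib.util


def locate_module(name):
    """Return the file or directory a dotted module name resolves to, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # ImportError: a parent package (e.g. "medical") could not be imported.
        # ValueError: the module is loaded but carries no usable spec.
        return None
    if spec is None:
        return None
    # Regular modules and packages report the file they were found in.
    if spec.origin is not None and spec.origin != "namespace":
        return spec.origin
    # Namespace packages have no single file, so report their first directory.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


# Prints the resolved location, or None if the name is not importable
# with the current PYTHONPATH.
print("medical.tlt2 ->", locate_module("medical.tlt2"))
```

If this prints a path inside the Deploy SDK folder, the import is resolving to that copy of the package rather than to one shipped with the training image.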

Then it gave a confusing error, something like this:

2020-03-19 09:27:10.393142: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "apps/train.py", line 71, in <module>
File "apps/train.py", line 61, in main
File "workflows/workflow_factory.py", line 29, in create_trainer
File "workflows/workflow_factory.py", line 29, in
File "workflows/workflow_factory.py", line 191, in build_component
File "utils/compo_module_names.py", line 14, in __init__
File "utils/compo_module_names.py", line 25, in _create_classes_table
File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "components/transforms/transforms.py", line 3, in <module>
File "tlt2/src/components/transforms/libs/transforms.py", line 25, in <module>
File "tlt2/src/components/transforms/libs/cupyhelper.py", line 61, in __init__
File "cupy/cuda/function.pyx", line 178, in cupy.cuda.function.Module.load_file
File "cupy/cuda/function.pyx", line 182, in cupy.cuda.function.Module.load_file
File "cupy/cuda/driver.pyx", line 177, in cupy.cuda.driver.moduleLoad
File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_FILE_NOT_FOUND: file not found

Since the VM is built from the NVIDIA NGC image on the GCP Marketplace and I have run the MNIST demo on it successfully, I don't understand why a CUDA error still pops up. Besides, the output says that a CUDA library was loaded successfully.

nvcc --version gives (with an NVIDIA P100) a version of 440, which satisfies the requirements.
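One possible reading of the traceback: the failure is raised from cupy's Module.load_file, which asks the CUDA driver to load a compiled kernel module from a file path. If so, the "file not found" would refer to that kernel file and not to libcuda itself. The sketch below looks for candidate kernel files under a folder. The helper name `find_kernel_files` and the list of extensions are assumptions, because the traceback does not show which file cupyhelper.py tries to load.

```python
import os

# Extensions commonly used for compiled CUDA modules. Which one cupyhelper.py
# expects is not visible in the traceback, so this list is an assumption.
KERNEL_EXTENSIONS = (".cubin", ".ptx", ".fatbin")


def find_kernel_files(root):
    """Return sorted paths under root whose extension looks like a CUDA module."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(KERNEL_EXTENSIONS):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical search root: replace "." with the folder added to PYTHONPATH.
    for path in find_kernel_files("."):
        print(path)
```

An empty result for the folder that $PYTHONPATH points to would be consistent with the kernel files living somewhere else, or not being part of that folder at all.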

Hi,

This board is specifically for AIAA-related problems.
Please post questions about the training side here: https://forums.developer.nvidia.com/c/healthcare/clara-train-transfer-learning-toolkit-for-medi/154

You can also find similar questions/answers there.

Thanks