Cannot run tao unet dataset_convert because of docker mapping issue

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc): Laptop with GPU - GTX 1650
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): Unet
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here): toolkit_version: 4.0.1 and 4.0.0-tf1.15.5
• Training spec file(If have, please share here): None
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Hello, I am trying to run the Binary Semantic Segmentation using TAO UNET notebook from /unet/tao_isbi, but I have run into a docker mapping issue. In section 2 of the notebook, I tried converting my own COCO dataset to the UNet format using !tao unet dataset_convert -f [] -r [], but I get no mask output and the following error:

2023-03-07 18:33:38,138 - root - INFO - Starting Semantic Segmentation Dataset to VOC Convert.
loading annotations into memory…
2023-03-07 18:33:38,138 - root - INFO - Conversion failed with following error: [Errno 2] No such file or directory: './unet/tao_isbi/data/isbi/images/trainval.json'.
Telemetry data couldn’t be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -5] No address associated with hostname>
Execution status: PASS
2023-03-07 11:33:39,209 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The path to the JSON file is correct, but it is not being read. I suspect this is because of the docker mapping: when I run !cat ~/.tao_mounts.json a few cells above, I only get "user": "0:0" in "DockerOptions", which I think is not supposed to happen. So I guess my env variables have been wrongly defined, but I do not understand the file hierarchy. Can I get some help with this, please? I attach the env variable definition cell and the mount output here:

First cell:

# Setting up env variables for cleaner command line commands.
import os

%set_env KEY=nvidia_tlt
%set_env GPU_INDEX=0
%env NUM_GPUS=1
# %set_env USER_EXPERIMENT_DIR=/workspace/tao-experiments/unet
# %set_env DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data
# %set_env USER_EXPERIMENT_DIR=/unet/tao_isbi/experiments/unet
# %set_env DATA_DOWNLOAD_DIR=/unet/tao_isbi/experiments/data
%set_env USER_EXPERIMENT_DIR=/home/dari/PycharmProjects/conda/tao-experiments/unet
%set_env DATA_DOWNLOAD_DIR=/home/dari/PycharmProjects/conda/tao-experiments/data

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/unet
# %env NOTEBOOK_ROOT= ./unet/tao_isbi

# Please define this local project directory that needs to be mapped to the TAO docker session.
# The dataset expected to be present in $LOCAL_PROJECT_DIR/data, while the results for the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/unet
# !PLEASE MAKE SURE TO UPDATE THIS PATH!.
# os.environ["LOCAL_PROJECT_DIR"] = './unet/tao_isbi'
os.environ["LOCAL_PROJECT_DIR"] = '/home/dari/PycharmProjects/conda/tao-experiments'

# !PLEASE MAKE SURE TO UPDATE THIS PATH!.
# Point to the 'deps' folder in samples from where you are launching notebook inside unet folder.
# %env PROJECT_DIR=/workspace/iva/ngc-collaterals/cv/samples
%env PROJECT_DIR=/home/dari/PycharmProjects/conda/getting_started_v4.0.0/notebooks/tao_launcher_starter_kit

os.environ["LOCAL_DATA_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "data"
)

os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "unet"
)

# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

%set_env SPECS_DIR=/home/dari/PycharmProjects/conda/tao-experiments/unet/specs

!ls -l $LOCAL_DATA_DIR
!ls -rlt $LOCAL_SPECS_DIR

Output:
env: KEY=nvidia_tlt
env: GPU_INDEX=0
env: NUM_GPUS=1
env: USER_EXPERIMENT_DIR=/home/dari/PycharmProjects/conda/tao-experiments/unet
env: DATA_DOWNLOAD_DIR=/home/dari/PycharmProjects/conda/tao-experiments/data
env: PROJECT_DIR=/home/dari/PycharmProjects/conda/getting_started_v4.0.0/notebooks/tao_launcher_starter_kit
env: SPECS_DIR=/home/dari/PycharmProjects/conda/tao-experiments/unet/specs
total 0
total 12
-rw-rw-r-- 1 dari dari 1394 Dec 14 20:37 unet_train_resnet_unet_isbi.txt
-rw-rw-r-- 1 dari dari 1255 Dec 14 20:37 unet_train_resnet_unet_isbi_retrain.txt
-rw-rw-r-- 1 dari dari 1274 Dec 14 20:37 unet_train_resnet_unet_isbi_retrain_qat.txt

Mount cell:

# Mapping up the local directories to the TAO docker.
import json
mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
            # "destination": "/unet/tao_isbi/"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ],
    "DockerOptions": {
        # preserving the same permissions with the docker as in host machine.
        "user": "{}:{}".format(os.getuid(), os.getgid())
    }
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

Output with !cat ~/.tao_mounts.json:
{
    "Mounts": [
        {
            "source": "/home/dari/PycharmProjects/conda/tao-experiments",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/dari/PycharmProjects/conda/getting_started_v4.0.0/notebooks/tao_launcher_starter_kit/unet/tao_isbi/specs",
            "destination": "/home/dari/PycharmProjects/conda/tao-experiments/unet/specs"
        }
    ],
    "DockerOptions": {
        "user": "0:0"
    }
}

Thank you in advance

Please check the training spec file. The paths should be the ones inside the TAO container, not your local paths. The mapping between local and container paths is defined in the ~/.tao_mounts.json file.
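For example, a quick way to see which in-container path corresponds to a local file is to resolve it against the "Mounts" entries (a minimal helper sketch, not part of the TAO launcher):

import json
import os

def to_container_path(local_path, mounts_file="~/.tao_mounts.json"):
    """Translate a host path into the path the TAO container sees,
    using the "Mounts" entries from the launcher's mounts file."""
    local_path = os.path.abspath(local_path)
    with open(os.path.expanduser(mounts_file)) as f:
        for mount in json.load(f)["Mounts"]:
            src = os.path.abspath(mount["source"])
            if local_path == src or local_path.startswith(src + os.sep):
                return mount["destination"] + local_path[len(src):]
    raise ValueError("%s is not under any mounted source directory" % local_path)

# e.g. the file to pass to `tao unet dataset_convert -f ...`
print(to_container_path(
    "/home/dari/PycharmProjects/conda/tao-experiments/data/isbi/images/train/trainval.json"))
# -> /workspace/tao-experiments/data/isbi/images/train/trainval.json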

Hello, I think I fixed it. In the first cell I changed the env variables to:

%set_env USER_EXPERIMENT_DIR=/workspace/tao-experiments/unet
%set_env DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data
%set_env SPECS_DIR=/workspace/tao-experiments/unet/specs

and the local project dir to os.environ["LOCAL_PROJECT_DIR"] = './' (tao_isbi folder)

Then for the dataset conversion I ran !tao unet dataset_convert -f /workspace/tao-experiments/data/isbi/images/train/trainval.json -r /workspace/tao-experiments/data/isbi/masks/train and I got the masks! However, they all appear black, so I am not sure whether the problem is the trainval.json file or something I still need to fix with the docker mapping. Here is the output of the command above:

2023-03-08 11:27:27,844 [INFO] root: Registry: ['nvcr.io']
2023-03-08 11:27:27,930 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5
Using TensorFlow backend.
2023-03-08 18:27:29.453155: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
2023-03-08 18:27:36,967 - iva.common.logging.logging - INFO - Log file already exists at /workspace/tao-experiments/data/isbi/masks/train/status.json
2023-03-08 18:27:36,967 - root - INFO - Starting Semantic Segmentation Dataset to VOC Convert.
loading annotations into memory…
Done (t=0.04s)
creating index…
index created!
2023-03-08 18:27:37,012 - __main__ - INFO - Number of images that are going to be converted 435
2023-03-08 18:27:37,012 - root - INFO - Number of images that are going to be converted 435
2023-03-08 18:27:45,945 - root - INFO - The total number of skipped annotations are 0
2023-03-08 18:27:45,945 - root - INFO - The total number of skipped images are 0
2023-03-08 18:27:45,945 - root - INFO - The details of faulty annotations and images that were skipped are logged in /workspace/tao-experiments/data/isbi/masks/train/skipped_annotations_log.json
2023-03-08 18:27:45,947 - root - INFO - Conversion finished successfully.
Telemetry data couldn’t be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -5] No address associated with hostname>
Execution status: PASS
2023-03-08 11:27:47,175 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

My data was originally labeled with labelme. I then converted the labels to COCO format using the attached script. Is there perhaps something wrong with the format?
labelme2coco.py (5.1 KB)

Do you mean the mask image is black?

Hi. Sorry for the late reply. I took a closer look at the TAO-generated masks and noticed that they were not exactly black: the foreground pixels had a value of 1 (on a 0-255 scale) while the background was 0, so they were barely visible. I then trained the model but got NaN results. I "fixed" this by mapping the 1-valued pixels to 128, and started getting more reasonable results.
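Roughly, the remapping was something like this (a simplified sketch, not my exact script; the mask directory is just an example path):

# Sketch: bump the barely-visible foreground pixels (value 1) in the
# generated masks up to 128 so they become visible.
import os
import numpy as np
from PIL import Image

mask_dir = "/home/dari/PycharmProjects/conda/tao-experiments/data/isbi/masks/train"  # example path

for name in os.listdir(mask_dir):
    if not name.lower().endswith(".png"):
        continue
    path = os.path.join(mask_dir, name)
    mask = np.array(Image.open(path))
    print(name, np.unique(mask))   # inspect the actual pixel values
    mask[mask == 1] = 128          # remap foreground from 1 to 128
    Image.fromarray(mask).save(path)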

However, for some reason my predicted masks are rather similar to their corresponding ground-truth masks, as if the model were taking information from the test images folder itself (which shouldn't be the case). I went back to the spec and found that my foreground had label_id 0 instead of 1 (background was 1). I looked for similar topics on the forum and eventually came across this one, which pointed out that their binary segmentation worked with 255 as the foreground value.

So right now I am not sure what the root of my problem is. The latest model I trained on my grayscale images used input masks with the foreground set to 255 (as that user suggested), with label_id 255 for the foreground and 0 for the background (binary segmentation). Should I change any training parameters? Here is my spec file:


random_seed: 42
model_config {
  model_input_width: 320
  model_input_height: 320
  model_input_channels: 1
  num_layers: 18
  all_projections: true
  arch: "resnet"
  use_batch_norm: False
  training_precision {
    backend_floatx: FLOAT32
  }
}

training_config {
  batch_size: 3
  epochs: 50
  log_summary_steps: 10
  checkpoint_interval: 1
  loss: "cross_dice_sum"
  learning_rate:0.00001
  regularizer {
    type: L2
    weight: 2e-5
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  visualizer {
    enabled: true
  }
}

dataset_config {
  dataset: "custom"
  augment: False
  augmentation_config {
    spatial_augmentation {
      hflip_probability: 0.5
      vflip_probability: 0.5
      crop_and_resize_prob: 0.5
    }
    brightness_augmentation {
      delta: 0.2
    }
  }
  input_image_type: "grayscale"
  train_images_path: "/workspace/tao-experiments/data/isbi/images/train"
  train_masks_path: "/workspace/tao-experiments/data/isbi/masks/train"

  val_images_path: "/workspace/tao-experiments/data/isbi/images/val"
  val_masks_path: "/workspace/tao-experiments/data/isbi/masks/val"

  test_images_path: "/workspace/tao-experiments/data/isbi/images/test"

  data_class_config {
    target_classes {
      name: "background"
      mapping_class: "background"
      label_id: 0
    }
    target_classes {
      name: "foreground"
      mapping_class: "foreground"
      label_id: 255
    }
  }
}

What is even weirder is that with a bigger dataset (the same type of images, but using 128 for the foreground in the training masks instead of 255), the predicted masks barely changed: they were all pretty similar to one another. Then again, I am not sure whether to attribute this to the foreground intensity of the input masks or, in this case, to the dataset distribution.

I would appreciate it if you could clear up some of these doubts.

Thank you in advance.

There has been no update from you for a while, so we are assuming this is no longer an issue and closing this topic. If you need further support, please open a new one. Thanks.

Please refer to Data Annotation Format - NVIDIA Docs

Grayscale Input Image Type

For grayscale input images, the mask is a single channel image with size equal to the input image. Every pixel has a value of 255 or 0, which corresponds respectively to a label_id of 1 or 0 in the dataset_config and dataset_config_segformer.
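As an illustration (a minimal sketch, assuming 8-bit single-channel PNG masks; the directory is an example path), the generated masks can be checked against this convention and binarized to 0/255 if needed:

# Sketch: verify that grayscale masks only contain the values 0 and 255,
# and binarize them if they do not (e.g. masks generated with 1 or 128
# as the foreground value).
import os
import numpy as np
from PIL import Image

mask_dir = "/workspace/tao-experiments/data/isbi/masks/train"  # example path

for name in sorted(os.listdir(mask_dir)):
    if not name.lower().endswith(".png"):
        continue
    path = os.path.join(mask_dir, name)
    mask = np.array(Image.open(path).convert("L"))
    values = np.unique(mask)
    if not set(values).issubset({0, 255}):
        print(f"{name}: found values {values}, binarizing to 0/255")
        mask = np.where(mask > 0, 255, 0).astype(np.uint8)
        Image.fromarray(mask).save(path)

With masks in this form, the foreground class in data_class_config should use label_id: 1 and the background label_id: 0, as stated in the documentation quoted above.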

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.