Error when converting KITTI to TFRecords in the official TLT 3.0 notebook

Hi,

I was trying to run the TLT CV example notebook, detectnet_v2.ipynb.
It shows the following error when it reaches this command:

!tlt detectnet_v2 dataset_convert \
    -d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
    -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval

Converting Tfrecords for kitti trainval dataset
2021-02-22 16:27:43,228 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-02-22 23:27:50,365 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter
2021-02-22 23:27:50,365 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Creating output directory /workspace/tlt-experiments/data/tfrecords/kitti_trainval
Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 90, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 85, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/build_converter.py", line 76, in build_converter
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/kitti_converter_lib.py", line 76, in __init__
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py", line 57, in __init__
  File "/usr/lib/python3.6/os.py", line 210, in makedirs
    makedirs(head, mode, exist_ok)
  File "/usr/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/tlt-experiments/data/tfrecords'
Traceback (most recent call last):
  File "/usr/local/bin/detectnet_v2", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/entrypoint/detectnet_v2.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.
2021-02-22 16:27:51,261 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

I’m not sure what happened inside after the script started the Docker container, or why makedirs is giving this error. Is it because I’m running Docker as root? (I’m pretty sure that on the host I added my user to the docker group, which is why I don’t need sudo to run the command.)

And if I need to change it to my user, how? It tells me to add "user":"UID:GID" to the config file, but how exactly do I do this? I’m not familiar with configuring Docker through JSON. Can you provide a JSON example that includes the user UID:GID?

Please give me some pointers. Thanks!
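
For reference, the warning in the log names the pieces involved: a "user":"UID:GID" entry in the DockerOptions portion of ~/.tlt_mounts.json. The sketch below builds such a config in Python and prints it as JSON. It assumes DockerOptions is a top-level object next to Mounts, which this thread does not confirm; the helper name with_docker_user and the placeholder source path are hypothetical.

```python
import json
import os


def with_docker_user(config, uid, gid):
    """Return a copy of a tlt_mounts-style config with a "user" entry added.

    Assumption: "DockerOptions" is a top-level object alongside "Mounts".
    """
    updated = dict(config)
    options = dict(updated.get("DockerOptions", {}))
    options["user"] = f"{uid}:{gid}"
    updated["DockerOptions"] = options
    return updated


# Placeholder mount; substitute your own host path.
example = {
    "Mounts": [
        {
            "source": "/path/to/local_project_dir",
            "destination": "/workspace/tlt-experiments",
        }
    ]
}

if __name__ == "__main__":
    # os.getuid() and os.getgid() normally match `id -u` and `id -g`.
    config = with_docker_user(example, os.getuid(), os.getgid())
    print(json.dumps(config, indent=4))
```

The printed JSON could then be compared against the existing ~/.tlt_mounts.json before editing that file by hand.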

Hi,
Could you run the command below and paste the result?
$ cat ~/.tlt_mounts.json

FileNotFoundError: [Errno 2] No such file or directory: '/workspace/tlt-experiments/data/tfrecords'

Please make sure the above path is correct.

Hi, thanks for the response.
The result is here:

 {
        "Mounts": [
            {
                "source": "/home/yyc/Documents/XXX/local_project_dir", 
                "destination": "/workspace/tlt-experiments"
            }, 
            {
                "source": "/home/yyc/Documents/XXX/CV_sample_tlt/tlt_cv_samples_v1.0.1/detectnet_v2/specs", 
                "destination": "/workspace/tlt-experiments/detectnet_v2/specs"
            }
        ]
    }

The source folder paths are correct.

Can you insert a cell in your notebook to run the commands below?
! cat $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt
! cat $DATA_DOWNLOAD_DIR
! cat $SPECS_DIR

Hmm, I’m not sure why you insist on using cat on the variables. Shouldn’t it be echo? But here it is:

cat: /workspace/tlt-experiments/detectnet_v2/specs/detectnet_v2_tfrecords_kitti_trainval.txt: No such file or directory
cat: /workspace/tlt-experiments/data: No such file or directory
cat: /workspace/tlt-experiments/detectnet_v2/specs: No such file or directory

If I change cat to echo:
/workspace/tlt-experiments/detectnet_v2/specs/detectnet_v2_tfrecords_kitti_trainval.txt
/workspace/tlt-experiments/data
/workspace/tlt-experiments/detectnet_v2/specs

Sorry, the last two should be echo.

If I understand correctly, the command ‘tlt detectnet_v2 dataset_convert’ will run the Docker container, mount the paths, and create the ‘workspace’ path inside the container, so I don’t need to worry about them on the host. Am I right?

I cannot cat $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt because $SPECS_DIR does not exist outside the Docker container.

I did find that a root-owned file named ‘detectnet_v2’ was created under ‘local_project_dir’ after my unsuccessful run of that command. Should it be owned by a non-root user?

Everything after tlt detectnet_v2 runs in the environment inside the Docker container.
You can insert a cell to check if the file exists.
!tlt detectnet_v2 run cat $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt

If you check $SPECS_DIR directly, without tlt detectnet_v2, it only checks whether the directory exists on the host PC.
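
To make the host/container distinction concrete, here is a minimal sketch that translates a container path back to the host path it should correspond to, using the Mounts list posted earlier in this thread. The helper container_to_host is hypothetical (it is not part of the tlt launcher), and it assumes the most specific destination wins when mounts are nested.

```python
import posixpath


def container_to_host(container_path, mounts):
    """Translate a container path to a host path using a Mounts list.

    Hypothetical helper: picks the mount whose destination is the longest
    prefix of container_path. Returns None if no mount covers the path.
    """
    best_source, best_dest = None, None
    for mount in mounts:
        dest = mount["destination"].rstrip("/")
        covered = container_path == dest or container_path.startswith(dest + "/")
        if covered and (best_dest is None or len(dest) > len(best_dest)):
            best_source, best_dest = mount["source"], dest
    if best_dest is None:
        return None
    rel = container_path[len(best_dest):].lstrip("/")
    return posixpath.join(best_source, rel) if rel else best_source


# Mounts as posted in this thread (XXX is the poster's own redaction).
mounts = [
    {
        "source": "/home/yyc/Documents/XXX/local_project_dir",
        "destination": "/workspace/tlt-experiments",
    },
    {
        "source": "/home/yyc/Documents/XXX/CV_sample_tlt/tlt_cv_samples_v1.0.1/detectnet_v2/specs",
        "destination": "/workspace/tlt-experiments/detectnet_v2/specs",
    },
]

print(container_to_host("/workspace/tlt-experiments/data", mounts))
```

With these mounts, /workspace/tlt-experiments/data resolves to the data/ folder under local_project_dir on the host, so that is the host directory to inspect when the container reports the path as missing.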

Ok thanks. Here is the result of the command:

!tlt detectnet_v2 run cat $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt

2021-02-23 10:00:56,680 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
kitti_config {
  root_directory_path: "/workspace/tlt-experiments/data/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/workspace/tlt-experiments/data/training"
2021-02-23 10:00:57,585 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

I cannot reproduce your original error when running the notebook.
To narrow it down, could you run a new cell with the command below?

! tlt detectnet_v2 run ls $DATA_DOWNLOAD_DIR

Is it possibly a permission problem? Could you show me how to add "user":"UID:GID" to the JSON file, as suggested by the docker handler?

! tlt detectnet_v2 run ls $DATA_DOWNLOAD_DIR

2021-02-23 10:25:15,212 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
/workspace/tlt-experiments/data
2021-02-23 10:25:16,140 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

This should be the reason. Please check your path and mapping again.

On my side,

! tlt detectnet_v2 run ls $DATA_DOWNLOAD_DIR
2021-02-24 01:23:33,584 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
tfrecords training
2021-02-24 01:23:35,480 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

I see. So how do I fix this, exactly?
Should I run the container and create the path myself?

The mapping is right; I mean that on the outside (the host), the folders are there. The paths inside, which I have no control over, are the notebook defaults, the ones that start with /workspace/.

I think I already printed all the paths in the replies above. Could you please point out exactly which path is mapped incorrectly?

All I changed in this notebook is the $LOCAL_PROJECT_DIR

Alright, I figured it out.

I created a ‘tfrecords’ folder on the host under the data/ folder
and added the mount path to ~/.tlt_mounts.json:

    {
        "source": "/home/yyc/Documents/XXX/local_project_dir/data/tfrecords", 
        "destination": "/workspace/tlt-experiments/data/tfrecords"
    }

It was the mapping problem mentioned above.
I thought the makedirs call in the tool would create the /workspace…tfrecords folder inside the container for us.
But apparently I have to create it manually on the host and mount it.

Thanks for all your help~

@linyeglasses
dataset_convert will automatically generate the data folder.
See the log line "Creating output directory /workspace/tlt-experiments/data/tfrecords/kitti_trainval".

You do not need to create the data folder, and you do not need to mount it in ~/.tlt_mounts.json.

Well, on my machine, I cannot run !tlt detectnet_v2 dataset_convert without creating and mounting the ‘tfrecords’ folder on the host.
I know it says it is creating the folder.
The error I reported above occurs when it reaches this step:

2021-02-22 23:27:50,365 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Creating output directory /workspace/tlt-experiments/data/tfrecords/kitti_trainval

It shows:

FileNotFoundError: [Errno 2] No such file or directory: '/workspace/tlt-experiments/data/tfrecords'

I assume it fails to create the folder for some reason, and I think that’s why adding the entry to the JSON file makes it work. Or is there a better explanation?

First, could you please check why you get the wrong result below when running ls on $DATA_DOWNLOAD_DIR?
! tlt detectnet_v2 run ls $DATA_DOWNLOAD_DIR

2021-02-23 10:25:15,212 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
/workspace/tlt-experiments/data

I’m not sure how to check why it gives this odd output. Can you provide some steps to figure out why? Everything seems normal. As I said, I did not change anything besides “LOCAL_PROJECT_DIR”.

I am still getting this output after modifying the JSON.

Could it be because I use an external drive? I mounted it on my PC and made a link from the data/ folder on the external drive to data/ under local_project_dir.
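
The symlink hypothesis can be checked in isolation. A link that points at an external drive resolves on the host, but inside the container its target does not exist unless that drive is also mounted, so the link dangles. The sketch below uses only the Python standard library and hypothetical temporary paths to show that os.makedirs under a dangling link fails with FileNotFoundError at the first directory below the link, which has the same shape as the traceback at the start of this thread. It illustrates the mechanism only; nothing in the thread confirms that this is the cause here.

```python
import os
import tempfile


def makedirs_through_dangling_symlink():
    """Reproduce FileNotFoundError from os.makedirs under a dangling symlink.

    Returns the path named in the error, relative to the temporary root.
    """
    with tempfile.TemporaryDirectory() as root:
        data = os.path.join(root, "data")
        # "data" is a symlink whose target does not exist, standing in for a
        # link to an external drive that is not mounted in the container.
        os.symlink(os.path.join(root, "external_drive_not_mounted"), data)
        try:
            os.makedirs(os.path.join(data, "tfrecords", "kitti_trainval"))
        except FileNotFoundError as err:
            return os.path.relpath(err.filename, root)
    return None


print(makedirs_through_dangling_symlink())
```

On the host, os.path.islink and os.path.realpath on the data/ folder under local_project_dir would show whether it is a link and where it points; if the target lies outside every "source" in ~/.tlt_mounts.json, the container cannot follow it.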

Could you please run
! tlt detectnet_v2 run ls $DATA_DOWNLOAD_DIR
and
! tlt detectnet_v2 run echo $DATA_DOWNLOAD_DIR

! tlt detectnet_v2 run ls $DATA_DOWNLOAD_DIR
! tlt detectnet_v2 run echo $DATA_DOWNLOAD_DIR

2021-02-23 17:16:07,276 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
/workspace/tlt-experiments/data
2021-02-23 17:16:08,179 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
2021-02-23 17:16:18,853 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
/workspace/tlt-experiments/data
2021-02-23 17:16:19,827 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.