Large number of samples for dataset conversion for PointPillars

I've done these steps and tried to run the docker again, but I still got the same result.

Can you run $ cat /etc/docker/daemon.json again?
If it is still empty, please edit it with vim as below.
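That is, open the file with:

$ sudo vim /etc/docker/daemon.json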

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

Then, reload the docker daemon configuration.

sudo pkill -SIGHUP dockerd
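To confirm the daemon picked up the change, you can also check the registered runtimes, for example:

$ docker info | grep -i runtime

The output should list nvidia among the runtimes and show it as the default runtime.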

It worked and I entered the docker as root.
What should I do next?

OK, please run the below. Note that local-yourname should be replaced with your username.
$ sudo chown local-yourname:docker /var/run/docker.sock
$ sudo usermod -a -G docker local-yourname
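Note that the group change only takes effect in a new login session. You can verify the membership afterwards with, for example:

$ groups local-yourname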

Do you mean you can run the below command successfully?
$ docker run --runtime=nvidia -it --rm --shm-size 32G nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 /bin/bash

If yes, please run the below command inside the docker. Note that there is no tao prefix at the beginning.
# pointpillars dataset_convert xxx

OK, please run the below to check if there is still a memory issue.
# pointpillars dataset_convert xxx

  1. Does “yourname” refer to my account on the system?
  2. It seems that “sudo” doesn’t exist in this environment.

Yes.

You can ignore this for now. You are already running inside the docker successfully.

Could you share the output of $ nvidia-smi as well?

I couldn’t locate the yaml file when I’m inside the docker.

Please add -v to the command line to map your local path to a path inside the docker.
For example,

$ docker run --runtime=nvidia -it --rm --shm-size 32G -v /home/morganh/localfolder:/workspace/tao-experiments/ nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 /bin/bash
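Once inside the container, you can confirm the mapping, for example:

# ls /workspace/tao-experiments/

This should show the contents of /home/morganh/localfolder.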

[screenshot]

As for the virtual memory consumption, it stays around 11.2 GB, with no sign of it increasing toward full consumption over time.

So it’s strange.
Running dataset_convert directly from the terminal or docker looks fine, but memory consumption increases over time if I do the same thing via the .ipynb in Firefox.

In the terminal, please let it continue running, and check whether it completes successfully in the end.
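While it runs, you can watch the memory usage in another terminal with something like:

$ watch -n 5 free -h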

It completed successfully, and the peak memory consumption was 11.7 GB.

OK, so can you share ~/.tao_mounts.json?

Please refer to TAO Toolkit Launcher - NVIDIA Docs to set shm_size in it as well.
Then run again to check whether it succeeds in the notebook.

Currently the content is as shown below:
[screenshot of ~/.tao_mounts.json contents]

Per TAO Toolkit Launcher - NVIDIA Docs, you can add:

    "DockerOptions": {
        "shm_size": "32G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        },
        "ports": {
            "8888": 8888
        }
    }
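For reference, a complete ~/.tao_mounts.json combining a mount entry with these options could look like below; the source path here is just an example and should match your local folder.

{
    "Mounts": [
        {
            "source": "/home/morganh/localfolder",
            "destination": "/workspace/tao-experiments"
        }
    ],
    "DockerOptions": {
        "shm_size": "32G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        },
        "ports": {
            "8888": 8888
        }
    }
}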

I added this part to the mounts file, but the port I was using to run the .ipynb in Firefox was exactly 8888, so I changed the port in DockerOptions to another one (8890) so that dataset_convert could start running.

However, the memory consumption still behaves as it did before: it is clearly increasing.

Is it successful, or did it “crash”?

It’s still running, and memory consumption is clearly increasing as well.
So far, nearly 10K of the 45K point cloud samples have been processed by dataset_convert, and memory consumption has reached 26 GB. I expect the tab will just crash again once memory is fully consumed during the process.