Large number of samples for dataset conversion for PointPillars

I’ve been training PointPillarNets on a custom dataset containing self-collected point cloud and annotation data together with the existing KITTI dataset. Currently my training dataset contains over 30K samples.

I ran pointpillars.ipynb in Firefox, and all the samples mentioned above had to be processed by running the following snippet:

!tao model pointpillars dataset_convert -e $SPECS_DIR/pointpillars.yaml

The problem is that when I added more samples and ran dataset_convert again in order to fine-tune my model, I got the “Your Tab Just Crashed” error several times.

I then reduced the number of samples back to 30K and things went well again.
I traced system memory usage with htop and noticed that the tab crashed when memory was completely used (my training machine has 64 GB of RAM). The memory usage kept rising while dataset_convert was running until the tab crashed.
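(In case anyone wants to reproduce the measurement, a simple way to log memory usage over time alongside htop would be something like the loop below; this is just a generic sketch, not the exact commands I used.)

# append overall memory usage to a log every 5 seconds while dataset_convert runs
while true; do free -m | grep '^Mem' >> mem_usage.log; sleep 5; done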

If this is really what made the tab crash, it means I can’t add any more samples due to the limited memory capacity.

Is there any workaround or way to avoid full memory usage?

You can try to increase the swap memory in your Linux system. Refer to Issue while converting maskrcnn model to trt from etlt on Laptops - #23 by alaapdhall79
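A minimal sketch of enlarging swap on Ubuntu (the 100G size and the /swapfile path are just placeholders, adjust them for your machine):

# if an existing /swapfile is active, disable it first: sudo swapoff /swapfile
sudo fallocate -l 100G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# verify
swapon --show
free -h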

Hi, sorry for the late reply.
I just tried increasing the swap size, which was 2 GB by default, to 100 GB, and ran dataset_convert again.

At first it did process more samples than before the swap was increased, but the system still tended to use physical memory first, and once that was fully used the Firefox tab still crashed even though plenty of swap space remained available. Only 11 GB of swap was used during the process.

I then ran the swapoff and swapon commands once more, but this time I set the swap priority to a positive number to see whether swap would be used first. As a result, the swap memory wasn’t used at all…
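The commands I used were roughly along these lines (reconstructed from memory; the path and priority value are just examples):

sudo swapoff /swapfile
# re-enable the swap file with a positive priority
sudo swapon -p 10 /swapfile
swapon --show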

UPDATED 2024/1/10 16:35
The system started using swap after roughly 22 of the 62.5 GB of memory were occupied, but still only very little of it.

Only 8.25 MB of swap was occupied when nearly 37 GB of memory had been taken by dataset_convert.
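One setting that may explain this is vm.swappiness, which controls how eagerly the kernel moves pages to swap instead of keeping them in RAM (I have not verified that changing it actually helps dataset_convert):

# check the current value (the Ubuntu default is 60)
cat /proc/sys/vm/swappiness
# temporarily raise it so the kernel swaps more aggressively
sudo sysctl vm.swappiness=80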

That’s weird.

Instead of running dataset_convert via Firefox, I ran it directly from the terminal without changing anything, and the memory wasn’t fully occupied the way it used to be. The memory consumption still increased over time, but not as fast as before, and I successfully converted 45K point cloud samples.

Can you share the command you ran in the terminal?

Also, could you try running the command below?

$ docker run --runtime=nvidia -it --rm --shm-size 32G nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 /bin/bash

The command I’ve been talking about in recent days, run in pointpillars.ipynb, is:

!tao model pointpillars dataset_convert -e $SPECS_DIR/pointpillars.yaml

where SPECS_DIR represents the path /workspace/tao-experiments/pointpillars/specs.

I literally ran the same command from the terminal in the tao conda environment:
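Roughly like this (same spec file as in the notebook; this assumes the tao launcher is available inside the conda environment and uses the same path mappings as the notebook):

# run inside the "tao" conda environment, same spec as the notebook
export SPECS_DIR=/workspace/tao-experiments/pointpillars/specs
tao model pointpillars dataset_convert -e $SPECS_DIR/pointpillars.yaml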

With the same amount of data to process and the same content in the .yaml file, the same command resulted in different memory consumption just because of how it was run: one from the .ipynb in Firefox, and one directly from the terminal.
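In case it helps others hitting the same crash: if the problem is the notebook cell’s streamed output piling up in the browser tab (this is only a guess on my side, not something I verified), redirecting the converter’s output to a file inside the notebook might keep the tab lighter:

# hypothetical workaround: send the converter's console output to a log file instead of the cell
!tao model pointpillars dataset_convert -e $SPECS_DIR/pointpillars.yaml > convert.log 2>&1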

As for the command you mentioned, I tried it but got this error message:

docker: Error response from daemon: unknown or invalid runtime name: nvidia.

Please run the commands below to install nvidia-docker.

sudo apt-get install nvidia-docker2
sudo systemctl restart docker.service

I still got the same error message after running those two commands.

Could you please open a new terminal and retry?

Please also try
sudo apt install -y nvidia-docker2
sudo systemctl daemon-reload
sudo systemctl restart docker

Refer to Triton Server can't run with GPU - #15 by Morganh

Could you open a new terminal and retry?
BTW, I corrected the command as below.
$ docker run --runtime=nvidia -it --rm --shm-size 32G nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 /bin/bash

Still got the same error response.

Could you share the result of
$ cat /etc/docker/daemon.json
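For reference, when the nvidia runtime is registered, that file normally contains something like the following (the default content written by nvidia-docker2; your setup may differ):

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}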

I spent some time looking for the file but couldn’t find it.

“cat: /etc/docker/daemon.json: No such file or directory”

OK, please install Nvidia Docker runtime as well. Refer to Docker Error - Unknown or Invalid Runtime Name: Nvidia · Issue #132 · NVIDIA-ISAAC-ROS/isaac_ros_visual_slam · GitHub.

# Install Nvidia Docker runtime
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -y nvidia-container-runtime
sudo systemctl restart docker

I ran all the commands, opened a new terminal, and tried again, but still got the same error message.

Is below available?
$ ls /usr/bin/nvidia-container-runtime

I think so.
[screenshot of the ls output]

Please do below as well.
$ sudo apt install nvidia-container-toolkit
$ sudo apt-get install nvidia-docker2
$ sudo pkill -SIGHUP dockerd
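If the nvidia runtime is still not recognized after these installs, one more thing worth trying (not confirmed in this thread) is letting nvidia-ctk, which ships with nvidia-container-toolkit, write the Docker daemon config and then restarting Docker:

# generates /etc/docker/daemon.json with the nvidia runtime entry
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker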