Datawriter: Large file size

Hi everybody and thanks again for Isaac Sim!

I currently generate randomized data using the Python API of Isaac Sim and save it with the Datawriter class. However, the output files for depth, instance, and semantic data in .npy format are huge: roughly 10,000 samples already take up about 120 GB on my system. Since I plan to generate 1,000,000+ samples, the dataset would grow beyond 10 TB.

I found that the problem is that the data is stored with numpy.save, which preserves the internally used dtype. The instance/semantic segmentation data is stored in numpy arrays of type uint32 and the depth data as float32, so the saved data is naturally huge (in my case 1280x720x4 bytes per sample and channel).
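To illustrate the math (a minimal sketch, not Isaac Sim code): a single 1280x720 uint32 segmentation mask costs 4 bytes per pixel, which numpy.save writes out essentially uncompressed.

```python
import numpy as np

# A 1280x720 segmentation mask stored as uint32: 4 bytes per pixel.
seg = np.zeros((720, 1280), dtype=np.uint32)
print(seg.nbytes)  # 720 * 1280 * 4 = 3686400 bytes, ~3.5 MB per sample
```

At that rate, 10,000 samples of segmentation data alone already account for tens of gigabytes, before depth and other channels are added.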
For depth data I can just use the generated PNGs, which are small enough. However, I cannot recreate the original class data from the colorized instance / semantic PNGs.

Maybe you could reduce the output file size in future releases. As a low-hanging fruit, you could cast the instance/semantic data to uint8 (assuming fewer than 256 labels), which would cut the file size by a factor of four. You could also save the semantic/instance data directly to a PNG without colorizing it, so the lossless compression can exploit the sparse nature of the data.
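The uint8 idea can be sketched in a few lines (my own illustration, not the Datawriter API; the cast is only lossless when there are fewer than 256 distinct labels):

```python
import numpy as np

# Toy label map with 20 classes, stored the way Datawriter does (uint32).
seg = np.random.randint(0, 20, size=(720, 1280), dtype=np.uint32)

# Guard the downcast: uint8 can only represent labels 0..255.
assert seg.max() < 256, "uint8 cast would be lossy with 256+ labels"
seg8 = seg.astype(np.uint8)

print(seg.nbytes, seg8.nbytes)  # 3686400 vs 921600 bytes, a 4x reduction
```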

The following image shows the file sizes of data generated with Datawriter. The .npy files are stored at the original data-type size and the PNGs are the colorized versions.


The next image shows the file sizes when using uint8 data for the .npy files and "_data" PNGs that store the raw semantic data (no colorization).

Thank you @fabian.meyer for the feedback and very useful suggestions!

In the current release, we were saving the raw synthetic data output from the renderer. In the upcoming release, we will provide an option to reduce the output file size using approaches similar to some of the suggestions that you have provided.

Thanks, I’m looking forward to that release! For now I will stick with my custom datawriter.
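For anyone wanting to do the same in the meantime, here is a minimal sketch of such a custom writer. The function name and the choice of a single compressed .npz per sample are my own assumptions, not part of Isaac Sim; the float16 depth cast trades precision for size, so check it against your accuracy requirements.

```python
import numpy as np

def write_sample(path, depth, semantic, instance):
    """Hypothetical replacement for the stock .npy output: bundle all
    channels of one sample into a single compressed .npz, downcasting
    segmentation to uint8 (assumes fewer than 256 labels)."""
    if semantic.max() > 255 or instance.max() > 255:
        raise ValueError("uint8 cast would be lossy with 256+ labels")
    np.savez_compressed(
        path,
        depth=depth.astype(np.float16),      # half precision; verify accuracy is acceptable
        semantic=semantic.astype(np.uint8),
        instance=instance.astype(np.uint8),
    )
```

Because the compression is lossless (apart from the explicit dtype casts), the sparse segmentation maps compress very well, similar to the raw-PNG approach above.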
