Hi everybody and thanks again for Isaac Sim!
I currently generate randomized data using the Python API of Isaac Sim and save it using the Datawriter class. However, the output file size for depth, instance and semantic data in .npy format is huge. It is so big that approximately 10,000 samples take up to 120 GB on my system. I am planning to generate 1,000,000+ samples, so the dataset size would climb over 10 TB.
I figured out that the problem is that the data is stored using numpy.save, which preserves the internally used dtype. The instance / semantic segmentation data is stored in numpy arrays of type uint32 and depth data is stored as float32. So naturally the saved data is huge (in my case 1280x720x4 bytes, about 3.7 MB per array).
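As a rough sanity check of these numbers, here is a small sketch of the arithmetic. It assumes each sample consists of exactly one depth, one instance and one semantic array at 1280x720 (my assumption for the estimate), and it ignores the small .npy header.

```python
import numpy as np

WIDTH, HEIGHT = 1280, 720  # resolution from my setup


def frame_bytes(dtype):
    """Raw payload size of one full-frame array of the given dtype."""
    return WIDTH * HEIGHT * np.dtype(dtype).itemsize


# Assumption: one instance + one semantic (uint32) and one depth (float32) array per sample.
per_array = frame_bytes(np.uint32)
per_sample = 2 * frame_bytes(np.uint32) + frame_bytes(np.float32)

print(f"per array:      {per_array / 1e6:.2f} MB")
print(f"per sample:     {per_sample / 1e6:.2f} MB")
print(f"10,000 samples: {10_000 * per_sample / 1e9:.1f} GB")
```

Under these assumptions the estimate lands at roughly 110 GB for 10,000 samples, which is in the same range as what I see on disk.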
For depth data I can just use the generated PNGs, which are small enough. However, I cannot recreate the original class data from the colorized instance / semantic PNGs.
Maybe you could reduce the output file size in future releases. As a low-hanging fruit, you could cast the instance / semantic data to uint8, which would reduce the file size by a factor of 4 (as long as all IDs fit in the range 0-255). You could also save the semantic / instance data directly to a PNG without colorizing, where the compression could make use of the sparse nature of the data.
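To illustrate the idea, here is a minimal sketch using plain numpy on a synthetic label map. This is not the Datawriter code. The helper names shrink_labels and saved_size are made up for this example, and numpy.savez_compressed stands in for PNG compression so the snippet needs nothing besides numpy.

```python
import io

import numpy as np


def shrink_labels(labels):
    """Cast a non-negative label map to the smallest unsigned dtype that holds its max ID."""
    max_id = int(labels.max())
    for dtype in (np.uint8, np.uint16, np.uint32):
        if max_id <= np.iinfo(dtype).max:
            return labels.astype(dtype)
    raise ValueError("label IDs do not fit into uint32")


def saved_size(save_fn, array):
    """Number of bytes save_fn writes for array, measured in an in-memory buffer."""
    buf = io.BytesIO()
    save_fn(buf, array)
    return len(buf.getvalue())


# Synthetic 1280x720 semantic map: mostly background (0) with two labeled boxes.
labels = np.zeros((720, 1280), dtype=np.uint32)
labels[100:300, 200:600] = 3
labels[400:650, 700:1100] = 7

small = shrink_labels(labels)

print("uint32 .npy:          ", saved_size(np.save, labels), "bytes")
print("uint8 .npy:           ", saved_size(np.save, small), "bytes")
print("uint8 compressed .npz:", saved_size(np.savez_compressed, small), "bytes")
```

The cast is lossless as long as the IDs fit into the smaller dtype, so the original class data can still be recovered. How much the compression step saves depends on how sparse the real label maps are.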
The following image shows the file sizes of generated data using Datawriter. The .npy files are stored in original data type size and the PNGs are the colorized versions.
The next image shows the file sizes of generated data using uint8 data for the .npy files and _data PNGs, which store the raw semantic data (not colorized).