Datawriter: Large file size

Hi everybody and thanks again for Isaac Sim!

I currently generate randomized data using the Python API of Isaac Sim and save it with the Datawriter class. However, the output files for depth, instance, and semantic data in .npy format are huge: roughly 10,000 samples already take up about 120 GB on my system. Since I plan to generate 1,000,000+ samples, the dataset would grow beyond 10 TB.

I found that the problem is that the data is stored with numpy.save, which preserves the internally used dtype. The instance/semantic segmentation data is stored in numpy arrays of type uint32 and the depth data as float32, so the saved data is naturally huge (in my case 1280x720x4 bytes per sample and channel).
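To illustrate the math (a minimal sketch, not Isaac Sim code): a single 1280x720 uint32 segmentation mask costs 4 bytes per pixel, which numpy.save writes out essentially uncompressed.

```python
import numpy as np

# A 1280x720 segmentation mask stored as uint32: 4 bytes per pixel.
seg = np.zeros((720, 1280), dtype=np.uint32)
print(seg.nbytes)  # 720 * 1280 * 4 = 3686400 bytes, ~3.5 MB per sample
```

At that rate, 10,000 samples of segmentation data alone already account for tens of gigabytes, before depth and other channels are added.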
For depth data I can just use the generated PNGs, which are small enough. However, I cannot recreate the original class data from the colorized instance / semantic PNGs.

Maybe you could reduce the output file size in future releases. As a low-hanging fruit, you could cast the instance/semantic data to uint8 (assuming fewer than 256 labels), which would cut the file size by a factor of four. You could also save the semantic/instance data directly to a PNG without colorizing it, so the lossless compression can exploit the sparse nature of the data.
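The uint8 idea can be sketched in a few lines (my own illustration, not the Datawriter API; the cast is only lossless when there are fewer than 256 distinct labels):

```python
import numpy as np

# Toy label map with 20 classes, stored the way Datawriter does (uint32).
seg = np.random.randint(0, 20, size=(720, 1280), dtype=np.uint32)

# Guard the downcast: uint8 can only represent labels 0..255.
assert seg.max() < 256, "uint8 cast would be lossy with 256+ labels"
seg8 = seg.astype(np.uint8)

print(seg.nbytes, seg8.nbytes)  # 3686400 vs 921600 bytes, a 4x reduction
```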

The following image shows the file sizes of data generated with Datawriter. The .npy files are stored at the original data-type size and the PNGs are the colorized versions.


The next image shows the file sizes when using uint8 data for the .npy files and "_data" PNGs that store the raw semantic data (no colorization).

Thank you @fabian.meyer for the feedback and very useful suggestions!

In the current release, we were saving the raw synthetic data output from the renderer. In the upcoming release, we will provide an option to reduce the output file size using approaches similar to some of the suggestions that you have provided.

Thanks, I’m looking forward to that release! For now I will stick with my custom datawriter.
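For anyone wanting to do the same in the meantime, here is a minimal sketch of such a custom writer. The function name and the choice of a single compressed .npz per sample are my own assumptions, not part of Isaac Sim; the float16 depth cast trades precision for size, so check it against your accuracy requirements.

```python
import numpy as np

def write_sample(path, depth, semantic, instance):
    """Hypothetical replacement for the stock .npy output: bundle all
    channels of one sample into a single compressed .npz, downcasting
    segmentation to uint8 (assumes fewer than 256 labels)."""
    if semantic.max() > 255 or instance.max() > 255:
        raise ValueError("uint8 cast would be lossy with 256+ labels")
    np.savez_compressed(
        path,
        depth=depth.astype(np.float16),      # half precision; verify accuracy is acceptable
        semantic=semantic.astype(np.uint8),
        instance=instance.astype(np.uint8),
    )
```

Because the compression is lossless (apart from the explicit dtype casts), the sparse segmentation maps compress very well, similar to the raw-PNG approach above.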
