Hi Folks
Please refer to the link below.
I have two issues with the data processing step (TFRecord to NumPy conversion):
- How can this code be made to use multiple processes?
- compression_type is not working
Below are the changes I tested. Is this the correct way to do it?
For multiprocessing, I changed line 97:
# set tfrecord dataset
dataset = tf.data.TFRecordDataset(inputfiles, compression_type = args.compression, num_parallel_reads = args.num_processes)
dataset = dataset.apply(tf.data.experimental.ignore_errors())
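As far as I understand, num_parallel_reads only sets how many files tf.data reads in parallel inside a single Python process, so it does not start additional processes by itself. A minimal sketch of process-level parallelism, assuming the input files can be converted independently of each other, is to split the file list into shards and hand each shard to a worker. The names split_round_robin, convert_shard and run are hypothetical and not part of convert_tfrecord_to_numpy.py; convert_shard is only a placeholder for the real conversion.

```python
import multiprocessing as mp


def split_round_robin(items, n):
    """Split items into n lists in round-robin order (n is clamped to >= 1)."""
    n = max(1, n)
    return [items[i::n] for i in range(n)]


def convert_shard(files):
    # Hypothetical placeholder: the real TFRecord -> NumPy conversion for
    # each file in this shard would go here. It returns the file count only.
    return len(files)


def run(inputfiles, num_processes):
    """Convert files in parallel, one shard per worker process."""
    shards = [s for s in split_round_robin(sorted(inputfiles), num_processes) if s]
    if not shards:
        return 0  # nothing to convert
    with mp.Pool(processes=len(shards)) as pool:
        # Each worker handles its own shard; results are per-shard file counts.
        return sum(pool.map(convert_shard, shards))


if __name__ == "__main__":
    print(run(["a.tfrecord", "b.tfrecord", "c.tfrecord"], 2))
```

Each worker would need to build its own TFRecordDataset for its shard, so TensorFlow should be imported inside the worker rather than shared from the parent process.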
For compression, I ran the command below. It produces no output files, whereas without compression it works:
root@e8ce27f609b1:/workspace/cosmoflow/tools# python3 convert_tfrecord_to_numpy.py -i /mnt/cosmoUniverse_2019_05_4parE_tf_small/train -o /mnt/processed/train -c 'GZIP' -p 1
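One thing that may be worth checking is whether the input files are actually gzip-compressed. If they are not, reading them with compression_type 'GZIP' would be expected to fail, and tf.data.experimental.ignore_errors() may silently drop those failures, which would look like "no output files". A minimal sketch of such a check, assuming only that gzip streams begin with the bytes 0x1f 0x8b (the helper name looks_gzipped is hypothetical):

```python
import gzip
import os
import tempfile


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"


if __name__ == "__main__":
    # Self-contained demo on two temporary files (not real TFRecords).
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "plain.bin")
        packed = os.path.join(tmp, "packed.bin.gz")
        with open(plain, "wb") as f:
            f.write(b"not compressed")
        with gzip.open(packed, "wb") as f:
            f.write(b"compressed payload")
        print(looks_gzipped(plain), looks_gzipped(packed))
```

This is only a heuristic: an uncompressed file could in principle begin with the same two bytes, so a False result is more informative than a True one. Temporarily removing the ignore_errors() line should also surface any underlying read error.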