CosmoFlow NVIDIA HPC MLPerf implementation

Please refer to the link below.

I have two issues with the data processing step (TFRecord to NumPy conversion):

  • How can this code be made to use multiple processes?
  • `compression_type` is not working.

Below are the changes I tested. Is this the correct approach?

For multiprocessing, I changed line 97:

    # set tfrecord dataset

    dataset = tf.data.TFRecordDataset(..., compression_type=args.compression, num_parallel_reads=args.num_processes)
    dataset = dataset.apply(
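
As far as I understand, `num_parallel_reads` only parallelizes file reads inside the tf.data runtime and does not start separate Python processes. An alternative I am considering is to split the input file list into shards and convert each shard in its own worker process. The following is only a rough sketch of that idea: `shard_files`, `convert_shard`, and `run_parallel` are hypothetical names, and `convert_shard` is a placeholder for the existing per-file conversion logic, not code from the repository.

```python
import multiprocessing as mp


def shard_files(files, num_workers):
    """Split files round-robin into at most num_workers non-empty shards."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    shards = [files[i::num_workers] for i in range(num_workers)]
    return [shard for shard in shards if shard]


def convert_shard(shard):
    """Hypothetical placeholder for the real per-file conversion.

    The existing single-process TFRecord-to-NumPy logic would go here;
    this stub only reports how many files the shard contains.
    """
    return len(shard)


def run_parallel(files, num_workers):
    """Run convert_shard on each shard in a separate worker process."""
    shards = shard_files(sorted(files), num_workers)
    if not shards:
        return []
    with mp.Pool(processes=len(shards)) as pool:
        return pool.map(convert_shard, shards)
```

With this layout, `run_parallel(files, args.num_processes)` would be called from under an `if __name__ == "__main__":` guard, so that worker processes do not re-run the script's top-level code.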

For compression, I used the command below. It produces no output files, although the same command works without compression:

    root@e8ce27f609b1:/workspace/cosmoflow/tools# python3 -i /mnt/cosmoUniverse_2019_05_4parE_tf_small/train -o /mnt/processed/train -c 'GZIP' -p 1
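
One thing that may be worth checking: if `-c` ends up as the `compression_type` used to read the input (as with `compression_type = args.compression` above), then `'GZIP'` can only work when the input TFRecord files are themselves gzip-compressed. A minimal sketch of such a check, using only the standard library, is below. It relies on the fact that gzip files start with the bytes `0x1f 0x8b`; the helper name `looks_gzipped` is my own and not part of the repository.

```python
import sys

# Every gzip stream begins with this two-byte magic number.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    # Report the detected format for each path given on the command line.
    for path in sys.argv[1:]:
        print(path, "gzip" if looks_gzipped(path) else "not gzip")
```

If the input files turn out not to be gzip-compressed, reading them with `compression_type='GZIP'` would be expected to fail, which might explain the missing output.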