For background information: I am running my training job on a single GPU.
When I changed the number of shards from 10 to 1, I noticed an increase in training time. Why is there a difference when I am only using one GPU?
Is it configured to use only a certain number of cores?
While using this sharding technique, does the parallelization affect the accuracy of the training negatively or positively?
Is it "the number of shards" of tlt-dataset-convert?
If yes, it only affects the number of tfrecords files.
Yes, that's the one.
How does the number of tfrecords files affect training performance? When I decrease the number of shards, training seems to take longer.
If you decrease the number of shards, each shard will have more images, but the total number of training images does not change. Do you mean the total training time increases?
Yeah, if each shard has more images, does that mean the total training time will increase? Also, does the number of images per shard affect the accuracy?
No, the total number of training images does not change, so it should not affect the training time or the accuracy.
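For intuition, the point above can be sketched as a simple partitioning exercise (a hypothetical illustration of even sharding, not tlt-dataset-convert internals): the shard count only changes how the same images are grouped into tfrecords files, not how many images are trained on.

```python
def shard_sizes(num_images: int, num_shards: int) -> list:
    """Split num_images as evenly as possible across num_shards files
    and return the per-shard image counts."""
    base, extra = divmod(num_images, num_shards)
    # The first `extra` shards get one additional image each.
    return [base + (1 if i < extra else 0) for i in range(num_shards)]

sizes_10 = shard_sizes(1000, 10)  # ten files of 100 images each
sizes_1 = shard_sizes(1000, 1)    # one file of 1000 images

# Either way, the total training set is identical.
assert sum(sizes_10) == sum(sizes_1) == 1000
```

Since the total image count is the same in both cases, each epoch processes the same amount of data regardless of the shard setting.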