My neural-network training in PyTorch (with CUDA) either "never finishes" or crashes the system (memory hits its limit, or a "DataLoader worker killed" error occurs). The GPU has memory allocated but always sits at 0% utilization while the DataLoader runs. I have tried several batch sizes and, in the DataLoader, different numbers of workers, shuffle True or False, and pin_memory True or False. From the tests I have done so far, I cannot use more than 1 worker, regardless of whether I increase or decrease the batch size. I am on an NVIDIA V100 SXM2. I'm in the dark and would be very grateful for any help, thanks.
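For reference, here is a minimal sketch of the settings I have been varying. The dataset, batch size, and worker count below are placeholders for illustration, not recommendations; the `non_blocking=True` copy is the usual companion to `pin_memory=True`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy in-memory dataset standing in for the real one (assumption: yours is far larger).
dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))

loader = DataLoader(
    dataset,
    batch_size=64,     # tried several values
    num_workers=2,     # tried 0, 1, and >1 (>1 fails for me)
    shuffle=True,      # tried both True and False
    pin_memory=True,   # tried both; needed for async host-to-device copies
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for x, y in loader:
    # non_blocking only helps when pin_memory=True and the target is CUDA
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    break  # one batch is enough for the sketch
```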
I was able to solve part of the problem by following the instructions on this page - PyTorch - NCC @ Durham - and then the instructions on another page about loading HDF5 files with multiprocessing (I can't post that link, since I'm only allowed one link per post).
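For anyone landing here later: the fix that page describes boils down to not opening the HDF5 file in the Dataset's `__init__` (an open handle cannot safely be shared across DataLoader worker processes), but lazily on first access in each worker. A hedged sketch, assuming `h5py` and a dataset named `"images"` inside the file (both are placeholders for your actual layout):

```python
import h5py
import numpy as np
import tempfile
import torch
from torch.utils.data import Dataset

class H5Dataset(Dataset):
    def __init__(self, path):
        self.path = path
        self.file = None  # opened lazily, once per worker process
        # Open briefly just to record the length, then close again.
        with h5py.File(path, "r") as f:
            self.length = len(f["images"])  # "images" is a placeholder name

    def __getitem__(self, idx):
        if self.file is None:
            # First access in this process: open a private handle.
            self.file = h5py.File(self.path, "r")
        return torch.from_numpy(self.file["images"][idx])

    def __len__(self):
        return self.length

# Tiny self-contained demo with a throwaway HDF5 file:
tmp = tempfile.NamedTemporaryFile(suffix=".h5", delete=False)
with h5py.File(tmp.name, "w") as f:
    f.create_dataset("images", data=np.arange(24, dtype="f4").reshape(4, 6))
ds = H5Dataset(tmp.name)
```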
Now there is no DataLoader-worker error and no memory overflow or system crash. Each DataLoader worker uses a thread to carry out the loads it needs, but the GPU still sits at 0% utilization despite having some memory allocated. Even when I vary the batch size, training a single epoch does not finish in the time it takes in another environment where I can run the same job; in fact, I have not yet managed to complete even one epoch in any test so far (I wait up to 40 minutes per test, while in the other environment training finishes much faster). Perhaps there is some bottleneck feeding the GPU. Can you help me, please?
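One way I could check whether the GPU is being starved by the input pipeline is to time, per batch, how long the loop blocks waiting for the next batch versus how long the step itself takes. A framework-agnostic sketch (the `step_fn` is a stand-in for the real forward/backward/optimizer step, and the list in the demo stands in for a DataLoader):

```python
import time

def profile_loop(loader, step_fn, max_batches=50):
    """Return (seconds spent waiting for data, seconds spent in step_fn)."""
    wait_time = 0.0
    step_time = 0.0
    it = iter(loader)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)   # time spent blocked on the loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)         # stand-in for the actual training step
        t2 = time.perf_counter()
        wait_time += t1 - t0
        step_time += t2 - t1
    return wait_time, step_time

# Demo: three "batches", each step simulated with a short sleep.
wait, step = profile_loop([1, 2, 3], lambda b: time.sleep(0.01))
```

If `wait` dominates `step` with the real loader, the data pipeline (disk, HDF5 decoding, workers) is the bottleneck rather than the GPU itself.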