Using CUDA Python to parallelize np.load


I am new to CUDA development in Python. Is it possible to parallelize a for loop that loads .npy files with np.load()? I have not been able to get this working, so I wanted to ask whether it can be done at all. I came across NVIDIA DALI, but it seems to work only on Linux, and I am on a Windows 10 machine. Any input would be greatly appreciated. Thank you!
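
To make the question concrete, here is a minimal sketch of the kind of loop I mean, next to a CPU-side thread-pool version of the same thing. This is not my real code and it is not CUDA: the file names, the helpers `load_sequential` and `load_threaded`, and the `max_workers` value are all placeholders I made up for illustration, and I have not measured whether the threaded version is actually faster on my data.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def load_sequential(paths):
    """Baseline: load each .npy file one after another."""
    return [np.load(p) for p in paths]


def load_threaded(paths, max_workers=4):
    """Load the same files with a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(np.load, paths))


if __name__ == "__main__":
    # Create a few small placeholder files so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(4):
            path = os.path.join(tmp, f"sample_{i}.npy")
            np.save(path, np.full((2, 3), i))
            paths.append(path)

        sequential = load_sequential(paths)
        threaded = load_threaded(paths)

        # Both approaches should return identical arrays in the same order.
        same = all(np.array_equal(a, b) for a, b in zip(sequential, threaded))
        print("results match:", same)
```

Since np.load reads from disk into host memory, my guess is that any speedup would have to come from CPU threads or processes like this rather than from the GPU, but I would appreciate confirmation either way.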