Out of curiosity, what is the performance when you use the PyTorch container?
My CUDA 13 benchmark showed that CPU-to-GPU performance was still decent.
I only pasted the relevant parts into the thread, but the benchmark also tested local CPU to GPU.