Low GPU performance

I am trying to use CUDA on a Windows laptop with an NVIDIA RTX A500 GPU. I have CUDA 12.3 installed together with pytorch-cuda 12.1. However, my code runs considerably slower on the GPU than on the CPU, even though the very same code runs orders of magnitude faster on the GPU on other machines.

In nvidia-smi I can see that python.exe is correctly listed as a GPU process; however, its GPU memory usage is shown as N/A.

It turns out that the batch size that was optimal on one system was not optimal on the other, causing a memory bottleneck.
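
For anyone hitting the same thing, here is a rough sketch of how one might compare batch sizes on a given machine instead of reusing a value tuned elsewhere. This is not the code from my project: `sweep_batch_sizes`, `run_batch`, and `torch_example` are names made up for illustration, and the layer size and candidate batch sizes are arbitrary placeholders. The timing helper itself is plain Python; the PyTorch part assumes a working `torch` install.

```python
import time


def sweep_batch_sizes(run_batch, batch_sizes, n_items=4096, sync=None, repeats=3):
    """Time run_batch(batch_size) for each candidate and return
    {batch_size: items_per_second}.

    run_batch: hypothetical callable that processes one batch of the given size.
    sync: optional callable that waits for pending device work (for CUDA,
          torch.cuda.synchronize), so that asynchronous kernels are included
          in the measured time.
    """
    results = {}
    for bs in batch_sizes:
        n_batches = max(1, n_items // bs)
        run_batch(bs)  # warm-up call, excluded from timing
        if sync is not None:
            sync()
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for _ in range(n_batches):
                run_batch(bs)
            if sync is not None:
                sync()
            best = min(best, time.perf_counter() - start)
        # Guard against a zero elapsed time on very coarse clocks.
        results[bs] = n_batches * bs / max(best, 1e-9)
    return results


def torch_example():
    """Illustrative use with PyTorch; model and sizes are arbitrary."""
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(1024, 1024).to(device)

    def run_batch(bs):
        x = torch.randn(bs, 1024, device=device)
        with torch.no_grad():
            model(x)

    sync = torch.cuda.synchronize if device == "cuda" else None
    return sweep_batch_sizes(run_batch, [16, 64, 256, 1024], sync=sync)


if __name__ == "__main__":
    for bs, rate in torch_example().items():
        print(f"batch_size={bs}: {rate:.0f} items/s")
```

Running this on each machine and picking the batch size with the highest throughput should show whether a value carried over from another system is the culprit. The `sync` argument matters on the GPU because CUDA calls return before the work has finished, so timings taken without it can be misleading.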