This looks like a terrible regression in the latest CUDA runtime.
It seems that calling an nppsMalloc_() function changes the current CUDA device!
The problem currently occurs with CUDA 12.8 on a Windows 10 machine.
Can someone confirm whether it also happens on Linux before I submit a bug report?
int d = 0;
cudaGetDevice(&d);
printf("device : %d\r\n", d); // 0
cudaSetDevice(1);
cudaGetDevice(&d);
printf("device : %d\r\n", d); // 1 : OK (I have 2 GPUs)
const NppLibraryVersion* version = nppGetLibVersion();
cudaGetDevice(&d);
printf("device : %d\r\n", d); // 1 : OK, querying the NPP version does not change the device
cudaSetDevice(1);
cudaGetDevice(&d);
printf("device : %d\r\n", d); // 1 : OK
nppsMalloc_8u(1);
cudaGetDevice(&d);
printf("device : %d\r\n", d); // 0 ! the current device changed after nppsMalloc_8u()
I checked with cudaPointerGetAttributes(): nppsMalloc() does not even allocate on device 1! The allocation occurs on device 0. I can't see any reason for that behaviour.
[edit] I also checked that cudaMalloc() behaves correctly. It does.