[bug] Current device change after nppsMalloc(); does nppsMalloc() call cudaSetDevice()?

This looks like a terrible regression in the latest CUDA runtime.
It seems that calling an nppsMalloc_*() function changes the current CUDA device!

The problem currently occurs with CUDA 12.8 on a Windows 10 machine.
Can someone confirm whether it also happens on Linux before I submit a bug report?

  int d = 0;
  cudaGetDevice(&d);
  printf("device : %d\n", d);  // 0

  cudaSetDevice(1);
  cudaGetDevice(&d);
  printf("device : %d\n", d);  // 1 : OK (I have 2 GPUs)

  // nppGetLibVersion() leaves the current device unchanged.
  const NppLibraryVersion* version = nppGetLibVersion();
  cudaGetDevice(&d);
  printf("device : %d\n", d);  // 1 : OK
  cudaSetDevice(1);
  cudaGetDevice(&d);
  printf("device : %d\n", d);  // 1 : OK

  // After the NPP allocation, the current device is no longer 1.
  nppsMalloc_8u(1);
  cudaGetDevice(&d);
  printf("device : %d\n", d);  // 0 !


I checked with cudaPointerGetAttributes(): nppsMalloc() does not even allocate on device 1! The allocation occurs on device 0, even though device 1 was current when it was called. I can't see any reason for that behaviour.

[edit]
I also checked whether cudaMalloc() behaves correctly. It does: the current device stays unchanged.

Thanks for filing a bug ticket. This maps to NVBUG ID 5118223, which we are looking into. We will post the conclusion both in the ticket and here.