[bug] Current device change after nppsMalloc(); does nppsMalloc() call cudaSetDevice()?

This looks like a serious regression in the latest CUDA release.
Calling an nppsMalloc_*() function (nppsMalloc_8u() in the example below) appears to change the current CUDA device!

The problem occurs with CUDA 12.8 on a Windows 10 machine.
Can someone confirm whether it also happens on Linux before I submit a bug report?

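  #include <cstdio>
  #include <cuda_runtime.h>
  #include <npp.h>

  int main()
  {
    int d = 0;
    cudaGetDevice(&d);
    printf("device : %d\n", d); // 0

    cudaSetDevice(1);
    cudaGetDevice(&d);
    printf("device : %d\n", d); // 1 : OK (I have 2 GPUs)

    // Querying the NPP version does not change the current device.
    const NppLibraryVersion* version = nppGetLibVersion();
    cudaGetDevice(&d);
    printf("device : %d\n", d); // 1 : OK
    cudaSetDevice(1);
    cudaGetDevice(&d);
    printf("device : %d\n", d); // 1 : OK

    // The current device changes after this allocation.
    Npp8u* p = nppsMalloc_8u(1);
    cudaGetDevice(&d);
    printf("device : %d\n", d); // 0 !

    nppsFree(p);
    return 0;
  }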

I checked with cudaPointerGetAttributes(): nppsMalloc() does not even allocate on device 1! The allocation ends up on device 0. I can’t see any reason for that behaviour.

[edit]
I also checked whether cudaMalloc() behaves correctly. It does.

Thanks for filing a bug ticket. It maps to NVBUG ID 5118223, which we are looking into. We will post the conclusion both in the ticket and here.

The conclusion for ticket 5118223 is:

We can reproduce this in 12.8, but 12.6 is fine.
We have root-caused the problem and restored the 12.6 behavior of the API, which no longer changes the current device in multi-GPU setups. The fix is targeted at a future major CUDA release, after the 12.x series. The bug ID will be included in the release notes.

In the meantime, our suggestion is to use the CUDA APIs for memory management (cudaMalloc, cudaMallocPitch, cudaFree, etc.) instead of the equivalent NPP memory management functions in multi-device scenarios.

Thanks for reaching out to us.

Best,
Yuki
