[bug] Current device change after nppsMalloc(); does nppsMalloc() call cudaSetDevice()?

Chacha21 · February 18, 2025, 1:10pm

This looks like a serious regression in the latest CUDA runtime.
It seems that calling an nppsMalloc_*() function changes the current CUDA device!

The problem currently occurs with CUDA 12.8 on a Windows 10 machine.
Can someone confirm whether it also happens on Linux before I submit a bug report?

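  #include <stdio.h>
  #include <cuda_runtime.h>
  #include <npp.h>

  int main(void)
  {
    int d = 0;
    cudaGetDevice(&d);
    printf("device : %d\r\n", d); // 0

    cudaSetDevice(1);
    cudaGetDevice(&d);
    printf("device : %d\r\n", d); // 1 : OK (I have 2 GPUs)

    // An NPP call that does not allocate leaves the current device alone
    const NppLibraryVersion* version = nppGetLibVersion();
    (void)version;
    cudaGetDevice(&d);
    printf("device : %d\r\n", d); // 1 : OK
    cudaSetDevice(1);
    cudaGetDevice(&d);
    printf("device : %d\r\n", d); // 1 : OK

    // After the NPP allocation, the current device is no longer 1
    Npp8u* p = nppsMalloc_8u(1);
    cudaGetDevice(&d);
    printf("device : %d\r\n", d); // 0 !

    nppsFree(p);
    return 0;
  }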


I checked with cudaPointerGetAttributes(): nppsMalloc() does not even allocate on device 1! The allocation ends up on device 0. I can't see any reason for that behaviour.

[edit]
I also checked cudaMalloc(): it behaves correctly.

Yuki_Ni · February 20, 2025, 2:16am

Thanks for filing a bug ticket. This maps to NVBUG ID 5118223, which we are looking into. We will post the conclusion both in the ticket and here.
