We’ve just obtained an S1070 and, to get started, I’ve ported some code that was developed on a C870 to it, but I can’t get past initialization of the data.
I am getting an error message from the very first kernel:
“setting the device when a process is active is not allowed”.
This is a new one to me, so:
How can I tell if there is a process on a device?
When could I have started that process?
How can I stop it?
I’ve tracked the error to allocating memory on the device.
The following code
[codebox]
allocateArray((void**)&d_x, memSizefloat2);
printf("\nInitialize 1: %s\n", cudaGetErrorString(cudaGetLastError()));
allocateArray((void**)&d_vx, memSizefloat2);
printf("\nInitialize 2: %s\n", cudaGetErrorString(cudaGetLastError()));
[/codebox]
gives the output
Initialize 1: no error
Initialize 2: setting the device while a process is active is not allowed
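The problem was that my allocateArray function called cudaSetDevice before cudaMalloc. The first call went through because no context existed yet, but that first cudaMalloc created one, so the cudaSetDevice inside the second allocateArray call was refused with the error above. I’d read in a post on a similar topic that calling cudaSetDevice with later versions of CUDA can cause exactly this failure.
So be careful: porting code from one device to another is not always straightforward.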
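A minimal sketch of the fix described above, assuming a CUDA 2.x-era runtime in which cudaSetDevice fails once a context is active in the calling host thread (as far as I know, CUDA 4.0 and later relaxed this). The device is selected once, before any call that creates a context, and allocateArray is reduced to a plain cudaMalloc. The device index 0, the array size and the helper name checkCuda are illustrative assumptions, not taken from the original code.

```cuda
// Sketch: select the GPU once at start-up, then allocate without
// calling cudaSetDevice again. Host-only CUDA runtime API code.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: print the error and abort if a CUDA call failed.
static void checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// allocateArray without the cudaSetDevice call: it only allocates.
static void allocateArray(void **devPtr, size_t size)
{
    checkCuda(cudaMalloc(devPtr, size), "cudaMalloc");
}

int main(void)
{
    // Assumption: device 0. This must come before any call that creates
    // a context (cudaMalloc, a kernel launch, ...).
    checkCuda(cudaSetDevice(0), "cudaSetDevice");

    const size_t n = 1 << 20;  // illustrative element count
    const size_t memSizefloat2 = n * sizeof(float2);
    float2 *d_x = NULL, *d_vx = NULL;

    allocateArray((void **)&d_x, memSizefloat2);
    // The second allocation no longer calls cudaSetDevice.
    allocateArray((void **)&d_vx, memSizefloat2);
    printf("Initialize: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_vx);
    cudaFree(d_x);
    return 0;
}
```

The design choice is simply to keep device selection out of helpers that may be called many times; whichever code owns the host thread picks the GPU once.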
Check out the “cudaSetDeviceFlags” API. There is also a way to say that a device can be used ONLY by ONE process at a time (compute-exclusive mode),
and there was an NVIDIA utility to set this correctly (nvidia-smi, if I remember right).
Maybe Tim can give you more details on that (assuming that’s the problem).
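To see how each GPU is currently configured, a short host program can read the computeMode field of cudaDeviceProp. This is a sketch, assuming a runtime recent enough to report computeMode; the mode itself is set outside the program (with nvidia-smi, per the reply above), and the mode says nothing about which process currently holds the device. The helper name computeModeName is hypothetical.

```cuda
// Sketch: report the compute mode of every visible GPU.
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper: map a computeMode value to a short label.
static const char *computeModeName(int mode)
{
    switch (mode) {
    case cudaComputeModeDefault:    return "default";    // several host threads may share the device
    case cudaComputeModeExclusive:  return "exclusive";  // one host thread (context) at a time
    case cudaComputeModeProhibited: return "prohibited"; // no host thread may use the device
    default:                        return "other";      // e.g. modes added in newer CUDA versions
    }
}

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        struct cudaDeviceProp prop;
        // Querying properties takes a device index; no cudaSetDevice needed.
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        printf("Device %d: %s, compute mode: %s\n",
               dev, prop.name, computeModeName(prop.computeMode));
    }
    return 0;
}
```

If a device reports "exclusive" and another program already has a context on it, creating a second context on that device is expected to fail, which is worth ruling out when several jobs share an S1070.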
I’ve sorted out the problem now; it was those calls to cudaSetDevice.
Code timings:
on the C870 it took 110 s (which was blindingly quick anyway)
on the C1060 it took 42 s, roughly 2.6 times faster.
WOW!
I’m looking forward to using all the devices in the S1070 in parallel.