cudaSetDevice bug?

Win 64, Intel Core i7
CUDA 3.2
GTX 480 and 9800 GT

I have a two-GPU system: a GTX 480 for computation and a 9800 GT for video.
I have code that runs perfectly on the default device (0), which in my system is the GTX 480.

I’ve discovered by chance that if I explicitly set the device using cudaSetDevice(0), I get what appears to be a memory leak: I can run the code once, but if I try to run it again I get a cudaMalloc error. The problem appears and disappears simply by commenting out, or not, the cudaSetDevice(0) statement.

And yes, I’ve verified using cudaGetDevice(…) that I’m always using device 0.
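
Roughly, the pattern looks like this (a simplified sketch, not my actual code; the buffer size and names are made up):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaSetDevice(0);   // commenting this out makes the problem go away

    float* d_buf = 0;
    cudaError_t err = cudaMalloc((void**)&d_buf, 256 * 1024 * 1024);   // fails on the second run
    printf("cudaMalloc: %s\n", cudaGetErrorString(err));

    // ... kernel launches ...

    cudaFree(d_buf);
    return 0;   // no explicit cudaThreadExit() anywhere
}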

Likely unrelated but there is another quirk:

The NVIDIA GPU Computing SDK sample called deviceQuery correctly shows my GTX 480 as device 0 and the 9800 GT as device 1.

However, the NVIDIA System Performance Monitor tool labels my GTX 480 as GPU2 and the 9800 GT as GPU1.
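
For what it’s worth, this is roughly how deviceQuery-style enumeration reports the ordering (a quick sketch):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("CUDA device %d: %s\n", i, prop.name);   // GTX 480 shows up as device 0 here
    }
    return 0;
}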

Did you call cudaThreadExit() before re-running all of your code? cudaSetDevice() can only be used if you haven’t already kicked off actual work on the GPU. Once you have, calling it is an error (which you can check from the error code returned by cudaMalloc).

As far as device ordering goes, there is no guarantee that CUDA’s ordering matches the PCI or system ordering.
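
Something along these lines (just a sketch of the order of operations I mean):

#include <cstdio>
#include <cuda_runtime.h>

void run_once()
{
    cudaError_t err = cudaSetDevice(0);   // only legal before any work has created a context
    if (err != cudaSuccess)
        printf("cudaSetDevice: %s\n", cudaGetErrorString(err));

    // ... cudaMalloc, kernel launches, cudaFree ...

    cudaThreadExit();   // tears down this thread's context so the device can be set again
}

int main()
{
    run_once();
    run_once();   // succeeds because the context was torn down above
    return 0;
}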

As srhines said, check the cudaMalloc error.

The error handling reference is here.

So you could try something like this:

printf("Error: %s\n", cudaGetErrorString(cudaGetLastError()));

The error types and their definitions are here.
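
Or, to save typing, wrap every runtime call in a small check (just a sketch; the macro name is made up):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Print where a CUDA runtime call failed and bail out.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage:
//   CUDA_CHECK(cudaSetDevice(0));
//   CUDA_CHECK(cudaMalloc((void**)&d_buf, bytes));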

Thanks. Calling cudaThreadExit() did the job. I was lazy and didn’t check the error returned by cudaSetDevice(0).
I’m curious, though. When I said the problem appeared when I ran the code a second time, I meant that literally, not within a loop. I was under the impression that CUDA’s threads were cleaned up once they had all finished and the program terminated.

In fact, the documentation says

“cudaThreadExit() is implicitly called on host thread exit”

Yeah, but in the release notes for every single CUDA version, the following appears:

o It is a known issue that cudaThreadExit() may not be called implicitly on host thread exit. Due to this, developers are recommended to explicitly call cudaThreadExit() while the issue is being resolved.

I don’t know when they are planning on actually resolving this…
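
Until then, the safe thing is to call it yourself at the end of every host thread that used the runtime, e.g. (sketch):

#include <cuda_runtime.h>

int main()
{
    // ... all CUDA work for this host thread ...

    cudaThreadExit();   // explicit call, per the release-notes recommendation
    return 0;
}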
