I have a 2 gpu system: a GTX480 for computation and a GT9800 for video.
I have code that runs perfectly on the default device (0) which in my system is the GTX480.
I’ve discovered by chance that if I explicitly set the device using cudaSetDevice(0), I get what appears to be a memory leak. I can run the code once, but if I try to run it again I get a cudaMalloc error. The problem appears and disappears simply by commenting out (or not) the cudaSetDevice(0) statement.
And yes, I’ve verified using cudaGetDevice(…) that I’m always using device 0.
Did you call cudaThreadExit() before re-invoking all of your code? cudaSetDevice() can only be used if you haven’t already kicked off actual work on the GPU. Once you have, calling it is an error (which you can check from the error code returned by cudaMalloc).
As for device ordering, there is no guarantee that CUDA’s ordering is the same as the PCI or system ordering.
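In other words, the ordering and error checks I mean look roughly like this (just a minimal sketch with a made-up buffer size, not your actual code):

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Pick the compute GPU before any call that creates a context.
    cudaError_t err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Hypothetical 1024-float buffer, just for illustration.
    float *d_buf = NULL;
    err = cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... launch kernels, copy data, etc. ...

    cudaFree(d_buf);

    // Tear down this host thread's context so a later cudaSetDevice()
    // (e.g. on the next run of the code) starts from a clean state.
    cudaThreadExit();
    return 0;
}
```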
Thanks. Calling cudaThreadExit() did the job. I was lazy and didn’t put an error check on the cudaSetDevice(0).
I’m curious, though. When I said the problem appeared when I ran the code a second time, I meant that literally; I didn’t mean within a loop. I was under the impression that CUDA contexts were cleaned up once they were all finished and the program terminated.