I know that there are run-time limitations when running CUDA kernels on a GPU that is also driving the X11 server, but are there memory limitations as well? A program of mine that uses CUDA kernels tends to fail when allocating memory, especially under CUFFT. For example, when I call cufftExecR2C, the program fails with "unknown error" or "cufft error 2 (CUFFT_ALLOC_FAILED)", even though I'm pretty sure I haven't used up all the available video memory, at least in my own code. I have debugging statements that print how much memory I've requested from the driver before each allocation.
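In case it helps narrow things down, here is a minimal sketch of the kind of check I mean (assumptions: a hypothetical 1D transform size NX, CUDA runtime and CUFFT headers available). It queries free device memory with cudaMemGetInfo right before the plan and exec calls, and checks every CUFFT return code so the failing call is obvious:

```cuda
// Sketch only: NX is a made-up size; real code would use the actual dimensions.
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);
    printf("free: %zu MB / total: %zu MB\n", freeB >> 20, totalB >> 20);

    const int NX = 1 << 20;  // hypothetical 1D transform size
    cufftReal *in = nullptr;
    cufftComplex *out = nullptr;
    if (cudaMalloc(&in,  sizeof(cufftReal) * NX) != cudaSuccess ||
        cudaMalloc(&out, sizeof(cufftComplex) * (NX / 2 + 1)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Plan creation is where CUFFT allocates its internal work area; on a GPU
    // shared with X11 this can fail even when the user buffers fit.
    cufftHandle plan;
    cufftResult r = cufftPlan1d(&plan, NX, CUFFT_R2C, 1);
    if (r != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftPlan1d failed: %d\n", r);  // 2 == CUFFT_ALLOC_FAILED
        return 1;
    }

    r = cufftExecR2C(plan, in, out);
    if (r != CUFFT_SUCCESS)
        fprintf(stderr, "cufftExecR2C failed: %d\n", r);

    cufftDestroy(plan);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The point of printing cudaMemGetInfo just before the plan is that CUFFT's own work-area allocation, not my buffers, may be what runs out of room.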
I've tried running the same program on a server with two GeForce GTX cards and two Teslas, and it doesn't seem to fail there.
This is my first post, so go easy on me if this has already been asked or seems like a silly question.