I would like to allocate the largest possible chunk of CUDA global memory.
As far as I know, the framebuffer would need some of the global memory.
So I cannot allocate the entire global memory to my application.
So I did something like this -
Using cudaGetDeviceProperties() I found the total global memory on the card: TotalGlobalMemory.
Next I decided I would leave 100 MB for the framebuffer: FramebufferMemorySize = 100 * 1024 * 1024.
So, MemorySizeToAllocate = TotalGlobalMemory - FrameBufferMemorySize;
I then do a cudaMalloc(&pCudaMemory, MemorySizeToAllocate).
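For concreteness, here is a minimal host-code sketch of the approach described above (the 100 MB reserve is my guess from the post, and I've added the error checking that cudaMalloc really needs, since this is exactly the call that fails on the second machine):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query the card's total global memory.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Reserve a guessed amount for the framebuffer/driver
    // (100 MB here -- the assumption that broke on the GTX 280 box).
    const size_t kReserve = 100u * 1024 * 1024;
    size_t toAllocate = prop.totalGlobalMem - kReserve;

    void* pCudaMemory = NULL;
    cudaError_t err = cudaMalloc(&pCudaMemory, toAllocate);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc of %zu bytes failed: %s\n",
                toAllocate, cudaGetErrorString(err));
        return 1;
    }
    printf("Allocated %zu bytes\n", toAllocate);
    cudaFree(pCudaMemory);
    return 0;
}
```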
This works fine on my development machine. (9800 GTX, with 2x24" monitors)
But when I try the same code on a different machine, it either fails in the cudaMalloc or I see some scribbling over a few pixels at the top of the screen. (different machine = GTX 280 with one 30" monitor + one 24" monitor)
If I change the FramebufferMemorySize to 200 * 1024 * 1024, the application runs on the second machine without a hitch.
Does this have anything to do with the different monitor sizes (i.e., the framebuffer size), or is there something bigger at play here?
How do I determine what should be my FramebufferSize so that my code can run on any configuration?
Also, I am using WPF for the UI of my application. I have heard that WPF uses graphics card memory as well. If that is true, how do I make sure that WPF and my CUDA computations do not overlap?
I plan to run this on a CUDA-enabled laptop which might have just 128 MB of global memory. Taking 100 MB out of that leaves me with practically nothing.
Let me know if I am not clear or if you need more information from my side.
All help appreciated!
AFAIK there’s more memory you cannot use than just the framebuffer. I don’t know exactly what eats up the VRAM, but I’ve seen reports that the maximum amount of memory people could allocate on 512 MB cards was actually around 450-460 MB. The only way to find out is to benchmark it, trying to allocate more and more memory until there’s an error.
If your app will run for a long time it might be a good idea to include such a “hardware adjustment test” before starting work. Here’s an algorithm I came up with just now:
- Find out how much memory the card has (cudaGetDeviceProperties())
- Allocate half of the available memory.
- If there was no error, try allocating the same amount again and again.
- If there was an error, reduce the size of the memory block by half, try to allocate it instead
Continue this until the amount of memory you’re trying to allocate goes below a certain threshold (no sense in trying to allocate 1MB blocks).
We start from big blocks to avoid fragmentation. Remember to free all of it before you proceed (don’t lose your pointers!)
Thanks for your reply Bigmac. What you said makes sense.
In fact, I followed similar (if not the same) steps to conclude that leaving 100 MB off my 512 MB 9800 GTX card (ie allocating 512 - 100) is the sure way to keep my app running.
The problem is when I do not know what hardware would be in my customer’s machine.
Like I said in my earlier post, we changed the card to a GTX 280 with 1 GB memory, and (1 GB - 100 MB) did not work for me. I had to do (1 GB - 200 MB).
Should I conclude that I must leave 100 MB free for every 512 MB of global memory?
I do not mind running some code to check out what card it is and then deciding on that how much memory to leave, but there must be a deterministic way to decide that amount of memory.
Also, does the WPF factor seem suspicious to you?