I have a few questions about memory allocation; I still don't understand well how this is supposed to work. Or maybe my OS has been corrupted and I have to reinstall, I don't know.
1. When I issue cuMemAlloc, will the first call give me the first available address from the beginning of memory, or can it give me any address in the middle?
2. If my process crashes, or if I exit without freeing the context or the memory, will the allocated memory and context be released automatically, or are these allocations somehow stored on the GPU so that only a power shutdown will erase them?
3. Why do my cuMemAlloc calls of the same size, executing the same code, return different addresses on different days of the week? My code does only one cuMemAlloc call at the beginning and then does all memory management by itself. When I was testing the code on Monday and Tuesday, cuMemAlloc with size = 32MB was returning an address around 3195152 (about 3MB). Then on Wednesday it started giving me a lower address, like 2093129 (just under 2MB). Yesterday it started returning addresses beginning with 61472768, which is about 58MB from the start of the address space. I thought some memory was still in use from previous calls, so I powered down and back up, but nothing changed; the returned address is still around 61472768.
4. How much memory on the GPU is used for CUDA internals? For example, I have a device like this:
Device 0: "GeForce GTX 280"
  CUDA Driver Version: 2.30
  CUDA Runtime Version: 2.30
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 3
  Total amount of global memory: 1073020928 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.35 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)
and when I issue cuMemAlloc with a size of 1024*1024*977, I get an "Out of memory" error:
calling cuda Malloc with size=1024458752 cucall() error 2 in file <gpu_api.c>, line 69.
If I call it with 1024*1024*976, it works. But the card reports 1073020928 bytes total (about 1023MB), so that's roughly 47MB lost to something I don't know what it is. Do CUDA internals really use that much? Can this be configured?
Thanks in advance for any help