How to increase dynamically allocatable memory in a device function?

In a device function, I want to allocate global GPU memory dynamically. But this is limited. I can set the limit by calling cudaDeviceSetLimit(cudaLimitMallocHeapSize, hsize) on the host, where hsize is a size_t. However, it seems that I can only set this limit hsize up to 1024*1024*(1024+1024-1) = 2146435072, around 2 GB. Any number bigger than this assigned to hsize makes hsize equal to 18446744071563116544, a very big number. When I run cuda-memcheck, it shows this error:
Program hit cudaErrorLaunchOutOfResources (error 7) due to “too many resources requested for launch” on CUDA API call to cudaLaunch. Saved host backtrace up to driver entry point at error

I am using a TITAN X GPU, which has 12 GB of memory, but it seems only 2 GB are available for dynamic allocation in a device function. Is there any way to use all of my GPU memory in device functions?
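
For reference, here is a minimal sketch of what I am doing (the kernel body and sizes are simplified placeholders):

#include <cstdio>

__global__ void alloc_test(size_t nbytes) {
    // Dynamic allocation from the device heap inside a kernel
    char *p = (char *)malloc(nbytes);
    printf("device malloc of %llu bytes %s\n",
           (unsigned long long)nbytes, p ? "succeeded" : "failed");
    free(p);
}

int main() {
    size_t hsize = 1024 * 1024 * (1024 + 1024 - 1);   // 2146435072, the largest value that works for me
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, hsize);
    cudaDeviceGetLimit(&hsize, cudaLimitMallocHeapSize);
    printf("heap limit is now %llu bytes\n", (unsigned long long)hsize);

    alloc_test<<<1, 1>>>(hsize / 2);
    cudaDeviceSynchronize();
    return 0;
}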

You’re being limited by int resolution in the constants you are combining. Integer literals like 1024 are 32-bit ints, so their product is computed in 32-bit arithmetic; once it exceeds INT_MAX (2147483647) it overflows to a negative value, and when that negative int is converted to the unsigned 64-bit size_t it is sign-extended, which is where 18446744071563116544 comes from. Use a properly qualified long long type variable and properly defined 64-bit constants:

size_t rsize = 1024ULL*1024ULL*1024ULL*4ULL;  // 4 GB heap size, computed entirely in 64-bit arithmetic

cudaDeviceSetLimit(cudaLimitMallocHeapSize, rsize);

No, you can’t use all of your memory. You must leave enough for your program's own allocations as well as CUDA overheads (context, code, and so on). But you should be able to use considerably more than 2 GB if your GPU has 12 GB total.
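
If you want to size the heap from what is actually free rather than hard-coding a constant, a sketch along these lines may help (the 1 GiB headroom is an arbitrary choice, not a documented requirement):

#include <cstdio>

int main() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);       // free/total device memory right now

    size_t headroom = 1ULL << 30;            // leave ~1 GiB for CUDA and your own buffers (arbitrary)
    size_t heap = (free_b > headroom) ? free_b - headroom : free_b / 2;

    cudaError_t err = cudaDeviceSetLimit(cudaLimitMallocHeapSize, heap);
    printf("requested %llu bytes: %s\n",
           (unsigned long long)heap, cudaGetErrorString(err));

    cudaDeviceGetLimit(&heap, cudaLimitMallocHeapSize);   // read back what was actually set
    printf("heap limit is now %llu bytes\n", (unsigned long long)heap);
    return 0;
}

Also note that this limit must be set before launching any kernel that uses device-side malloc() or free(); it cannot be changed once such a kernel has run.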

I see. Thank you.