I want to put some buffers in constant memory because it is cached and therefore faster than global memory. But I didn’t see any examples in the SDK that use dynamic allocation (something like cudaMalloc).
I don’t know if this is what you mean by dynamic, but you can fill constant memory with variable-sized data from the host after the program has launched. Here is a workaround I’m using for a satellite image smoother that uses multiple pictures over time. [image not preserved]
After that, use cudaMemcpyToSymbol (ask if you want more clarification, but it’s well documented and straightforward). Normal malloc calls will work but won’t do anything for you here (you’ll have garbage in your constant memory). Using cudaMemcpyToSymbol, you’ll have an unrolled, offset-based array, with your storage array containing the offsets to the information you need. This means that you can set up variable sizes before launching the kernel… but there’s a catch.
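Since the original snippet isn’t available, here is a minimal sketch of what an offset-based layout like that might look like. It is not the code from the post: the names `c_pool`, `c_offsets`, `packBuffers`, `sumBuffers` and the sizes `POOL_FLOATS` and `MAX_BUFFERS` are all made up for illustration. The idea is that the constant arrays are declared with a fixed size at compile time, and the host packs variable-length buffers into them at run time with cudaMemcpyToSymbol.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Constant memory is sized at compile time (64 KB total on the device).
// Both limits below are hypothetical values chosen for this sketch.
#define POOL_FLOATS 4096   // pool capacity: 16 KB of floats
#define MAX_BUFFERS 16     // maximum number of packed buffers

__constant__ float c_pool[POOL_FLOATS];         // all buffers, back to back
__constant__ int   c_offsets[MAX_BUFFERS + 1];  // buffer i spans [c_offsets[i], c_offsets[i+1])

// Host side: flatten variable-length buffers into one array and record
// where each one starts. Returns false if the data does not fit.
bool packBuffers(const std::vector<std::vector<float>>& bufs,
                 std::vector<float>& pool, std::vector<int>& offsets)
{
    pool.clear();
    offsets.assign(1, 0);
    if (bufs.size() > MAX_BUFFERS) return false;
    for (const auto& b : bufs) {
        if (pool.size() + b.size() > POOL_FLOATS) return false;
        pool.insert(pool.end(), b.begin(), b.end());
        offsets.push_back(static_cast<int>(pool.size()));
    }
    return true;
}

// Device side: each thread sums one buffer by walking its slice of the pool.
__global__ void sumBuffers(float* out, int numBuffers)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numBuffers) return;
    float s = 0.0f;
    for (int k = c_offsets[i]; k < c_offsets[i + 1]; ++k) s += c_pool[k];
    out[i] = s;
}

int main()
{
    // Buffer sizes are only known at run time, on the host.
    std::vector<std::vector<float>> bufs = {
        {1.0f, 2.0f, 3.0f}, {10.0f}, {4.0f, 5.0f}};
    std::vector<float> pool;
    std::vector<int> offsets;
    if (!packBuffers(bufs, pool, offsets)) return 1;

    // Copy only the part of each constant array that is actually used.
    cudaMemcpyToSymbol(c_pool, pool.data(), pool.size() * sizeof(float));
    cudaMemcpyToSymbol(c_offsets, offsets.data(), offsets.size() * sizeof(int));

    float* d_out = nullptr;
    cudaMalloc(&d_out, bufs.size() * sizeof(float));
    sumBuffers<<<1, 32>>>(d_out, static_cast<int>(bufs.size()));

    std::vector<float> out(bufs.size());
    cudaMemcpy(out.data(), d_out, out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    for (size_t i = 0; i < out.size(); ++i)
        printf("sum[%zu] = %g\n", i, out[i]);
    return 0;
}
```

Error checking is left out to keep it short. Also keep in mind that the constant cache works best when all threads in a warp read the same address, so a per-thread access pattern like the one in `sumBuffers` is only there to show the indexing, not as a fast way to use it.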
If you want to do that dynamically from the device, you are out of luck. This can only be done on the host, so it’s not really dynamic in that sense, sorry. :(
Not sure if that’s what you meant by dynamic, but maybe something here will help. Best of luck!