dynamic allocation of constant memory possible?

Hi,

I’m sorry if this question was posted before, but I searched and did not find a clear answer.

Like the topic says: is it possible to dynamically allocate and free constant memory? My problem is that I need to do some FFT calculations and would like to apply a window function first. For this windowing I would like to keep the parameters in constant memory for performance reasons. The program works for shorter window lengths, but at a certain window length the FFT crashes the program with CUFFT_EXEC_FAILED. The problem does not occur if I put my window parameters in normal global memory, but since I am doing a lot of batched FFTs this hurts performance. After the windowing I could easily free the window buffer, since it is no longer needed.

Now again:

  1. Is dynamic constant allocation possible?
  2. Can I influence the amount of constant memory cuFFT may use for the plan?
  3. Any other suggestions?

Thanks,
Vrah

As far as I know, it is not possible to allocate constant memory dynamically.

A few points about constant memory:

  1. It is limited to 64 KB (on the G80/G92 architecture; it may be more on G200).

  2. The contents of constant memory cannot be modified from the device; they can only be set from the host, via cudaMemcpyToSymbol.

  3. By default constant variables have static storage.

  4. The address of a constant variable can be obtained through cudaGetSymbolAddress.
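To illustrate points 2–4, here is a minimal sketch (the array name and size are made up; error checking omitted):

```cuda
#include <cuda_runtime.h>

// Declared statically; the size is fixed at compile time and counts
// against the 64 KB constant-memory limit.
__constant__ float c_params[256];

void upload_params(const float *h_params)
{
    // Contents can only be set from the host:
    cudaMemcpyToSymbol(c_params, h_params, 256 * sizeof(float));

    // The device address of the symbol can still be queried if needed:
    void *dev_ptr = NULL;
    cudaGetSymbolAddress(&dev_ptr, c_params);
}
```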

The following thread will be useful for you

http://forums.nvidia.com/index.php?showtop…269&hl=embedded

(Read the posts by Cyril Zeller & paulius from NVIDIA)

Thanks for the quick reply!

Well that was what I expected :sad: .

I read the thread you mentioned, and it says I might use texture memory as well. At the moment I’m not using any texture memory and don’t know much about it, so I guess I’ll have to read up a bit. But if anyone could give me a short explanation of how to allocate and use texture memory, it would be a great help!

Two more questions coming up:

  1. I’m using the “convolutionSeparable” example, modified for my purposes. Is it possible to move the filter kernels into texture memory without a great performance loss? If so, how would I do this?

  2. If dynamic allocation of constant memory is not possible, how does cuFFT use it? From my issue it looks like cuFFT is actually using constant memory, even though I did not explicitly declare any.

Thanks!

Vrah

What is the memory access pattern when you read the parameters? Do all threads in a warp read the same constant memory location simultaneously? If so, constant memory will outperform textures by a factor of ~4 so you may notice a performance drop switching to textures.

Or does each thread read a different parameter indexed by threadIdx? In this case switching to a texture shouldn’t drop your performance significantly. If the access pattern is coalesced, there will be no difference between global memory and the texture here.

It adds a lot of code complexity, but you can declare your constant array to hold, say, 60 KB of parameters. Then, based on your problem size, dynamically choose how many parameters to set and read. Of course, this will start to fail when you need more parameters than you have memory for.
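A sketch of this “allocate the maximum, use only a prefix” idea (names and sizes are hypothetical; 15360 floats is 60 KB, just under the 64 KB limit):

```cuda
#include <cuda_runtime.h>

// Oversized constant array; only the first `len` entries are valid at runtime.
__constant__ float c_window[15360];

__global__ void applyWindow(float *data, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len)
        data[i] *= c_window[i];   // reads stay within the valid prefix
}

void setWindow(const float *h_window, int len)   // requires len <= 15360
{
    // Copy only the portion actually needed for the current problem size.
    cudaMemcpyToSymbol(c_window, h_window, len * sizeof(float));
}
```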

Texture code:

    // global file scope
    texture<float, 1, cudaReadModeElementType> tex;

    __global__ void kernel()
        {
        int param_index = // whatever
        float param = tex1Dfetch(tex, param_index);
        ....
        }

    void call_kernel(float *d_params, int n_params)
        {
        // d_params is the global memory where the parameters reside.
        cudaBindTexture(0, tex, d_params, sizeof(float) * n_params);
        kernel<<<...>>>(...);
        }

Thanks for the explanation. As mentioned, I use the convolutionSeparable example. Since it is an example, I assume it is optimized, and it looks like the accesses to the filter parameters are indeed simultaneous. A performance drop by a factor of about 4 wouldn’t be great, but I’ll give it a try.

I’m declaring the constant memory size via some preprocessor defines, and it stays the same for the whole duration of the program. It seems that I’m allocating too much memory, although in floats I would have 16 k available, which is a lot more than I’m using. But as mentioned at the beginning, at a certain FFT length the FFT execution fails if I try to put the window parameters into constant memory.

I don’t really know how the FFT might be connected to constant memory (for the twiddle factors, maybe), but that’s the problem I am seeing.

Anyway thanks for the short example of texture usage, I will give it a try!

Any other suggestions are welcome.

Vrah

Edit: I just realized that this might be a bug in cuFFT and not connected to constant memory at all! It seems that really large batches cause a problem in the FFT execution. At the moment I’m seeing this while trying to execute 512-point FFTs in a batch of 16384. Maybe I should send a bug report to NVIDIA?

Edit²: It turns out there is an error in one of my kernels that causes the FFT to fail, so this issue was my own fault.

Thank you all anyway for all the explanations regarding the constant memory!

Vrah