General memory management help

Hi guys,

I’m still learning the basics of CUDA from the manual, but some things are confusing.

Right now I’m writing a function that needs to build a texture by launching a bunch of __global__ kernels, where each kernel writes a single row. What’s the best way of doing this?
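
To make it concrete, here’s roughly the kind of per-row kernel I have in mind (the name and the per-pixel computation are just placeholders I made up):

// Hypothetical kernel: one launch fills one row of a linear device buffer.
__global__ void fillRow(float *row, int width, int rowIndex)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < width)
        row[x] = (float)(rowIndex * width + x);   // placeholder per-pixel computation
}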

I don’t want any CPU interaction; everything has to happen on the GPU.

I was thinking of allocating a cudaArray and passing a reference to it to the kernels. Each kernel could then fill a piece of linear memory and copy it as one row into the big cudaArray.

Once the kernels return, the cudaArray would have to be bound to a texture so it can be read in upcoming kernels.
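
In case it helps to see what I mean, here’s a rough, untested host-side sketch of the flow I’m imagining, using the old texture reference API and the fillRow kernel from above. The row copies are issued from a host loop here, because I couldn’t figure out how the kernels themselves could write into the cudaArray, which is really the part I’m asking about:

#include <cuda_runtime.h>

texture<float, 2, cudaReadModeElementType> tex;                // texture read by later kernels via tex2D()

__global__ void fillRow(float *row, int width, int rowIndex);  // the per-row kernel sketched above

void buildTexture(int width, int height)
{
    // Allocate the cudaArray that will back the texture.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *array;
    cudaMallocArray(&array, &desc, width, height);

    // Scratch row in linear device memory for the kernels to write into.
    float *d_row;
    cudaMalloc((void**)&d_row, width * sizeof(float));

    dim3 block(256);
    dim3 grid((width + block.x - 1) / block.x);

    for (int row = 0; row < height; ++row) {
        // One kernel launch per row, then copy that row into the array (device-to-device).
        fillRow<<<grid, block>>>(d_row, width, row);
        cudaMemcpy2DToArray(array, 0, row, d_row,
                            width * sizeof(float),   // source pitch in bytes
                            width * sizeof(float),   // width of the copied region in bytes
                            1,                       // one row
                            cudaMemcpyDeviceToDevice);
    }

    // Bind the array to the texture reference so upcoming kernels can sample it.
    cudaBindTextureToArray(tex, array);

    cudaFree(d_row);
}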

Can anyone post some example code showing how I could achieve this?

I’m also confused about what happens when I allocate a cudaArray from a host function. Is the memory actually allocated on the GPU or on the CPU?
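
For example, with something like this in host code (the dimensions are just examples):

int width = 256, height = 256;
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaArray *myArray;
cudaMallocArray(&myArray, &desc, width, height);   // does the storage behind myArray live in device memory?

I just want to know whether the host is only holding a handle to memory that actually lives on the device.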

thx in advance