Understanding CUDA: simple questions

pipajungle · June 12, 2009, 3:27pm

I have a few questions:

Are CUDA kernels similar to a do loop? For example, suppose I want to perform 3 simple operations, in 3 loops, on an array of integers (this is just an example):

a. Add 1 to each element of the array of integers

b. Tag the elements of the array that are even

c. Add 1 to the tagged elements

For the above operations, do I need to invoke 3 kernels? Will the pseudocode look something like this:

host
allocate memory on the device
call kernel (add 1 to all elements of the array)
host
call kernel (tag even elements)
host
call kernel (add 1 to the even elements of the array)
host
I/O to a file
free memory on the device
end
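As a rough sketch of that pseudocode in CUDA C++: each kernel plays the role of one loop body, and the loop index becomes a thread index computed from `blockIdx`, `blockDim`, and `threadIdx`. The names `addOne`, `tagEven`, `addOneToTagged`, `runOnHost`, and `runThreeKernels` are hypothetical, the block size of 256 is an arbitrary choice, and the file I/O step is left out. One step the pseudocode skips: the result has to be copied back to the host with `cudaMemcpy` before it can be written to a file.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// (a) Add 1 to every element. One thread handles one index, like one loop iteration.
__global__ void addOne(int* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1;
}

// (b) tag[i] = 1 if a[i] is even, otherwise 0.
__global__ void tagEven(const int* a, int* tag, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tag[i] = (a[i] % 2 == 0) ? 1 : 0;
}

// (c) Add 1 to every tagged element.
__global__ void addOneToTagged(int* a, const int* tag, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && tag[i] != 0) a[i] += 1;
}

// Host reference: the same three steps written as ordinary sequential loops.
std::vector<int> runOnHost(std::vector<int> a) {
    std::vector<int> tag(a.size());
    for (size_t i = 0; i < a.size(); ++i) a[i] += 1;
    for (size_t i = 0; i < a.size(); ++i) tag[i] = (a[i] % 2 == 0) ? 1 : 0;
    for (size_t i = 0; i < a.size(); ++i) if (tag[i] != 0) a[i] += 1;
    return a;
}

// Device version: allocate, copy in, launch the three kernels, copy out, free.
std::vector<int> runThreeKernels(std::vector<int> h) {
    const int n = static_cast<int>(h.size());
    if (n == 0) return h;
    const size_t bytes = h.size() * sizeof(int);

    int* d_a = nullptr;
    int* d_tag = nullptr;
    check(cudaMalloc(&d_a, bytes), "cudaMalloc(d_a)");
    check(cudaMalloc(&d_tag, bytes), "cudaMalloc(d_tag)");
    check(cudaMemcpy(d_a, h.data(), bytes, cudaMemcpyHostToDevice), "copy to device");

    const int threads = 256;                         // threads per block (arbitrary choice)
    const int blocks = (n + threads - 1) / threads;  // enough blocks to cover all n elements

    // Launches on the same (default) stream run one after another, in order.
    addOne<<<blocks, threads>>>(d_a, n);
    tagEven<<<blocks, threads>>>(d_a, d_tag, n);
    addOneToTagged<<<blocks, threads>>>(d_a, d_tag, n);
    check(cudaGetLastError(), "kernel launch");

    // Copy the result back; only then can the host write it to a file.
    check(cudaMemcpy(h.data(), d_a, bytes, cudaMemcpyDeviceToHost), "copy to host");
    check(cudaFree(d_tag), "cudaFree(d_tag)");
    check(cudaFree(d_a), "cudaFree(d_a)");
    return h;
}
```

For example, `runThreeKernels({1, 2, 3, 4})` should return `{3, 3, 5, 5}`, the same as the three plain loops in `runOnHost`. Because all three kernels are launched on the same stream, they execute in launch order, so no explicit synchronization is needed between them.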

Can we allocate and deallocate only “global” memory on the device, or do we also have access to shared memory, registers, and local memory? How do programmers normally use memory (global only, or also local, shared, and registers)? Does global memory have latency issues? What do programmers normally do to avoid them?
For graphics, do we just copy the data from global memory to texture memory and then render it on screen, or are there other, more efficient ways to handle this?

Thanks!

sergeyn · June 12, 2009, 3:36pm

Yes (you could also merge the 3 kernels into 1 and call it once).
No, you can’t allocate/deallocate memory on the device. Read the memory section of the CUDA Programming Guide.
For graphics, you can access the data directly.
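Merging the three kernels into one, as the first answer above suggests, can be sketched as follows, assuming the same element-wise operations as in the question. Because each element's result depends only on that element, one thread can do all three steps and keep the "tag" in a register instead of a separate array in global memory. `addTagAddFused` and `runFused` are hypothetical names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Steps (a), (b) and (c) in a single kernel. The value and its tag stay in
// registers, so no tag array in global memory is needed.
__global__ void addTagAddFused(int* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = a[i] + 1;            // (a) add 1
        bool tagged = (v % 2 == 0);  // (b) tag even values
        if (tagged) v += 1;          // (c) add 1 to tagged values
        a[i] = v;                    // one global read and one global write per element
    }
}

std::vector<int> runFused(std::vector<int> h) {
    const int n = static_cast<int>(h.size());
    if (n == 0) return h;
    const size_t bytes = h.size() * sizeof(int);

    int* d_a = nullptr;
    check(cudaMalloc(&d_a, bytes), "cudaMalloc");
    check(cudaMemcpy(d_a, h.data(), bytes, cudaMemcpyHostToDevice), "copy to device");

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    addTagAddFused<<<blocks, threads>>>(d_a, n);  // a single launch instead of three
    check(cudaGetLastError(), "kernel launch");

    check(cudaMemcpy(h.data(), d_a, bytes, cudaMemcpyDeviceToHost), "copy to host");
    check(cudaFree(d_a), "cudaFree");
    return h;
}
```

Compared with three separate kernels, the fused version reads and writes each element of global memory once instead of several times, and it needs no extra tag buffer. Fusion only works this simply because no step depends on other elements' results.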

All of this is covered in the Programming Guide.
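To illustrate how the memory spaces from the original question are commonly combined (a standard pattern, not something stated in this thread), here is a sketch of a block-wise sum. Each input element is read from global memory once, the threads of a block then exchange partial sums through `__shared__` memory, and per-thread values such as the index live in registers. `blockSum`, `sumOnDevice`, and the block size `kThreads` are assumptions for the sketch; the per-block partial sums are added on the host, and they are plain `int`s, so very large inputs could overflow.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

constexpr int kThreads = 256;  // threads per block; must be a power of two for this loop

// Each block sums up to kThreads elements and writes one partial sum.
__global__ void blockSum(const int* in, int* partial, int n) {
    __shared__ int cache[kThreads];                 // shared memory: visible to all threads of the block
    const int tid = threadIdx.x;                    // per-thread values like these sit in registers
    const int i = blockIdx.x * blockDim.x + tid;
    cache[tid] = (i < n) ? in[i] : 0;               // one read from global memory per element
    __syncthreads();                                // wait until the whole tile is loaded

    // Tree reduction inside shared memory: no further global-memory traffic.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) cache[tid] += cache[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = cache[0];   // one write to global memory per block
}

long long sumOnDevice(const std::vector<int>& h) {
    const int n = static_cast<int>(h.size());
    if (n == 0) return 0;
    const int blocks = (n + kThreads - 1) / kThreads;

    int* d_in = nullptr;
    int* d_partial = nullptr;
    check(cudaMalloc(&d_in, h.size() * sizeof(int)), "cudaMalloc(d_in)");
    check(cudaMalloc(&d_partial, blocks * sizeof(int)), "cudaMalloc(d_partial)");
    check(cudaMemcpy(d_in, h.data(), h.size() * sizeof(int), cudaMemcpyHostToDevice),
          "copy to device");

    blockSum<<<blocks, kThreads>>>(d_in, d_partial, n);
    check(cudaGetLastError(), "kernel launch");

    std::vector<int> partial(blocks);
    check(cudaMemcpy(partial.data(), d_partial, blocks * sizeof(int), cudaMemcpyDeviceToHost),
          "copy to host");
    check(cudaFree(d_partial), "cudaFree(d_partial)");
    check(cudaFree(d_in), "cudaFree(d_in)");

    long long total = 0;
    for (int p : partial) total += p;  // finish the reduction on the host
    return total;
}
```

With 300 elements of value 2, two blocks are launched, they produce the partial sums 512 and 88, and the host adds them to 600.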


pipajungle · June 12, 2009, 4:22pm

Thanks, sergeyn!
I think I did not phrase my second question correctly. What I meant was: can we allocate/deallocate registers, shared memory, and local memory on the device “from the host”? I know we can do that for global memory, i.e., allocate and deallocate memory on the device from the host.

Also, I read that texture memory has some advantages over global memory. Are there any disadvantages too? I mean, is it hard to program?

sergeyn · June 12, 2009, 4:33pm

No, these resources are assigned statically at kernel launch.
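A small sketch of what "assigned at kernel launch" means for shared memory, assuming the dynamic-shared-memory form of the launch syntax: the kernel declares an `extern __shared__` array without a size, and the host supplies the size in bytes as the third `<<<...>>>` parameter. The kernel cannot grow or free that memory while it runs, and registers and local memory are chosen by the compiler rather than requested from the host. `reverseChunks` and `reverseInChunks` are hypothetical names, and `threads` must not exceed the device's per-block thread limit.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Reverse each blockDim.x-sized chunk of `a` in place.
__global__ void reverseChunks(int* a, int n) {
    extern __shared__ int buf[];                     // size is not known here; the host sets it at launch
    const int tid = threadIdx.x;
    const int base = blockIdx.x * blockDim.x;        // first index of this block's chunk
    const int len = min((int)blockDim.x, n - base);  // the last chunk may be shorter
    if (tid < len) buf[tid] = a[base + tid];
    __syncthreads();
    if (tid < len) a[base + tid] = buf[len - 1 - tid];
}

std::vector<int> reverseInChunks(std::vector<int> h, int threads) {
    const int n = static_cast<int>(h.size());
    if (n == 0 || threads <= 0) return h;
    const size_t bytes = h.size() * sizeof(int);

    int* d_a = nullptr;
    check(cudaMalloc(&d_a, bytes), "cudaMalloc");
    check(cudaMemcpy(d_a, h.data(), bytes, cudaMemcpyHostToDevice), "copy to device");

    const int blocks = (n + threads - 1) / threads;
    // Third launch parameter: bytes of dynamic shared memory per block,
    // fixed for the whole lifetime of this launch.
    const size_t sharedBytes = threads * sizeof(int);
    reverseChunks<<<blocks, threads, sharedBytes>>>(d_a, n);
    check(cudaGetLastError(), "kernel launch");

    check(cudaMemcpy(h.data(), d_a, bytes, cudaMemcpyDeviceToHost), "copy to host");
    check(cudaFree(d_a), "cudaFree");
    return h;
}
```

For instance, `reverseInChunks({1, 2, 3, 4, 5, 6, 7}, 4)` reverses each 4-element chunk and should return `{4, 3, 2, 1, 7, 6, 5}`. To see how many registers and how much local memory the compiler assigned to each kernel, compiling with `nvcc --ptxas-options=-v` prints a per-kernel summary.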

sergeyn · June 12, 2009, 4:36pm

Texture memory is a cached, read-only resource; global memory is an uncached, read/write resource.
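A sketch contrasting the two read paths, assuming the texture object API (`cudaCreateTextureObject`, `tex1Dfetch`), which arrived in CUDA versions later than this thread; the texture reference API of that time is deprecated in newer toolkits. The texture is bound to the same linear buffer allocated with `cudaMalloc`, so the kernel reads it through the texture unit, while the other kernel reads it through an ordinary pointer. `doubleViaGlobal`, `doubleViaTexture`, and `doubleAll` are hypothetical names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Read path 1: an ordinary global-memory pointer.
__global__ void doubleViaGlobal(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2;
}

// Read path 2: the same buffer fetched through the texture unit (read-only here).
__global__ void doubleViaTexture(cudaTextureObject_t tex, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<int>(tex, i) * 2;
}

std::vector<int> doubleAll(const std::vector<int>& h, bool useTexture) {
    const int n = static_cast<int>(h.size());
    if (n == 0) return {};
    const size_t bytes = h.size() * sizeof(int);

    int* d_in = nullptr;
    int* d_out = nullptr;
    check(cudaMalloc(&d_in, bytes), "cudaMalloc(d_in)");
    check(cudaMalloc(&d_out, bytes), "cudaMalloc(d_out)");
    check(cudaMemcpy(d_in, h.data(), bytes, cudaMemcpyHostToDevice), "copy to device");

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    if (useTexture) {
        // Describe d_in as a linear array of ints...
        cudaResourceDesc res;
        std::memset(&res, 0, sizeof(res));
        res.resType = cudaResourceTypeLinear;
        res.res.linear.devPtr = d_in;
        res.res.linear.desc = cudaCreateChannelDesc<int>();
        res.res.linear.sizeInBytes = bytes;
        // ...fetched as plain element values (no normalization or filtering).
        cudaTextureDesc desc;
        std::memset(&desc, 0, sizeof(desc));
        desc.readMode = cudaReadModeElementType;

        cudaTextureObject_t tex = 0;
        check(cudaCreateTextureObject(&tex, &res, &desc, nullptr), "cudaCreateTextureObject");
        doubleViaTexture<<<blocks, threads>>>(tex, d_out, n);
        check(cudaGetLastError(), "kernel launch");
        check(cudaDeviceSynchronize(), "kernel run");  // finish before destroying the texture
        check(cudaDestroyTextureObject(tex), "cudaDestroyTextureObject");
    } else {
        doubleViaGlobal<<<blocks, threads>>>(d_in, d_out, n);
        check(cudaGetLastError(), "kernel launch");
    }

    std::vector<int> out(h.size());
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost), "copy to host");
    check(cudaFree(d_out), "cudaFree(d_out)");
    check(cudaFree(d_in), "cudaFree(d_in)");
    return out;
}
```

Both paths compute the same result; the extra setup (resource and texture descriptors, creating and destroying the texture object) is the main programming cost of the texture path.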

And programming in general becomes less difficult over time ;)

pipajungle · June 12, 2009, 5:17pm

Thanks.
I am going through the manual. It refers to both device memory and global memory. Are they the same, or does device memory mean global + shared + registers + local?
Thanks again for your replies.

sergeyn · June 12, 2009, 5:23pm

Yes, they are the same.

pipajungle · June 12, 2009, 6:16pm

Thanks!