Understanding CUDA: simple questions

I have a few questions:

  1. Are CUDA kernels similar to a do loop? For example, I want to perform 3 simple operations in 3 loops on an array of integers (this is just an example):

a. Add 1 to each element of array of integers

b. Tag the elements in the array which are even

c. Add one to the tagged elements

For the above operations, do I need to invoke 3 kernels? Will the pseudocode look something like:


allocate memory on the device

call kernel (add one to all elements of the array)


call kernel (tag even elements)


call kernel (add 1 to the even elements of the array)


I/O to a file

free memory on device

  2. Can we allocate and deallocate only “global” memory on the device, or do we also have access to shared memory, registers, and local memory? How do programmers normally use memory (global only, or also local, shared, and registers)? Does global memory have any latency issues? What do programmers normally do to avoid those latency issues?

  3. For graphics, do we just copy the global memory data to texture memory and then render it on screen, or are there more efficient ways to deal with this?


  1. Yes (you could also merge the 3 kernels into 1 and call it once).

  2. No, you can’t allocate/deallocate memory on the device. Read what the CUDA Programming Guide says about memory.

  3. You can access the data directly.

All that is written in the programming guide.
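
To make answer 1 concrete, here is a minimal sketch of both variants: three separate kernels for steps a, b and c, and a single merged kernel. Each thread handles one array element, which is what replaces the loop body. All names (`add_one`, `tag_even`, `add_one_to_tagged`, `fused_steps`, `run_steps`) are made up for illustration, and error handling is reduced to one assert to keep it short.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Step a: add 1 to every element (one thread per element).
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Step b: tag[i] = 1 if data[i] is even, 0 otherwise.
__global__ void tag_even(const int *data, int *tag, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tag[i] = (data[i] % 2 == 0) ? 1 : 0;
}

// Step c: add 1 to the tagged elements only.
__global__ void add_one_to_tagged(int *data, const int *tag, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && tag[i]) data[i] += 1;
}

// Steps a, b and c merged: the tag lives in a register instead of an array.
__global__ void fused_steps(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = data[i] + 1;          // step a
        bool even = (v % 2 == 0);     // step b
        if (even) v += 1;             // step c
        data[i] = v;
    }
}

// Hypothetical host helper: copies h to the device, runs either variant,
// and returns the result.
std::vector<int> run_steps(std::vector<int> h, bool use_fused) {
    int n = static_cast<int>(h.size());
    if (n == 0) return h;
    size_t bytes = n * sizeof(int);

    int *d_data = nullptr, *d_tag = nullptr;
    cudaMalloc(&d_data, bytes);       // global memory, allocated from the host
    cudaMalloc(&d_tag, bytes);
    cudaMemcpy(d_data, h.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (use_fused) {
        fused_steps<<<blocks, threads>>>(d_data, n);
    } else {
        // Kernels launched on the same stream run in order.
        add_one<<<blocks, threads>>>(d_data, n);
        tag_even<<<blocks, threads>>>(d_data, d_tag, n);
        add_one_to_tagged<<<blocks, threads>>>(d_data, d_tag, n);
    }
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    cudaMemcpy(h.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_tag);
    cudaFree(d_data);
    return h;
}
```

Calling `run_steps(v, false)` and `run_steps(v, true)` on the same input should give the same array. The merged version reads and writes global memory once per element instead of several times, which is the usual reason to merge small kernels.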



Thanks sergeyn!
I think I did not phrase my second question correctly. What I meant to ask is: can we allocate/deallocate registers, shared memory, and local memory on the device “from the host”? I know that we can do that for global memory, i.e. allocate and deallocate memory on the device from the host.

Also, I read that texture memory has some advantages over global memory. Are there any disadvantages too? I mean, is it hard to program?

No, these resources are assigned statically at kernel launch.
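
As a rough illustration of what "assigned at kernel launch" means: the host never calls an allocator for registers, shared memory, or local memory. The closest thing is the optional third launch parameter, which sets how many bytes of dynamic shared memory each block gets. The sketch below assumes a single block of at most 1024 threads, and the names `reverse_in_block` and `reverse_on_device` are invented for the example.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Reverses an array that fits in one block, staging it in shared memory.
__global__ void reverse_in_block(int *data, int n) {
    extern __shared__ int tile[];   // size comes from the launch configuration
    int t = threadIdx.x;            // t is held in a register chosen by the compiler
    if (t < n) tile[t] = data[t];
    __syncthreads();                // wait until the whole tile is filled
    if (t < n) data[t] = tile[n - 1 - t];
}

// Hypothetical host helper; returns the input unchanged if it does not fit
// in a single block (assumed limit: 1024 threads per block).
std::vector<int> reverse_on_device(std::vector<int> h) {
    int n = static_cast<int>(h.size());
    if (n == 0 || n > 1024) return h;
    size_t bytes = n * sizeof(int);

    int *d_data = nullptr;
    cudaMalloc(&d_data, bytes);     // only global memory is allocated by the host
    cudaMemcpy(d_data, h.data(), bytes, cudaMemcpyHostToDevice);

    // <<<blocks, threads, dynamic shared memory bytes per block>>>
    reverse_in_block<<<1, n, bytes>>>(d_data, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    cudaMemcpy(h.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return h;
}
```

Shared memory can also be declared with a fixed size inside the kernel (`__shared__ int tile[256];`), in which case nothing about it appears in the host code at all.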

Texture memory is a cached, read-only resource; global memory is an uncached, read/write resource.
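
For a feel of the programming effort, here is a minimal sketch that reads a global memory buffer through the texture path. It uses the texture object API (`cudaCreateTextureObject`, `tex1Dfetch`), which is newer than this thread and needs compute capability 3.0 or later, so treat it as an assumption about your toolkit rather than what the guide of the time describes. The names `copy_through_texture` and `read_via_texture` are made up. Note also that newer GPUs cache ordinary global memory loads as well, so the difference may be smaller than the reply suggests.

```cuda
#include <cassert>
#include <cstring>
#include <cuda_runtime.h>
#include <vector>

// Reads each element through the texture cache and writes it to global memory.
__global__ void copy_through_texture(cudaTextureObject_t tex, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, i);   // read-only fetch
}

// Hypothetical host helper: binds a linear global memory buffer to a texture
// object, runs the kernel, and returns what the kernel read.
std::vector<float> read_via_texture(const std::vector<float> &h) {
    int n = static_cast<int>(h.size());
    std::vector<float> result(n);
    if (n == 0) return result;
    size_t bytes = n * sizeof(float);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h.data(), bytes, cudaMemcpyHostToDevice);

    // Describe the existing global memory buffer as a 1D linear resource.
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;

    cudaTextureDesc desc;
    std::memset(&desc, 0, sizeof(desc));
    desc.readMode = cudaReadModeElementType;   // return the raw float values

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &desc, nullptr);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    copy_through_texture<<<blocks, threads>>>(tex, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFree(d_out);
    cudaFree(d_in);
    return result;
}
```

The extra work is mostly the setup on the host; inside the kernel the only change is using a fetch function instead of indexing a pointer, and the kernel cannot write through the texture.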

And programming in general becomes less difficult over time ;)

I am going through the manual. It refers to both device memory and global memory. Are they the same? Or does device memory mean global + shared + registers + local?
Thanks again for your replies.

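Yes, for practical purposes they are the same: device memory is the GPU's off-chip DRAM, which is where global memory lives. (Local memory is also placed in device memory; shared memory and registers are on-chip.)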