What happens to cudaMalloc() + atomicAdd()?

Hi, everyone. I have some code similar to the following, where cudaMalloc()ed memory and atomicAdd() are used. In the second invocation of myfunction(), printf() shows that addr[threadIdx.x] is 2. It seems that the second invocation of mykernel() runs on the same cudaMalloc()ed memory as the first invocation, which is weird.

The GPU is a GeForce GTX 1060, the OS is Ubuntu 16.04, and CUDA 10.1 is used.

Does anyone have any idea what is going on here?

#include <cstdio>

__global__ void mykernel(int *addr) {
  atomicAdd(addr, 1);
  printf("addr[%d]: %d\n", threadIdx.x, addr[threadIdx.x]);
}

void myfunction() {
  int *addr;
  cudaMalloc(&addr, 1024);
  mykernel<<<1, 1>>>(addr);   // launch configuration abbreviated here
  cudaDeviceSynchronize();
  cudaFree(addr);
}

int main() {
  myfunction();
  myfunction();               // second invocation
  return 0;
}
Why does that strike you as odd? After the first kernel finishes, the allocated memory is freed with cudaFree(), so that memory is available to the next cudaMalloc(). You can add printf("addr=%p\n", addr); after the call to cudaMalloc() to see where the requested memory block is located.

Side remark: In general it is not a good idea to operate on uninitialized data. Here, printing addr[threadIdx.x] might print just about any random integer.
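To make the printed values well defined, the buffer can be cleared right after allocation. A minimal sketch of how myfunction() could do that with cudaMemset() (the 1024-byte size is taken from the question; the launch configuration is an assumption, since the original post does not show it):

```
void myfunction() {
  int *addr;
  cudaMalloc(&addr, 1024);
  cudaMemset(addr, 0, 1024);  // zero the buffer so every read is well defined
  mykernel<<<1, 1>>>(addr);   // launch configuration assumed
  cudaDeviceSynchronize();    // wait for the kernel and flush device printf
  cudaFree(addr);
}
```

With this change, the first invocation deterministically prints 1 and the second prints 1 again, because each invocation starts from a zeroed buffer.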


As you suggested, I printed the pointer returned by cudaMalloc(); the two calls returned the same address, which explains what I saw. How does cudaMalloc() work? Why doesn't each invocation of cudaMalloc() return a different memory area?

The internal operation of the CUDA memory allocator is not publicly documented and is subject to change at any time. But even the simplest possible allocator, one maintaining a linked list of free blocks with a first-fit strategy, would return the same address for two identically sized allocations with a deallocation in between.

Note that the address returned is a virtual address, so the second allocation may or may not use memory at the same physical address.

I think I had a wrong expectation. Anyway, thanks for your help.