Correct way to free device memory

I have a question about freeing device memory that holds an array of structs which themselves contain pointers:

#include <cuda_runtime.h>

#define MAX 128

struct mydata {
    int value;
    int *vector;
};

/* kernel_alloc_vector_and_fill is defined elsewhere (not shown) */

int main() {
    struct mydata *reg;
    cudaMalloc((void**)&reg, MAX*sizeof(struct mydata));
    kernel_alloc_vector_and_fill<<<4, 32>>>(reg);
    cudaDeviceSynchronize();
    /* Here, must I free only reg, or also every vector pointer inside reg? */
    /* When I try to free all the vector pointers here, I get a dereferencing error
       (I was trying to access device memory from host code). */
    cudaFree(reg); /* works fine */
    return 0;
}

In this case, should I free only reg on the host?
On the host, the OS reclaims a process's memory when it exits. Does the same happen on the GPU for the vector pointers?

You should free reg on the host, because the host allocated it with cudaMalloc.

As for your question about freeing the pointers inside the structs that reg points to, I think we'd need to see your kernel. You can indeed call malloc and free in device code, but they're often slow, so they're not generally recommended.

So if your kernel calls malloc, your kernel should call free.
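A minimal sketch of that pattern, in which one kernel both allocates and frees its device-heap memory. The kernel name scratch_kernel, the helper run_demo and the 4-int scratch buffer are hypothetical, chosen only for illustration:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates a small scratch buffer on the
// device heap, uses it, and frees it before returning.
__global__ void scratch_kernel(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int *scratch = (int *)malloc(4 * sizeof(int));  // device-heap allocation
    if (scratch == NULL) {      // malloc returns NULL if the heap is exhausted
        out[tid] = -1;
        return;
    }
    for (int i = 0; i < 4; ++i) scratch[i] = tid + i;
    out[tid] = scratch[0] + scratch[1] + scratch[2] + scratch[3];  // 4*tid + 6
    free(scratch);              // the same kernel frees what it allocated
}

// Hypothetical helper: launches the kernel and returns the sum of all outputs.
int run_demo(void)
{
    const int n = 128;
    int *d_out;
    cudaMalloc((void **)&d_out, n * sizeof(int));
    scratch_kernel<<<4, 32>>>(d_out, n);
    cudaDeviceSynchronize();

    int h_out[n];
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);            // host-allocated memory, freed by the host

    int sum = 0;
    for (int i = 0; i < n; ++i) sum += h_out[i];
    return sum;
}

int main(void)
{
    printf("%d\n", run_demo());
    return 0;
}
```

If many threads allocate at once, malloc can return NULL when the device heap runs out; the heap size can be raised with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes) before the first kernel that calls malloc.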

Even if it's not used much, what happens to the memory that malloc allocated for vector when the kernel ends? Is the kernel responsible for freeing those pointers before it finishes?

Yes, just as MutantJohn said.

Allocations created by in-kernel malloc/new/cudaMalloc exist until:

  1. they are explicitly freed
  2. the context is destroyed
  3. the CPU process that owns the context terminates

Kernel termination by itself does not free any allocations created by in-kernel malloc.

If you want to free them before process termination or context destruction, you must explicitly call free() on those pointers from device code; the host-side cudaFree cannot release device-heap allocations.
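Applied to the struct from the question, a sketch might look like the following, assuming each vector holds VLEN ints. The kernels alloc_vectors, sum_vectors and free_vectors, plus VLEN and run_demo, are hypothetical names: the first kernel allocates, a later launch can still read that memory because kernel termination did not free it, and a third kernel frees it before the host calls cudaFree(reg).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define MAX 128
#define VLEN 4  /* hypothetical length of each vector */

struct mydata {
    int value;
    int *vector;
};

/* Hypothetical kernel 1: allocate and fill reg[i].vector on the device heap. */
__global__ void alloc_vectors(struct mydata *reg, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    reg[i].value = 0;
    reg[i].vector = (int *)malloc(VLEN * sizeof(int));
    if (reg[i].vector == NULL) return;              /* heap exhausted */
    for (int k = 0; k < VLEN; ++k) reg[i].vector[k] = i;
}

/* Hypothetical kernel 2: a later launch still sees the allocations. */
__global__ void sum_vectors(struct mydata *reg, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || reg[i].vector == NULL) return;
    int s = 0;
    for (int k = 0; k < VLEN; ++k) s += reg[i].vector[k];
    reg[i].value = s;                               /* VLEN * i */
}

/* Hypothetical kernel 3: release the device-heap memory from device code. */
__global__ void free_vectors(struct mydata *reg, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || reg[i].vector == NULL) return;
    free(reg[i].vector);
    reg[i].vector = NULL;
}

int run_demo(void)
{
    struct mydata *reg;
    cudaMalloc((void **)&reg, MAX * sizeof(struct mydata));

    alloc_vectors<<<4, 32>>>(reg, MAX);
    sum_vectors<<<4, 32>>>(reg, MAX);
    free_vectors<<<4, 32>>>(reg, MAX);
    cudaDeviceSynchronize();

    /* Copy the structs back; only value is meaningful on the host. */
    struct mydata h_reg[MAX];
    cudaMemcpy(h_reg, reg, MAX * sizeof(struct mydata), cudaMemcpyDeviceToHost);
    cudaFree(reg);  /* the host frees only what the host allocated */

    int total = 0;
    for (int i = 0; i < MAX; ++i) total += h_reg[i].value;
    return total;
}

int main(void)
{
    printf("%d\n", run_demo());
    return 0;
}
```

The host only ever reads value from its copy; reading h_reg[i].vector[k] on the host would dereference a device address, which is the error described in the question.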

When I came to CUDA, I was used to writing CPU code, so a struct with a pointer in it was very much part of my toolbelt.

But in CUDA, it's often better to store an int index instead of a pointer, and to allocate the data itself separately.

#include <thrust/device_vector.h>

using thrust::device_vector;

struct datum
{
  int value;
  int vector_idx;  // offset into int_vector instead of an int*, -1 if none
};

int main(void)
{
  device_vector<int> int_vector;
  device_vector<datum> data;

  // launch a kernel that sets vector_idx for each datum
}

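This has the advantage that it uses RAII (each device_vector frees its memory when it goes out of scope), and it lets you store an invalid index as -1, much like a NULL pointer. An index can also be used on whichever side holds a copy of int_vector, whereas a device pointer cannot be dereferenced in host code.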
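A runnable sketch of that index-based layout, filling in the kernel launch with Thrust. It assumes each even-numbered datum owns VLEN consecutive ints starting at offset i * VLEN in int_vector, while odd-numbered data get the invalid index -1; the kernel assign_and_fill, N, VLEN and run_demo are hypothetical names for illustration:

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

using thrust::device_vector;

#define N 128
#define VLEN 4  // hypothetical: each valid datum owns VLEN consecutive ints

struct datum
{
  int value;
  int vector_idx;  // offset into the shared int buffer, -1 = no vector
};

// Hypothetical kernel: even-numbered data get a slice of the shared buffer,
// odd-numbered data get the sentinel -1 instead of a NULL pointer.
__global__ void assign_and_fill(datum *data, int *ints, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  if (i % 2 == 0) {
    int idx = i * VLEN;
    data[i].vector_idx = idx;
    int s = 0;
    for (int k = 0; k < VLEN; ++k) { ints[idx + k] = i; s += i; }
    data[i].value = s;              // VLEN * i
  } else {
    data[i].vector_idx = -1;
    data[i].value = 0;
  }
}

int run_demo(void)
{
  device_vector<int> int_vector(N * VLEN, 0);  // one allocation for all vectors
  device_vector<datum> data(N);

  assign_and_fill<<<(N + 31) / 32, 32>>>(
      thrust::raw_pointer_cast(data.data()),
      thrust::raw_pointer_cast(int_vector.data()), N);
  cudaDeviceSynchronize();

  thrust::host_vector<datum> h_data(data);     // copy the structs to the host
  int total = 0;
  for (int i = 0; i < N; ++i) {
    if (h_data[i].vector_idx != -1) total += h_data[i].value;
  }
  return total;
}

int main(void)
{
  printf("%d\n", run_demo());
  return 0;
}
```

Note that run_demo never calls cudaFree: int_vector and data release their device memory in their destructors when the function returns.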