After some quick searching, it seems that I should be able to use “operator new” in device code as long as I compile for compute capability 2.0 or higher. However, I have tried both 2.0 and 3.0, and I still get an error about calling a host function (“operator new”) from a device function. Am I doing something wrong, or are “new” and “delete” simply not usable in device code?
Should I just use malloc and free in my device code? Or can I just do something like this?
__device__ void* operator new(size_t bytes)
{
    return malloc(bytes);
}
__device__ void operator delete(void* mem)
{
    free(mem);
}
You can just use new and delete in device code. (Try it!)
You don’t need to provide those definitions for ordinary usage.
If you want to overload new for a specific purpose, you can do that also. An example is discussed here:
http://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/
(although that example is really host-side new)
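For the class-specific overloading mentioned above, here is a minimal sketch of what a device-side version might look like. The Node struct and the run_node_kernel helper are hypothetical names made up for illustration, and the sketch assumes a toolkit that accepts C++11 (for noexcept and nullptr). It simply forwards to the in-kernel malloc and free, so treat it as one possible approach rather than a recommended pattern.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical class, for illustration only.
struct Node {
    int value;

    // Class-specific allocation functions callable from device code.
    // They forward to the in-kernel malloc/free. noexcept tells the
    // compiler that a null return is possible instead of an exception.
    __device__ void* operator new(size_t bytes) noexcept { return malloc(bytes); }
    __device__ void operator delete(void* mem) noexcept { free(mem); }
};

__global__ void kernel(int *out) {
    Node *n = new Node;          // uses Node::operator new on the device
    if (n == nullptr) {          // in-kernel malloc returns null on failure
        *out = -1;
        return;
    }
    n->value = 42;
    *out = n->value;
    delete n;                    // uses Node::operator delete on the device
}

// Hypothetical host helper: runs the kernel once and returns the value it
// wrote, or -2 if any CUDA runtime call reports an error.
int run_node_kernel() {
    int *d_out = nullptr;
    int h_out = 0;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -2;
    kernel<<<1, 1>>>(d_out);
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(d_out); return -2; }
    if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        cudaFree(d_out);
        return -2;
    }
    cudaFree(d_out);
    return h_out;
}

int main() {
    printf("%d\n", run_node_kernel());
    return 0;
}
```

Because the overloads are members of Node, they only affect allocations of that class; ordinary device-side new and delete elsewhere are untouched.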
Here’s a fully worked example of device new:
$ cat t716.cu
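#include <stdio.h>
#define DSIZE 32
__global__ void kernel(){
    // allocate from the device heap with device-side new
    int *a = new int[DSIZE];
    for (int i = 0; i < DSIZE; i++) a[i] = i;
    for (int i = 0; i < DSIZE; i++) printf("%d ", a[i]);
    printf("\n");
    // release the allocation so the device heap does not leak
    delete[] a;
}
int main(){
    kernel<<<1,1>>>();
    // wait for the kernel to finish so its printf output is flushed
    cudaDeviceSynchronize();
}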
$ nvcc -o t716 t716.cu
$ cuda-memcheck ./t716
========= CUDA-MEMCHECK
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
========= ERROR SUMMARY: 0 errors
$
And as you point out, device new, just like device malloc, requires a device of compute capability 2.0 or higher.
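If it is unclear whether the hardware meets that requirement, the runtime can report it. The following is a rough sketch, assuming device 0 is the GPU of interest; supports_device_new is a hypothetical helper that only encodes the cc 2.0 threshold stated above. It also prints the device heap limit, on the assumption that device new draws from the same heap as in-kernel malloc. The build target matters too, so the architecture passed to nvcc (for example via -arch) is worth checking alongside this.

```cuda
#include <cstdio>

// Hypothetical helper: encodes the requirement stated above, namely
// compute capability 2.0 or higher for device-side new/malloc.
bool supports_device_new(int major) {
    return major >= 2;
}

int main() {
    int dev = 0;  // assumption: device 0 is the GPU being used
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device %d: %s, compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);
    printf("Device-side new usable: %s\n",
           supports_device_new(prop.major) ? "yes" : "no");

    // Query the size of the heap that in-kernel allocations come from.
    size_t heap_bytes = 0;
    err = cudaDeviceGetLimit(&heap_bytes, cudaLimitMallocHeapSize);
    if (err == cudaSuccess) {
        printf("Device heap limit: %zu bytes\n", heap_bytes);
    }
    return 0;
}
```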
I did try it, and I was getting those errors :/ I overloaded the new and delete operators for the classes I needed to allocate on the device, though, and the errors went away.