Can I prevent a destructor (or make it a no-op) from being called when passing a struct by value?

Hi, I’m having a problem of not wanting a destructor called when I pass a struct by value to a CUDA kernel. Ideally I’d declare it as a const reference, so that no copy is made, but CUDA currently insists that you pass by value or as a pointer (I don’t want to pass by pointer because I will need to allocate extra memory for the copy and could be slower from extra indirection)

Here are the approaches I’ve tried so far without success:

  1. mark the destructor as a host only function

  2. surround the destructor code with #ifdef CUDACC

    doesn’t work probably because the copy isn’t made in device code, but rather
    in a stub generated by NVCC

  3. Override the default copy constructor, so that it sets a bit marking the copy as
    a reference not to be freed

This successfully prevents the destructor from being called for CUDA, but causes leaks whenever I use STL code since functions like vector::push_back() call the copy constructor.

Can anyone help me do what I want?

This was a bit annoying to figure out! I tried several approaches but the one that seemed to work best is a raw reinterpret cast of the structure to get NVCC to simply flat-raw-byte-copy the struct. The ugliness involved raw type casting, unfortunately both at the kernel call and in one initial line inside the kernel.

#include <cstdio>
#include <cuda_runtime_api.h>

struct Arguments 
{
  Arguments(int data) : someData(data)
  { 
    printf("In Argument constructor\n");
  }
  ~Arguments() 
  {
    printf("In Argument destructor\n");
  }

  int someData;
};


struct PseudoArguments 
{
  char buffer;
};



__global__ void kernel(PseudoArguments pa)
{
  const Arguments &a=*((Arguments *)&pa);

  if (0==blockIdx.x && 0==threadIdx.x) 
    printf("In kernel, argument data = %d\n", a.someData);
}

int main()
{   
  Arguments a(12345);
  
  kernel<<<1, 32>>>(*(PseudoArguments *)&a);
  cudaThreadSynchronize();
}

Another alternative is to use the execution control API and explicitly setup arguments and the execution configuration before launching.

The sequence would be:

  1. cudaSetupArgument(&object,sizeof(object),0);
  2. cudaConfigureCall(grid,block,shared,stream);
  3. cudaLaunch(kernel);

I developed a small library to address problems like this which pop up in CUDA:

https://github.com/jaredhoberock/uninitialized

#include "uninitialized.hpp"
#include <cstdio>

class my_class
{
  my_class()
  {
    printf("my_class: in constructor\n");
  }

  ~my_class()
  {
    printf("my_class: in destructor\n");
  }
};

__global__ void kernel(uninitialized<my_class> uninit_x)
{
  my_class &x = uninit_x.get();

  // do stuff with x
  ...
}

int main()
{
  uninitialized<my_class> uninit_x;

  kernel<<<1,1>>>(uninit_x);
}

Thanks everyone for the ideas.

Jared, it seems your solution is basically what SPWorley proposed, but with the benefit of not having to declare an extra class. I might try it, though I still would like a more seemless way like overriding the copy constructor as I tried. The key question is how to make the copy constructor act differently between regular use, and when it’s passed to CUDA.

I’ve looked at the kernel launch stub code generated by NVCC, and this is what I see for:

class My
{
};
__global__ void Foo(My my)
{
}

int main()
{
  My my;
  Foo<<<1,1>>>(my);
}
void __device_stub__Foo(struct My&__par0)
{
  if (cudaSetupArgument(__cudaAddressOf(__par0), sizeof(__par0), (size_t)0UL) != cudaSuccess)
     return;

  {
     volatile static char *__f __attribute__((unused));
     __f = ((char *)((void ( *)(struct My))Foo));
     (void)cudaLaunch(((char *)((void ( *)(struct My))Foo)));
  };
}

void Foo( struct My __cuda_0)
{
  __device_stub__Foo( __cuda_0);
}

int main()
{
  My my;
  (cudaConfigureCall(1, 1)) ? ((void)0) : (Foo)(my);
  return 0;
}

The copy constructor is being called whenever the Foo stub is called. But I don’t get why NVCC generates 2 stubs (1 that passes by value and 1 that passes by reference).

I’m thinking of filing a bug report to request that the stubs pass structures by reference so that the copy constructor isn’t called. Do you think this is better, or is there some use for calling the copy constructor that shouldn’t be sacrified?