CUDA.NET and AllocateHost

I’m trying to measure the performance impact of pinning host memory. I’m working in C# with CUDA.NET and am having difficulty understanding the model.

There are two calls, AllocateHost(uint bytes) and AllocateHost<T>(T[]), that seem to be what I should use. Both return an IntPtr.

Using the first call seems straightforward, but C# makes it very difficult to work with the returned pointer. I think I would end up having to copy my “normal” data into and out of the buffer to do anything reasonable on the host side, which doesn’t make practical sense to me.
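To make the copying concern concrete, here is a minimal sketch of the round trip I have in mind. It uses only the standard Marshal class. Marshal.AllocHGlobal is just a stand-in so the snippet runs without a GPU; it does not give page-locked memory, and I am assuming the IntPtr returned by AllocateHost(uint bytes) could be used with Marshal.Copy in the same way. The class and method names are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

static class HostBufferSketch
{
    // Copies a managed array into an unmanaged buffer and back out again.
    // ASSUMPTION: "buffer" stands in for the pointer that
    // cuda.AllocateHost(uint bytes) would return; here it comes from
    // Marshal.AllocHGlobal, which is NOT pinned/page-locked CUDA memory.
    public static byte[] RoundTrip(byte[] managed)
    {
        IntPtr buffer = Marshal.AllocHGlobal(managed.Length);
        try
        {
            // Managed array -> unmanaged buffer (first extra copy).
            Marshal.Copy(managed, 0, buffer, managed.Length);

            // ... a host-to-device transfer would read from "buffer" here ...

            // Unmanaged buffer -> managed array (second extra copy).
            byte[] result = new byte[managed.Length];
            Marshal.Copy(buffer, result, 0, result.Length);
            return result;
        }
        finally
        {
            // Release the stand-in buffer even if a copy throws.
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

Calling HostBufferSketch.RoundTrip(new byte[] { 1, 2, 3, 4 }) hands back a new array with the same four bytes. The two Marshal.Copy calls are exactly the overhead I would like to avoid.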

My guess was that the pointer returned by the second call is not really needed (except for deallocation; more about that later), and that the input array is pinned as a side effect. On reflection, this guess seems a little unlikely, given that C# usually requires a “fixed” declaration on anything that must not be moved around or garbage collected. But if that is not what happens, what is the relationship between the input array and the returned pointer? Finally, given the code snippets below, I get an error on the FreeHost call. (The variable “cuda” is a GASS.CUDA.CUDA object.)

I would appreciate any insight, and thanks to the GASS people for a generally very nice and simple interface layer for .NET.


    byte[] fp_bit_counts;
    IntPtr fp_bit_counts_ptr;

    fp_bit_counts_ptr = cuda.AllocateHost(fp_bit_counts);

    d_fp_bit_counts = cuda.Allocate<byte>(fp_bit_counts);