I am writing a CUDA application to do some image processing very fast. I tried compiling the CUDA program on a fast 64-bit XP machine I have, but it seems that 64-bit CUDA doesn't quite work with C# Express 2008. I then compiled the code on another machine running 32-bit XP, and everything works fine there (the program reads in the image, does the calculations, and outputs the data correctly). When I run the app that was compiled on the 32-bit machine on the 64-bit machine, it starts and opens the image fine, but the output image is badly corrupted. Has anyone else had this issue of CUDA apps compiled in a 32-bit environment not working correctly in a 64-bit environment? In case anyone is wondering, I used the CUDA.NET library to develop and run CUDA functions in C#. Thanks
So I figured out the issue and will post it here in case anyone else is having a similar problem. The corrupted output image was different each time I started the program, even though I always used the same input image, which led me to think this was a pointer issue. The CUDA.NET example code I used as a starting point sent the input values to the graphics card with code very similar to the following:
[codebox]
CUdeviceptr d_idata = cuda.CopyHostToDevice(h_idata);
CUdeviceptr d_odata = cuda.Allocate(h_idata);
CUdeviceptr d_shift = cuda.CopyHostToDevice(labShift);
cuda.SetFunctionBlockShape(function, BLOCK_DIM, BLOCK_DIM, 1);
cuda.SetParameter(function, 0, (uint)d_odata.Pointer);
cuda.SetParameter(function, IntPtr.Size, (uint)d_idata.Pointer);
cuda.SetParameter(function, IntPtr.Size * 2, (uint)d_shift.Pointer);
cuda.SetParameter(function, IntPtr.Size * 3, (uint)size_x);
cuda.SetParameter(function, IntPtr.Size * 3 + 4, (uint)size_y);
cuda.SetParameterSize(function, (uint)(IntPtr.Size * 3 + 8));[/codebox]
The reason the sample gave for using IntPtr.Size to compute the parameter offsets was that those parameters are pointers, and IntPtr.Size changes with the platform (it returns 4 on a 32-bit system and 8 on a 64-bit system). The .Pointer property on the CUdeviceptr class returns a uint though, which is always 4 bytes. So in 32-bit the offsets (0, 4, 8, 12, 16, with a total size of 20) matched the 4-byte values being written, but in 64-bit they became 0, 8, 16, 24, 28 with a total size of 32, which no longer lined up with the 4-byte values and messed up the input parameter pointers. I fixed it with the following changes:
[codebox]
cuda.SetFunctionBlockShape(function, BLOCK_DIM, BLOCK_DIM, 1);
cuda.SetParameter(function, 0, (uint)d_odata.Pointer);
cuda.SetParameter(function, 4, (uint)d_idata.Pointer);
cuda.SetParameter(function, 8, (uint)d_shift.Pointer);
cuda.SetParameter(function, 12, (uint)size_x);
cuda.SetParameter(function, 16, (uint)size_y);
cuda.SetParameterSize(function, (uint)(20));[/codebox]
So hard-coding the starting offsets fixed the problem, and now my pointers are correct in both 32-bit and 64-bit. Hope this helps someone else in the future.
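For anyone who wants to check the offset arithmetic without a GPU, here is a minimal sketch. ParamLayout, PackedOffsets and SampleOffsets are hypothetical names of my own, not part of CUDA.NET, and the sketch assumes every argument is written as a 4-byte uint as in the code above, so no alignment padding is considered.

```csharp
using System;

// Hypothetical helper, not part of CUDA.NET: it only does the offset arithmetic.
public static class ParamLayout
{
    // Start offset of each argument when the arguments are packed back to back.
    // sizes[i] is the size in bytes of the value actually written for argument i.
    // Assumption: no alignment padding is needed (true when every size is 4).
    public static int[] PackedOffsets(int[] sizes, out int total)
    {
        int[] offsets = new int[sizes.Length];
        int running = 0;
        for (int i = 0; i < sizes.Length; i++)
        {
            offsets[i] = running;   // this argument starts where the previous one ended
            running += sizes[i];
        }
        total = running;            // the value that would go to SetParameterSize
        return offsets;
    }

    // Reproduces the offset expressions from the original sample for a given
    // pointer size (4 when running as 32-bit, 8 when running as 64-bit).
    public static int[] SampleOffsets(int pointerSize, out int total)
    {
        total = pointerSize * 3 + 8;
        return new[]
        {
            0,
            pointerSize,
            pointerSize * 2,
            pointerSize * 3,
            pointerSize * 3 + 4
        };
    }
}
```

Calling ParamLayout.PackedOffsets with five 4-byte sizes gives the offsets used in the fix, and ParamLayout.SampleOffsets(IntPtr.Size, out total) shows what the original sample computes on the current machine. If the two results differ, the parameters are being written to the wrong positions.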