I’m declaring and initializing a float array in CUDA constant memory like this in C++:
__constant__ float fullConvKernel[ MAX_CONV_KERNEL_SIZE * 2 ];
ThrowIfCudaError( "SetupKernel",
    cudaMemcpyToSymbol( fullConvKernel, fullConvKernelCopy, sizeof( fullConvKernel ) ) );
I’ve asserted on the host that fullConvKernelCopy is not zeroed out, yet on the device the constant memory is all zeros (0.0f) on some runs, while on other runs it holds the correct floating-point values.
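To sanity-check the transfer itself, I can also read the symbol back on the host immediately after the copy with cudaMemcpyFromSymbol. A minimal sketch (error checking omitted; the MAX_CONV_KERNEL_SIZE value is illustrative):

```cuda
#include <cstdio>

#define MAX_CONV_KERNEL_SIZE 64  // assumed value for illustration

__constant__ float fullConvKernel[ MAX_CONV_KERNEL_SIZE * 2 ];

// Copy to the constant symbol, then read it straight back and compare,
// to see whether the host->device transfer ever produces zeros.
void VerifyConstantCopy( const float* fullConvKernelCopy )
{
    cudaMemcpyToSymbol( fullConvKernel, fullConvKernelCopy, sizeof( fullConvKernel ) );

    float readBack[ MAX_CONV_KERNEL_SIZE * 2 ];
    cudaMemcpyFromSymbol( readBack, fullConvKernel, sizeof( readBack ) );

    for ( int i = 0; i < MAX_CONV_KERNEL_SIZE * 2; ++i )
        if ( readBack[ i ] != fullConvKernelCopy[ i ] )
            std::printf( "mismatch at %d: %f vs %f\n",
                         i, readBack[ i ], fullConvKernelCopy[ i ] );
}
```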
If I use shared memory instead of constant memory, the code works correctly every time.
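For reference, the shared-memory variant that always works looks roughly like this (a sketch of my setup; the d_convKernel global-memory staging copy and the kernel name are illustrative):

```cuda
#define MAX_CONV_KERNEL_SIZE 64  // assumed value for illustration

// Each block stages the coefficients from a global-memory copy into
// shared memory before any thread uses them.
__global__ void ConvolveKernel( const float* d_convKernel )
{
    __shared__ float s_convKernel[ MAX_CONV_KERNEL_SIZE * 2 ];

    // Cooperative load: threads stride across the coefficient array.
    for ( int i = threadIdx.x; i < MAX_CONV_KERNEL_SIZE * 2; i += blockDim.x )
        s_convKernel[ i ] = d_convKernel[ i ];

    __syncthreads();  // all coefficients visible before any thread reads them

    // ... convolution using s_convKernel ...
}
```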
I tried moving it into a struct, but that didn’t help:
struct FullConvKernel
{
float m_fullConvKernel[ MAX_CONV_KERNEL_SIZE * 2 ];
};
__constant__ FullConvKernel fullConvKernel;
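The copy for the struct version is the same call as before, just with a host-side struct instance (hostKernel here is illustrative; I fill it before launching any kernels):

```cuda
#define MAX_CONV_KERNEL_SIZE 64  // assumed value for illustration

struct FullConvKernel
{
    float m_fullConvKernel[ MAX_CONV_KERNEL_SIZE * 2 ];
};

__constant__ FullConvKernel fullConvKernel;

// Copy a fully populated host-side struct into the __constant__ symbol.
void SetupKernel( const FullConvKernel& hostKernel )
{
    // sizeof( fullConvKernel ) is the full struct size, as in the array case.
    cudaMemcpyToSymbol( fullConvKernel, &hostKernel, sizeof( fullConvKernel ) );
}
```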
I’m not sure what kind of race condition is going on here or what else I might be doing wrong. I’m using CUDA 9.2 on CentOS 6, and the failure occurs across multiple GPU models.