Errors writing float4 values to device memory: can your hardware write float4 values?


I’ve come across a very simple CUDA kernel which gives results different from what I expect. This kernel is intended to copy memory using float4 loads and stores:

__global__ void testKernel1(float4* g_out, float4* g_in) {

    // Each thread copies one float4 element.
    const int idx = blockDim.x*blockIdx.x + threadIdx.x;

    g_out[idx] = g_in[idx];

}

The kernel appears to run correctly on my Quadro NVS 290 card. On my Tesla C870 card, however, it produces random errors in a fraction of the array elements. The errors occur most readily with large arrays and under load (many kernel invocations in rapid succession).

I have also tried kernels which read float4s but write floats (no problem) and kernels which read floats but write float4s (these exhibit the same error). So I am led to believe that the problem is specific to writing float4 values.
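For anyone who wants to try the same comparison, here is a minimal sketch of what those two variants might look like. The kernel names, the bounds check, the array size and the host code are all my assumptions for illustration, not the exact kernels I ran.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical variant A: one float4 load, four scalar float stores.
__global__ void readFloat4WriteFloat(float* g_out, const float4* g_in, int n4) {
    const int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx >= n4) return;
    const float4 v = g_in[idx];
    g_out[4 * idx + 0] = v.x;
    g_out[4 * idx + 1] = v.y;
    g_out[4 * idx + 2] = v.z;
    g_out[4 * idx + 3] = v.w;
}

// Hypothetical variant B: four scalar float loads, one float4 store.
__global__ void readFloatWriteFloat4(float4* g_out, const float* g_in, int n4) {
    const int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx >= n4) return;
    g_out[idx] = make_float4(g_in[4 * idx + 0], g_in[4 * idx + 1],
                             g_in[4 * idx + 2], g_in[4 * idx + 3]);
}

// Host-side check: number of positions where the two arrays differ.
static int countMismatches(const std::vector<float>& a, const std::vector<float>& b) {
    int bad = 0;
    for (size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++bad;
    return bad;
}

int main() {
    const int n4 = 1 << 16;  // number of float4 elements (assumed size)
    const int n = 4 * n4;    // number of scalar floats
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n4 + threads - 1) / threads;

    // Variant A: float4 read, float write.
    cudaMemset(d_out, 0, n * sizeof(float));
    readFloat4WriteFloat<<<blocks, threads>>>(
        d_out, reinterpret_cast<const float4*>(d_in), n4);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("float4 read, float write: %d mismatches\n",
           countMismatches(h_in, h_out));

    // Variant B: float read, float4 write.
    cudaMemset(d_out, 0, n * sizeof(float));
    readFloatWriteFloat4<<<blocks, threads>>>(
        reinterpret_cast<float4*>(d_out), d_in, n4);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("float read, float4 write: %d mismatches\n",
           countMismatches(h_in, h_out));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

A single pass like this is only a starting point; in my case the errors showed up mainly with large arrays and many launches.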

Besides this problem, my Tesla C870 card appears to be working fine.

I have attached an example CUDA file which exhibits the errors (at least on my hardware). It can be compiled and run with the commands: nvcc; ./a.out.

If you run this test program then please let me know the results.
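In case the attachment does not come through, the following is a rough sketch of the kind of test I mean, not the attached file itself. It launches the copy kernel repeatedly and counts elements that differ from the input. The array size, the launch configuration, the value of KERNEL_INVOCATIONS and the helper name countMismatches are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define KERNEL_INVOCATIONS 100  // assumed value; raise it to stress the card

__global__ void testKernel1(float4* g_out, float4* g_in) {
    const int idx = blockDim.x * blockIdx.x + threadIdx.x;
    g_out[idx] = g_in[idx];
}

// Host-side check: number of float4 elements that differ in any component.
static long countMismatches(const float4* a, const float4* b, size_t n) {
    long bad = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i].x != b[i].x || a[i].y != b[i].y ||
            a[i].z != b[i].z || a[i].w != b[i].w) ++bad;
    }
    return bad;
}

int main() {
    const int threads = 256;
    const int blocks = 4096;
    // The kernel has no bounds check, so n must equal blocks * threads.
    const size_t n = static_cast<size_t>(blocks) * threads;
    const size_t bytes = n * sizeof(float4);

    std::vector<float4> h_in(n), h_out(n);
    for (size_t i = 0; i < n; ++i) {
        const float f = static_cast<float>(i);
        h_in[i] = make_float4(f, f + 0.25f, f + 0.5f, f + 0.75f);
    }

    float4* d_in = nullptr;
    float4* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    long totalBad = 0;
    for (int iter = 0; iter < KERNEL_INVOCATIONS; ++iter) {
        cudaMemset(d_out, 0, bytes);  // clear stale results from the last pass
        testKernel1<<<blocks, threads>>>(d_out, d_in);
        const cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) {
            printf("launch failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        // cudaMemcpy waits for the kernel to finish before copying.
        cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
        totalBad += countMismatches(h_in.data(), h_out.data(), n);
    }

    printf("%ld mismatched elements over %d invocations\n",
           totalBad, KERNEL_INVOCATIONS);

    cudaFree(d_in);
    cudaFree(d_out);
    return totalBad == 0 ? 0 : 1;
}
```

Checking after every launch keeps the sketch simple, but it also spaces the launches out. Since the errors seem tied to rapid successive invocations, a version that queues several launches before each check may reproduce the problem more readily.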



(P.S. I had some trouble attaching the CUDA file, so I gave it a .txt extension instead of .cu.)
float4test.txt (3.04 KB)

Sounds like it might be a weird hardware problem with your C870. I bumped KERNEL_INVOCATIONS up to 100000 and ran the test on both GPUs of a D870 (each is a C870). There were no reported errors.

Thanks Mister Anderson, that’s very helpful information.