That’s interesting: it works with smaller arrays.
If the arrays (Mu,…) have 10000 elements it works, but in my case they need 40000 elements, and that doesn’t work.
I guess there must be a memory problem.
It’s strange, because cudaMalloc and cudaMemcpy both return cudaSuccess:
if ( cudaMalloc( (void**)&GPUMu, 64*sizeof(float)*NbrOfRobotPos ) != cudaSuccess )
    mexErrMsgTxt("GPUMLE: GPUMu: Memory allocating failure on the GPU.\n");
if ( cudaMemcpy( GPUMu, Mu, 64*sizeof(float)*NbrOfRobotPos, cudaMemcpyHostToDevice ) != cudaSuccess )
    mexErrMsgTxt("GPUMLE: Mu: cudaMemcpy failure\n");
I have added printf("ERROR string: %s\n", cudaGetErrorString(cudaGetLastError())); to my code.
With a small array it prints: no error
With a big array it prints: invalid device pointer
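As a quick sanity check on the memory guess, the size passed to cudaMalloc can be worked out on the host. This is only a sketch: the helper name mu_bytes is made up for illustration, and it assumes a 4-byte float, which is the usual case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes requested for Mu by the cudaMalloc call above,
   i.e. 64 floats per robot position. */
size_t mu_bytes(size_t nbr_of_robot_pos)
{
    /* Assumption: float is 4 bytes on this platform. */
    assert(sizeof(float) == 4);
    return 64 * sizeof(float) * nbr_of_robot_pos;
}
```

With 10000 positions this gives 2,560,000 bytes and with 40000 positions 10,240,000 bytes (about 10 MB) per array, so whether memory is really the limit depends on how many such arrays are allocated and how much memory the card has.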
I changed index2 and fixed it at zero. Then I recompiled my code.
The first two times I ran the code it worked. The third time it crashed. Sometimes it crashes, sometimes it does not.
After the kernel call there is a pause during which the loop prints dots on the screen, which makes it easier to locate the problem. I have now noticed that the crash occurs during the memcpy. Note that it does not always crash.
When it crashes I get different errors:
the launch timed out and was terminated
no error
invalid device pointer
My biggest worry is that something may be wrong with the GPU hardware or driver, because the results are not consistent.
My CUDA program works (and it’s at least 10x faster than the CPU version).
The problem was the watchdog timer. I have split up my kernel function and now I don’t have the problem.
I have read about disabling the watchdog, but most people advise keeping it enabled.
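For anyone hitting the same timeout, here is a rough host-side sketch of the "split it up" idea: instead of one long launch over all items, issue several short launches over sub-ranges so that each one finishes well inside the watchdog limit. This is not the actual code from my program. The names run_in_chunks, launch_fn and count_items are made up for illustration, and the chunk size has to be tuned by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one short kernel launch over the items
   [first, first + count). In a real program this would launch the kernel
   on that sub-range and then wait for it to finish. */
typedef void (*launch_fn)(size_t first, size_t count, void *user);

/* Split `total` work items into launches of at most `chunk` items each.
   Returns the number of launches issued. */
size_t run_in_chunks(size_t total, size_t chunk, launch_fn launch, void *user)
{
    size_t first = 0;
    size_t launches = 0;

    assert(launch != NULL);
    if (chunk == 0)
        return 0;                 /* nothing sensible to do */

    while (first < total) {
        size_t count = total - first;
        if (count > chunk)
            count = chunk;        /* last chunk may be shorter */
        launch(first, count, user);
        first += count;
        ++launches;
    }
    return launches;
}

/* Example callback for trying the loop without a GPU: it only adds up
   how many items it was asked to process. */
void count_items(size_t first, size_t count, void *user)
{
    (void)first;
    *(size_t *)user += count;
}
```

Calling run_in_chunks(40000, 10000, count_items, &seen) with seen initialised to 0 issues 4 launches and leaves seen at 40000. In the real program the callback would do the kernel launch for that sub-range, and the kernel would need the start offset as an extra argument.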