I am stress testing an application that uses CUDA. It is a single-threaded rendering server process. In this stress test, one client does a 3D render with about 30 ms of kernel time per frame, and a second process does a variety of renders, so there is a fair amount of state thrashing. Without the state thrashing, it runs for a long, long time. When it fails, in the large majority of cases control does not come back from cudaMemcpy2D(). This claim is based on statements from a log file that is flushed after each write.
I'm running CUDA on a GeForce GT 740 (compute capability 3.0, 4096 MB) under Windows 7 on a Dell T5400. The device driver is 7.0 and the runtime is 5.50.
DLOG("before cudaMemcpy2D id %08x host width %d align %d image w,h %d %d\n", dst->id,
     rp->host_image_buffer_width, rp->pix_sz,
     rp->img_width, rp->img_buffer_height);
// Device-to-host copy; pitches and width are in bytes, height is in rows.
cudaError cudaErr = cudaMemcpy2D(dst->data,
                                 rp->host_image_buffer_width * rp->pix_sz,  // dpitch
                                 rp->img_buffer,
                                 rp->img_width * rp->pix_sz,                // spitch
                                 rp->img_width * rp->pix_sz,                // width
                                 rp->img_buffer_height, cudaMemcpyDeviceToHost);
DLOG("back from cudaMemcpy2D err %d\n", cudaErr);
if (cudaErr != cudaSuccess)
{
    …
}
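For reference, the argument order in the call above is (dst, dpitch, src, spitch, width in bytes, height, kind), and the row width must not exceed either pitch. A minimal host-only sketch of the same pitched row copy, which can be used to check the pitch arithmetic without a GPU. `copy2d` is a hypothetical helper written for illustration, not part of the CUDA runtime:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical host-side stand-in for a pitched 2D copy (not a CUDA API).
// Copies `height` rows of `widthBytes` bytes each. Consecutive rows start
// `spitch` bytes apart in the source and `dpitch` bytes apart in the
// destination. Returns false if the row width does not fit in either pitch.
bool copy2d(void* dst, std::size_t dpitch,
            const void* src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height)
{
    if (widthBytes > dpitch || widthBytes > spitch)
        return false;

    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    return true;
}
```

With the values from the snippet, the destination buffer needs at least host_image_buffer_width * pix_sz * img_buffer_height bytes, and host_image_buffer_width must be at least img_width for the copy to be valid.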
On one occasion, this 0.5 MB copy took 15 seconds, which I have occasionally seen on copies into GPU memory, too.
There is no measurable memory leak. I have registered an atexit() handler that logs and then deliberately crashes, but it only gets called when the process exits normally.
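Because an atexit() handler never runs while the process is stuck inside a call, a watchdog thread is one way to get a log line out of a hang. A minimal sketch, assuming C++11 threads are available in the server build. `Watchdog` is a hypothetical helper, not part of CUDA, and it writes to stderr where the real server would presumably use its own log:

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

// Hypothetical watchdog: logs once if the guarded scope is still running
// after `limit`. It only reports; it does not cancel the stuck call.
class Watchdog {
public:
    Watchdog(const char* what, std::chrono::milliseconds limit)
        : what_(what), limit_(limit), done_(false), fired_(false),
          worker_(&Watchdog::run, this) {}

    ~Watchdog() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;  // tell the worker the guarded scope finished
        }
        cv_.notify_one();
        worker_.join();
    }

    // True once the deadline has passed without the scope finishing.
    bool fired() {
        std::lock_guard<std::mutex> lock(m_);
        return fired_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        // wait_for returns false if the deadline passed while done_ was false.
        if (!cv_.wait_for(lock, limit_, [this] { return done_; })) {
            fired_ = true;
            std::fprintf(stderr, "watchdog: %s still running after %lld ms\n",
                         what_, static_cast<long long>(limit_.count()));
            std::fflush(stderr);
        }
    }

    const char* what_;
    std::chrono::milliseconds limit_;
    bool done_;
    bool fired_;
    std::mutex m_;
    std::condition_variable cv_;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

The intended use is to declare a `Watchdog` in a scope that wraps the cudaMemcpy2D() call, with a limit of a few seconds. If a crash dump is wanted, the branch that logs could also call std::abort().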
It is possible some other black magic is going on, but I thought I would throw() it out here.
Thanks for any input.