I am using the “12_camera_v4l2_cuda” MMAPI sample to operate on my own unified memory buffers. By default, this sample just passes the webcam video through to the display with a small black rectangle drawn on it.
I have a simple CUDA kernel that writes 0xFF to a number of consecutive bytes in a buffer that was allocated as unified memory.
If I try to write more than 128 bytes at a time, the screen goes black (no video output) and eventually I get these errors:
cuCtxSynchronize failed after memcpy
cuGraphicsEGLUnRegisterResource failed: 702
cuGraphicsEGLRegisterImage failed: 702, cuda process stop
Why is there a 128-byte limit? How do I write more than 128 bytes at a time?
My host code looks like this:
cudaMalloc(&buf, 8000000);
My device kernel code looks like this (trying to write 256 bytes at a time):
for (x_offset = 0; x_offset < 256; x_offset++)
{
    buf[x_offset] = 0xFF;
}
If I change the loop bound from 256 to 128, it works fine. If I use this instead:
buf[128 + x_offset] = 0xFF;
it also works fine. I just cannot write more than 128 bytes in a single loop.
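For comparison, here is a minimal standalone sketch of the same write outside the MMAPI sample, with every CUDA call checked. It rests on several assumptions that are not taken from the sample: the buffer is allocated with cudaMallocManaged (the host snippet above shows cudaMalloc, which returns device-only memory rather than unified memory), the kernel name fill_bytes and helper fill_and_check are hypothetical, and the launch uses one thread per byte instead of one thread looping over all bytes. Error 702 is CUDA_ERROR_LAUNCH_TIMEOUT in the CUDA driver API, so checking the result of the launch and of the synchronize separately may help narrow down where the failure starts.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes a single byte, so no thread
// runs a long loop. Threads past the end of the range do nothing.
__global__ void fill_bytes(unsigned char *buf, size_t count)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < count)
        buf[i] = 0xFF;
}

// Writes 0xFF to the first `count` bytes of a managed buffer and verifies
// the result on the host. Returns 0 on success, 1 on any error or mismatch.
int fill_and_check(size_t count)
{
    const size_t buf_size = 8000000;   // same size as in the post
    if (count == 0 || count > buf_size)
        return 1;

    unsigned char *buf = nullptr;
    cudaError_t err = cudaMallocManaged(&buf, buf_size);   // unified memory
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaMemset(buf, 0, buf_size);   // start from a known state
    if (err == cudaSuccess) {
        const unsigned threads = 128;
        const unsigned blocks = (unsigned)((count + threads - 1) / threads);
        fill_bytes<<<blocks, threads>>>(buf, count);
        err = cudaGetLastError();          // reports launch failures
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();     // reports execution failures
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(buf);
        return 1;
    }

    // The host may read managed memory only after the synchronize above.
    int bad = 0;
    for (size_t i = 0; i < count; ++i)
        if (buf[i] != 0xFF)
            bad = 1;
    if (count < buf_size && buf[count] != 0)   // byte after the range is untouched
        bad = 1;

    cudaFree(buf);
    return bad;
}

int main()
{
    int rc = fill_and_check(256);
    printf("fill_and_check(256) returned %d\n", rc);
    return rc;
}
```

If this standalone version handles 256 bytes without errors, the limit is more likely tied to how the kernel is launched or how the buffer is shared inside the sample than to the write itself.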