Unable to write >128 consecutive bytes to Unified Memory buffer

I am using the “12_camera_v4l2_cuda” MMAPI sample to operate on my own unified memory buffers. Unmodified, the sample just passes the webcam video through with a small black rectangle drawn on it.

I have a simple CUDA kernel that writes 0xFF to a number of consecutive bytes in a buffer that was allocated as unified memory.

If I try to write more than 128 bytes at a time, the screen goes black (no video output) and eventually I get these errors:

cuCtxSynchronize failed after memcpy
cuGraphicsEGLUnRegisterResource failed: 702
cuGraphicsEGLRegisterImage failed: 702, cuda process stop

Why is there a 128 byte limit? How do I write more than 128 bytes at a time?
My host code is like this:

cudaMalloc(&buf, 8000000);

And my device kernel code is like this (trying to write 256 bytes at a time):

for (x_offset = 0; x_offset < 256; x_offset++)
    buf[x_offset] = 0xFF;

If I change the loop bound to 128, it works fine. If I use this instead:

buf[128 + x_offset] = 0xFF;

it works fine. I just cannot write more than 128 bytes in the loop at a time.
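
For anyone trying to reproduce this outside the sample, here is a minimal standalone sketch of the same write pattern. It is not the code from 12_camera_v4l2_cuda: the kernel name fill_bytes, the single-thread launch, and the error handling are all assumptions made for illustration. It also allocates with cudaMallocManaged, which is the runtime call that returns unified memory; the cudaMalloc call quoted above returns device-only memory, so which of the two the real code used is worth checking.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the sample): a single thread writes
// 0xFF to the first `count` bytes of the buffer.
__global__ void fill_bytes(unsigned char *buf, int count)
{
    for (int x_offset = 0; x_offset < count; x_offset++)
        buf[x_offset] = 0xFF;
}

int main()
{
    const size_t buf_size = 8000000;   // same size as in the post
    unsigned char *buf = nullptr;

    // cudaMallocManaged allocates unified (managed) memory that both
    // the host and the device can dereference.
    cudaError_t err = cudaMallocManaged(&buf, buf_size);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged: %s\n", cudaGetErrorString(err));
        return 1;
    }

    fill_bytes<<<1, 1>>>(buf, 256);    // write 256 consecutive bytes

    err = cudaGetLastError();          // reports launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize(); // reports errors raised while running
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(buf);
        return 1;
    }

    // After synchronizing, the host can read the managed buffer directly.
    printf("buf[0] = 0x%02X, buf[255] = 0x%02X\n", buf[0], buf[255]);

    cudaFree(buf);
    return 0;
}
```

Checking the result of the launch and of the synchronize call right after the kernel should make a failure show up next to the kernel that caused it, instead of surfacing later in an unrelated call such as the EGL register/unregister calls.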

Please share a patch and the steps so that we can reproduce the issue by running 12_camera_v4l2_cuda. Please also share the release version ($ head -1 /etc/nv_tegra_release). Thanks.

While preparing a patch, I re-examined my code in light of your comment, and I am now able to write 256 bytes at a time. Apologies for the post, and thanks for reading it.
I may have resolved the problem by using an int for my loop variable instead of an unsigned int, though I’m unsure.
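
As a general illustration of why the signedness of a loop variable can matter (not a confirmed diagnosis of this thread), here is a small sketch in plain C++; the same arithmetic applies inside a CUDA kernel. The helper names are hypothetical. Subtracting 1 from an unsigned 0 wraps around to UINT_MAX, so an index computed that way lands far outside an 8000000-byte buffer, and a countdown condition such as i >= 0 is always true for an unsigned i. As far as I know, 702 in the driver API's CUresult enum is CUDA_ERROR_LAUNCH_TIMEOUT, which would be consistent with a kernel that runs too long rather than with a limit on write size.

```cpp
#include <climits>

// Hypothetical helpers for illustration only; not from the original code.

// With an unsigned index, 0 - 1 wraps around to UINT_MAX.
unsigned int prev_index_unsigned(unsigned int i)
{
    return i - 1;
}

// With a signed index, 0 - 1 is simply -1.
int prev_index_signed(int i)
{
    return i - 1;
}

// True for every unsigned value, so a loop of the form
//     for (unsigned int i = n; i >= 0; i--)
// never terminates.
bool unsigned_is_non_negative(unsigned int i)
{
    return i >= 0;
}
```

If the original loop counted down, or computed an index by subtracting from the unsigned loop variable, switching to int would change the behavior in exactly this way. For the simple counting-up loop quoted above, int and unsigned int behave the same.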
