Printf gives memory access error and behaves weirdly


I’m starting to notice that printf in the raygen function behaves weirdly: sometimes it prints, sometimes it doesn’t, and sometimes it gives me a memory access error. Is this a known problem?

I’m just trying to print whether a ray hits or misses.

Hi @huyleq1989,

Do you have a lot of output? printf support relies on a CUDA buffering system that only has a small buffer, so it’s easy to run out of buffer space, which will truncate your output. See the thread “Size of printf buffer”.

It’s best to limit your printf output to a small amount, for example by passing the thread index you’d like to debug into your kernel and only invoking printf when your thread index matches that parameter. Personally I like to wire up my thread index to a mouse click and/or a command line parameter to my application, to make it easy to dynamically set the pixel I’m interested in.
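Here is a minimal sketch of that guard. It is plain host C++ so it can be run without a GPU; in real device code the same comparison would sit in front of the printf call in the raygen program. `Index2D`, `DebugParams`, `is_debug_thread`, and `run_launch` are hypothetical names made up for this illustration, not part of any API.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical 2D index type, standing in for the launch index
// that device code would get from the ray tracing API.
struct Index2D {
    unsigned int x;
    unsigned int y;
};

// Hypothetical launch parameter: the single index whose output we want.
struct DebugParams {
    Index2D debug_index;
};

// True only for the one thread selected for debugging.
bool is_debug_thread(Index2D idx, const DebugParams& params) {
    return idx.x == params.debug_index.x && idx.y == params.debug_index.y;
}

// CPU stand-in for a launch: visits every index of the grid and prints
// only for the selected one. Returns how many prints were issued.
unsigned int run_launch(unsigned int width, unsigned int height,
                        const DebugParams& params) {
    assert(width > 0 && height > 0);
    unsigned int prints = 0;
    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            const Index2D idx{x, y};
            if (is_debug_thread(idx, params)) {
                std::printf("ray (%u, %u): debug output\n", idx.x, idx.y);
                ++prints;
            }
        }
    }
    return prints;
}
```

With the guard in place, the amount of output no longer depends on the launch size: at most one thread prints, however many rays are launched.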



Thanks for the quick reply.

I have a message with about 40 chars and 6 unsigned ints for each and every ray (e.g. ray index and other parameters I’m debugging), so I guess it runs out of buffer memory.

Yes. The thread linked mentions the default buffer size is 1MB, so if you’re rendering a 1080p image, there isn’t enough space for even 1 character per ray. ;) The thread also mentions it’s the unprocessed string and some data, so there is overhead as well. And lastly, note that you have the option to increase the size of the buffer. The first comment includes the function call you can make to set the buffer size. If you can estimate a conservative buffer size, you can certainly make a larger buffer and probably get it to work, provided it’s still small enough to fit in memory on both CPU & GPU. That’s a quick and easy hack if limiting which threads will print won’t work well.
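To put rough numbers on this, here is a small host-side sketch of the estimate. The 1 MB default is taken as 1024 * 1024 bytes, and the per-call overhead is an assumption you have to supply yourself, since the thread only says that some overhead exists. The function names are hypothetical. As far as I know, the CUDA runtime call for setting the size is `cudaDeviceSetLimit` with `cudaLimitPrintfFifoSize`; it is left out of the code so the sketch runs without CUDA.

```cpp
#include <cassert>
#include <cstddef>

// Default printf buffer size quoted in the thread (1 MB),
// taken here as 1024 * 1024 bytes.
constexpr std::size_t kDefaultPrintfBufferBytes = 1024 * 1024;

// Rough estimate for one launch: every ray stores its format string,
// its arguments, and some per-call overhead. The overhead value is a
// guess supplied by the caller, not a documented number.
std::size_t estimate_printf_buffer_bytes(std::size_t num_rays,
                                         std::size_t format_bytes,
                                         std::size_t arg_bytes,
                                         std::size_t overhead_bytes) {
    assert(num_rays > 0);
    return num_rays * (format_bytes + arg_bytes + overhead_bytes);
}

// Bytes each ray could use if the buffer were shared evenly
// (integer division, so the result is 0 when rays outnumber bytes).
std::size_t bytes_per_ray(std::size_t buffer_bytes, std::size_t num_rays) {
    assert(num_rays > 0);
    return buffer_bytes / num_rays;
}
```

For a 1920 x 1080 launch, `bytes_per_ray(kDefaultPrintfBufferBytes, 1920 * 1080)` comes out as 0, which is the "not even 1 character per ray" point. A 40-character message plus 6 unsigned ints (24 bytes) is 64 bytes per ray, so the same launch would need about 133 MB before any overhead is counted.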


Thank you for the suggestions. I’ll try changing the buffer size.

I would start with limiting the prints to fewer launch indices first.

You didn’t explain why you’re doing this, but if you need the output of all rays in any specific order, you’re not going to get that from CUDA where threads run in parallel.

printf is really useful for debugging individual launch indices, and that is where David’s method of simply selecting an erroneous pixel with the mouse and getting output for only that launch index shines.
Mind that mouse window coordinates and launch index coordinates usually have different origins (top-left vs. bottom-left), and that the mapping can differ again when launch indices are distributed differently across multiple GPUs for load balancing.
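A minimal sketch of that origin flip, assuming the window and the launch have the same resolution, the mouse origin is top-left, and the launch origin is bottom-left. `Pixel` and `mouse_to_launch_index` are hypothetical names, and any multi-GPU remapping would need its own extra step on top of this.

```cpp
#include <cassert>

// Hypothetical pixel coordinate type used for both mouse and launch space.
struct Pixel {
    unsigned int x;
    unsigned int y;
};

// Convert a mouse position (origin top-left) into a launch index
// (origin bottom-left) by mirroring the y coordinate.
Pixel mouse_to_launch_index(Pixel mouse, unsigned int window_height) {
    assert(mouse.y < window_height);
    return Pixel{mouse.x, window_height - 1 - mouse.y};
}
```

For a 1080-row window, a click on the top row (y = 0) maps to launch row 1079, and the x coordinate is unchanged.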

Thanks for your pointers. I’m not actually rendering any images. At a number of locations, I want to compute an integral that depends on the closest hit distance and the hit primitive. It’s kind of complicated, so what I ended up doing is allocating additional arrays to store debugging information, to avoid using printf.


That’s an excellent alternative in my opinion; we should have mentioned it. ;) Allocating your own buffers for debugging and doing the printing on the CPU side gives you more control and explicit understanding of the sizes needed, and it lets you control the order and minimize the storage space.
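As a rough sketch of the debug-buffer idea, here is a host-only C++ version. In a real application the array would live in device memory (for example allocated with `cudaMalloc` and copied back with `cudaMemcpy`), and each ray would write its own slot on the GPU. `DebugRecord`, `write_record`, and `print_records` are hypothetical names, and the values written are fake placeholder data rather than real hit results.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical fixed-size record, one per ray, so the storage
// needed is exactly num_rays * sizeof(DebugRecord).
struct DebugRecord {
    unsigned int ray_index;
    unsigned int primitive_index;
    float hit_distance;
    bool hit;
};

// Stand-in for what each ray would do on the device: write into its
// own slot. The hit data here is made up for the illustration.
void write_record(unsigned int ray_index, std::vector<DebugRecord>& records) {
    assert(ray_index < records.size());
    const bool hit = (ray_index % 2 == 0);  // fake: even rays "hit"
    records[ray_index] = DebugRecord{ray_index,
                                     hit ? ray_index * 10u : 0u,
                                     hit ? 1.5f * ray_index : -1.0f,
                                     hit};
}

// CPU-side printing: output order follows the array, not the order
// in which the rays happened to finish.
void print_records(const std::vector<DebugRecord>& records) {
    for (const DebugRecord& r : records) {
        if (r.hit) {
            std::printf("ray %u: hit primitive %u at t = %f\n",
                        r.ray_index, r.primitive_index, r.hit_distance);
        } else {
            std::printf("ray %u: miss\n", r.ray_index);
        }
    }
}
```

Calling `write_record` for rays 3, 1, 0, 2 in that order and then `print_records` still prints rays 0 through 3 in sequence, because each ray owns a fixed slot.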


