read access violation - does it exist?

I am developing (and debugging…) a complex application and the subject of access violations is now on my plate.
Obviously writing out of bounds is forbidden, but what about reading?
Because of unrolled loops etc., it is often convenient to read slightly outside the input buffer (and then deal with correctness later on).

My question: can reading outside a buffer ever cause a CUDA error?
In my app it generally seems to pass without trouble, but is it possible that it sometimes crashes? Is this legal?

Note - I cannot solve this by simply allocating the memory with padding, as I am not always the one allocating it…

Yes, reading out of bounds can generate access violations in CUDA as well, resulting in crashed kernels.

Pad and zero-fill if you’re in control of the allocation; otherwise perform bounds checks in code (these checks are usually cheap compared to the latency of memory accesses and may not affect your run time at all).
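A minimal sketch of the in-kernel bounds check suggested above, assuming an unrolled loop that reads four elements per thread (the kernel name and parameters are illustrative):

```cuda
// Illustrative kernel: each thread accumulates 4 consecutive elements
// (as an unrolled loop would), guarding every load so no read ever
// goes past `n`. The predicated check typically hides behind the
// latency of the memory access itself.
__global__ void sum4(const float* __restrict__ in, float* out, int n)
{
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * 4;
    float acc = 0.0f;
    #pragma unroll
    for (int i = 0; i < 4; ++i) {
        int idx = base + i;
        if (idx < n)        // cheap bounds check, compiles to predication
            acc += in[idx];
    }
    if (base < n)
        out[base / 4] = acc;
}
```

Out-of-range lanes simply contribute zero, which matches the zero-fill padding semantics without requiring control of the allocation.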

The thing is that when my code DOES read out of bounds, my tests can run for 12 hours with no crash. What makes it crash when it does crash?

When your out-of-bounds memory access crosses a page boundary into a page that is not mapped to physical memory. Whether or not this occurs depends on the size of the allocation, on how close the end of your data is to the last memory page of the allocation, and on how far your kernel reads beyond your valid data.

When you bind your data to a texture (in the simplest case via linear memory), the texture address mode can be one of clamp, wrap, mirror, or border. That gives you well-defined semantics for how out-of-bounds texture reads are treated. In particular you have the guarantee of not crashing. ;)
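As a sketch of the texture approach (function name illustrative, error checking omitted), the buffer can be wrapped in a texture object. One caveat, as I understand the documentation: for textures bound to linear memory and read with `tex1Dfetch`, out-of-range fetches return zero rather than faulting, while the clamp/wrap/mirror address modes fully apply to array-backed textures; either way the read is well-defined and cannot crash.

```cuda
// Sketch: wrap a linear device buffer in a texture object so that
// out-of-range reads are well-defined instead of faulting.
cudaTextureObject_t makeTexForBuffer(float* d_buf, size_t nElems)
{
    cudaResourceDesc res = {};
    res.resType                = cudaResourceTypeLinear;
    res.res.linear.devPtr      = d_buf;
    res.res.linear.desc        = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = nElems * sizeof(float);

    cudaTextureDesc tex = {};
    tex.addressMode[0] = cudaAddressModeClamp;   // used for array-backed textures
    tex.readMode       = cudaReadModeElementType;

    cudaTextureObject_t texObj = 0;
    cudaCreateTextureObject(&texObj, &res, &tex, nullptr);
    return texObj;   // in kernels: tex1Dfetch<float>(texObj, i)
}
```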

Using a linear texture is generally an excellent idea for this case.
In my specific case I cannot use it because I am doing pointer tricks (like reading a byte buffer as ints, 4 bytes at a time).
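For that pointer-trick case, a hedged sketch of a guarded word load (names are illustrative, and it assumes `i` is 4-byte aligned for the fast path): read a full word only when it is entirely in bounds, and assemble the tail byte by byte otherwise.

```cuda
// Sketch: read a byte buffer as 32-bit words, 4 bytes at a time,
// without ever reading past `nBytes`. The fast path assumes `buf + i`
// is 4-byte aligned; the tail is assembled from byte loads, zero-padded.
__device__ unsigned int load4(const unsigned char* buf, size_t i, size_t nBytes)
{
    if (i + 4 <= nBytes)    // whole word is in bounds: single 32-bit load
        return *reinterpret_cast<const unsigned int*>(buf + i);

    unsigned int w = 0;     // short tail: gather remaining bytes (little-endian)
    for (size_t k = 0; i + k < nBytes; ++k)
        w |= (unsigned int)buf[i + k] << (8 * k);
    return w;
}
```

Only the final word of the buffer ever takes the slow byte-wise path, so the cost is negligible while every access stays in bounds.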

There is another solution, worth mentioning, that works for specific cases. If every CUDA block reads from a fixed-size memory block, then only the last block has a problem. You can simply copy the last memory block to another, padded buffer. All blocks work as before except the last one, which uses the copied memory. It is relatively easy to code and fast to execute.
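A minimal host-side sketch of that scheme, assuming a kernel that processes `blockElems` elements per launch (the kernel itself is not part of the thread, so it is only indicated in comments; names are illustrative and error checking is omitted):

```cuda
// Sketch: run all full blocks on the original buffer, but stage the
// short tail into a zero-padded scratch buffer first, so the kernel
// never reads past valid data.
void launchWithPaddedTail(const float* d_in, float* d_out,
                          size_t n, size_t blockElems)
{
    size_t fullBlocks = n / blockElems;
    size_t tail       = n - fullBlocks * blockElems;

    // full blocks read only valid data:
    // processKernel<<<fullBlocks, 256>>>(d_in, d_out, blockElems);

    if (tail) {
        float* d_pad = nullptr;
        cudaMalloc(&d_pad, blockElems * sizeof(float));
        cudaMemset(d_pad, 0, blockElems * sizeof(float));       // zero-fill padding
        cudaMemcpy(d_pad, d_in + fullBlocks * blockElems,
                   tail * sizeof(float), cudaMemcpyDeviceToDevice);
        // last block runs unmodified on the padded copy:
        // processKernel<<<1, 256>>>(d_pad, d_out + fullBlocks * blockElems, blockElems);
        cudaFree(d_pad);
    }
}
```

The device-to-device copy of a single block is cheap relative to the kernel itself, and no kernel code changes are needed.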

This is what I find strange: after fixing all the WRITE access violations, there are still two READ violations that I know about, but the code doesn’t crash. 12 hours on a Volta. No crash. My violations tend to be smaller than 64 bytes. Can this be “legal” somehow? Or am I just lucky? Can someone from NVIDIA give a definitive answer?

It’s for the reason that was already stated by cbuchner1.

The runtime (both hardware and software) does not have byte-accurate boundary detection for allocations. The specifics of this vary by GPU type.

If you have an allocation of 32 bytes, accessing the 33rd byte is not guaranteed to trigger any specific fault at runtime (beyond being undefined behavior). If you traverse far enough beyond the end of the allocation, on any architecture, you will eventually trip a hardware/runtime-detected limit.

If you want byte-accurate checking, at a significant performance cost, run your code under cuda-memcheck.

As far as I am aware, nearly all the statements made in this thread apply roughly to CPU code as well:

  • an out-of-bounds read is undefined behavior (UB)
  • it may or may not trigger a runtime fault, depending on a variety of factors
  • tools like valgrind exist to provide byte-accurate checking

To be clear, it is not legal to read out of bounds; it is unambiguously undefined behavior, and it may unpredictably trigger a runtime fault.

Thank you very much for your answers. I will act accordingly…