Hi, I have a kernel that is erroring out. When I run it through compute-sanitizer, I get an `Invalid __global__ write of size 4 bytes`, but when I look at the offending line I don’t see how it can be out of bounds (maybe I am wrong, though).
Here is the code and the offending line (with some unrelated code stripped out for brevity):
```
__device__ inline int64_t divider() {
    static const uint32_t look_up_table[] = { /* 33 const values*/ };
    struct divider_node {
        int64_t diff;
        int32_t left;
        int32_t sign;
    };
    divider_node* div_nodes = new divider_node[33];
    for (size_t i = 0; i < 32; ++i) {
        div_nodes[i].left = (int32_t)look_up_table[i]; // <--- offending line
        // rest of loop
    }
    // rest of function
}
```
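One way to check whether the allocation itself is the problem: device-side `new` draws from the device heap, and because device code has no exceptions it returns a null pointer when that heap is exhausted (as in-kernel `malloc` does). A failed allocation would make the first store through `div_nodes` an invalid global write. The sketch below is a hypothetical, stripped-down variant of the function above; the names `checked_alloc_kernel` and `count_alloc_failures` are illustrative, and the look-up-table value is replaced by a placeholder. It checks the pointer before using it and frees it with `delete[]`.

```cuda
#include <climits>
#include <cstdint>
#include <cuda_runtime.h>

// Same layout as the node type in the question (16 bytes on common ABIs).
struct divider_node {
    int64_t diff;
    int32_t left;
    int32_t sign;
};

// Hypothetical test kernel: allocates 33 nodes per thread like divider(),
// but checks the pointer before use and frees it afterwards.
__global__ void checked_alloc_kernel(unsigned int* failures) {
    divider_node* div_nodes = new divider_node[33];
    if (div_nodes == nullptr) {
        // Heap exhausted: any store through div_nodes would be invalid.
        atomicAdd(failures, 1u);
        return;
    }
    for (int i = 0; i < 33; ++i) {
        div_nodes[i].left = i;  // placeholder for look_up_table[i]
    }
    delete[] div_nodes;  // return the nodes to the device heap
}

// Launches the kernel and returns the number of threads whose new[] failed,
// or UINT_MAX if any CUDA runtime call reports an error.
unsigned int count_alloc_failures(dim3 grid, dim3 block) {
    unsigned int* d_failures = nullptr;
    unsigned int h_failures = 0;
    if (cudaMalloc(&d_failures, sizeof(unsigned int)) != cudaSuccess) {
        return UINT_MAX;
    }
    cudaError_t err = cudaMemset(d_failures, 0, sizeof(unsigned int));
    if (err == cudaSuccess) {
        checked_alloc_kernel<<<grid, block>>>(d_failures);
        err = cudaGetLastError();  // launch-configuration errors
    }
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();  // errors raised while the kernel ran
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(&h_failures, d_failures, sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_failures);
    return err == cudaSuccess ? h_failures : UINT_MAX;
}
```

A small launch such as `count_alloc_failures(dim3(1), dim3(8, 8, 1))` needs only about 33 KB of heap and should report 0 failures; a non-zero count for the real launch configuration would point to heap exhaustion rather than an indexing bug. If the elided rest of the function never calls `delete[]`, the nodes are never returned to the heap, so usage also grows from launch to launch.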
I’m not super well-versed in how CUDA handles local memory. I know that GPU memory is handled via the register file, but I’m not sure whether that’s just for the stack or whether it also applies to the heap when you allocate via `new`.
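For reference on the heap question: memory obtained with `new` or `malloc` inside a kernel is not held in the register file. It comes from a device heap in global memory with a fixed size (8 MB by default) that can be changed from the host with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)`; registers and local memory hold ordinary per-thread variables and spills. A minimal sketch, where `set_device_heap_size` is a hypothetical helper name, assuming it runs before the first launch of any kernel that allocates on the device heap:

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: requests a device heap of `bytes` for in-kernel
// new/malloc and returns the limit actually in effect, or 0 on error.
// Call it before launching any kernel that uses the device heap.
size_t set_device_heap_size(size_t bytes) {
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes) != cudaSuccess) {
        return 0;
    }
    size_t actual = 0;
    if (cudaDeviceGetLimit(&actual, cudaLimitMallocHeapSize) != cudaSuccess) {
        return 0;
    }
    return actual;
}
```

For example, `set_device_heap_size(size_t(64) << 20)` requests 64 MiB. Checking the return value matters because the runtime documentation requires the limit to be set before any kernel using the device heap has been launched.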
I have tried reducing my block size: at first I was running 32x16x1, then tried 8x8x1, but I still got the same error, so I am not certain whether it’s a memory overflow.
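On the block-size experiment: the heap demand of this pattern depends on how many threads hold an allocation, not on how those threads are grouped into blocks, and shrinking blocks while covering the same domain just launches more blocks. The host-side arithmetic below uses a hypothetical 1024x1024 domain (not taken from the question) and an illustrative helper `heap_bytes_needed`. It gives an upper bound on live heap usage, reached if no thread frees its nodes, and ignores allocator overhead.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Same layout as the node type in the question (16 bytes on common ABIs).
struct divider_node {
    int64_t diff;
    int32_t left;
    int32_t sign;
};

// Bytes requested from the device heap if every thread in the launch
// holds `nodes_per_thread` divider_nodes at once (no allocator overhead).
unsigned long long heap_bytes_needed(dim3 grid, dim3 block,
                                     unsigned int nodes_per_thread) {
    unsigned long long threads =
        1ULL * grid.x * grid.y * grid.z * block.x * block.y * block.z;
    return threads * nodes_per_thread * sizeof(divider_node);
}
```

Covering 1024x1024 with 32x16 blocks needs a 32x64 grid, and with 8x8 blocks a 128x128 grid; both launch 1,048,576 threads, so `heap_bytes_needed` returns 553,648,128 bytes (528 MiB) for either shape, far above the 8 MB default heap if the nodes are never freed.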