(CUDA C++) Float buffer inaccessible after kernel launch (bug?)

I have been following the "Accelerated Ray Tracing in One Weekend in CUDA" blog post.


std::cout << fb[0] << std::endl; // first call

render<<<blocks, threads>>>(fb, nx, ny, samples_per_pixel, cam, origin, lower_left_corner, horizontal, vertical, d_world, d_rand_state);
cudaDeviceSynchronize();

std::cout << fb[0] << std::endl; // second call after render

Context: fb is a pointer to a framebuffer of floats of size imagewidth*imageheight, with the size taken from a command-line argument.

The program crashes at the second std::cout.
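One way to narrow this down is to check the CUDA error codes around the launch, since the snippet above ignores the return value of cudaDeviceSynchronize(). The following is only a minimal sketch: `CHECK_CUDA`, `check_cuda`, `fill` and `launch_and_read_first` are hypothetical names that are not in my code, and `fill` is a trivial stand-in for render(). If the kernel fails while running, the error should show up at the synchronize call instead of as a crash on the later host read.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original code): print the CUDA error
// together with the failing call and its location, then exit.
inline void check_cuda(cudaError_t result, const char* what,
                       const char* file, int line) {
    if (result != cudaSuccess) {
        std::fprintf(stderr, "CUDA error '%s' at %s:%d in %s\n",
                     cudaGetErrorString(result), file, line, what);
        cudaDeviceReset();
        std::exit(99);
    }
}
#define CHECK_CUDA(call) check_cuda((call), #call, __FILE__, __LINE__)

// Trivial stand-in for render(): writes one value to every element.
__global__ void fill(float* fb, int n, float value) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) fb[idx] = value;
}

// Launches the stand-in kernel with error checks and returns fb[0].
float launch_and_read_first(int n, float value) {
    float* fb = nullptr;
    CHECK_CUDA(cudaMallocManaged(&fb, n * sizeof(float)));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fill<<<blocks, threads>>>(fb, n, value);

    CHECK_CUDA(cudaGetLastError());       // errors in the launch itself
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

    float first = fb[0];  // host read only after a successful synchronize
    CHECK_CUDA(cudaFree(fb));
    return first;
}
```

Wrapping the real render launch the same way would at least show which error code the kernel returns before the host touches fb.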

render() contains the following lines:


float u = float(i + curand_uniform(&local_rand_state)) / float(max_x);

float v = float(j + curand_uniform(&local_rand_state)) / float(max_y);

These two lines are the problematic ones.

It seems that adding curand_uniform() is causing the problem. If I add some other number instead of curand_uniform(), the whole program works fine; it only crashes when I add curand_uniform(). (For testing I tried adding a constant value and adding the variable to itself.)

curand_uniform() returns a correct float, and everything seems fine inside the kernel. But as soon as the kernel has finished, accessing elements of fb on the host crashes the program if curand_uniform() was added.
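For reference, this is roughly how the per-pixel random state is meant to be set up and used. It is a reduced sketch and not a copy of my code: `init_rand`, `sample_u` and `u_values_in_range` are hypothetical names, and the seed 1984 is arbitrary. It assumes one curandState per pixel, indexed by j*max_x + i, so the state array must hold max_x*max_y entries.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical kernel: one curandState per pixel, same seed, different
// subsequence per pixel.
__global__ void init_rand(curandState* states, int max_x, int max_y,
                          unsigned long long seed) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;
    if (i >= max_x || j >= max_y) return;  // skip threads outside the image
    int pixel = j * max_x + i;
    curand_init(seed, pixel, 0, &states[pixel]);
}

// Hypothetical kernel: computes only the jittered u coordinate per pixel.
__global__ void sample_u(float* out, curandState* states,
                         int max_x, int max_y) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;
    if (i >= max_x || j >= max_y) return;
    int pixel = j * max_x + i;
    curandState local_rand_state = states[pixel];  // work on a local copy
    out[pixel] = (float(i) + curand_uniform(&local_rand_state)) / float(max_x);
    states[pixel] = local_rand_state;              // store the advanced state
}

// Host helper: returns true if every u lies in (0, 1] and no CUDA call failed.
bool u_values_in_range(int nx, int ny) {
    size_t n = size_t(nx) * size_t(ny);
    float* out = nullptr;
    curandState* states = nullptr;
    if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&states, n * sizeof(curandState)) != cudaSuccess) {
        cudaFree(out);
        return false;
    }
    dim3 threads(16, 16);
    dim3 blocks(nx / threads.x + 1, ny / threads.y + 1);
    init_rand<<<blocks, threads>>>(states, nx, ny, 1984ULL);
    sample_u<<<blocks, threads>>>(out, states, nx, ny);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    for (size_t k = 0; ok && k < n; ++k) {
        ok = (out[k] > 0.0f && out[k] <= 1.0f);
    }
    cudaFree(states);
    cudaFree(out);
    return ok;
}
```

curand_uniform() returns values in (0, 1], so u for column i should fall between i/max_x and (i+1)/max_x.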

(The address of fb does not change; I tested this by printing fb.)

Point to note: for a small imagewidth (100, etc.) it works fine, but if I set imagewidth to 280 or higher, the crash occurs. (imageheight is imagewidth/aspect_ratio, which is 16.0/9.0 in this case.)

fb is allocated with cudaMallocManaged().

I am using an RTX 2060 GPU (in case that helps).

Threads config: (16, 16)

Blocks (based on threads): (imagewidth/threads.x + 1, imageheight/threads.y + 1)
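With this configuration the grid covers more threads than there are pixels. For example, 280x157 pixels gives 18x10 blocks, which is 288x160 threads. The kernel therefore needs a bounds check before it touches fb or the random state. Below is a minimal sketch of that guard, assuming row-major indexing j*max_x + i; `mark_pixels` and `count_marked` are hypothetical names that are not in my code.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every in-range thread writes 1.0f to its own pixel.
__global__ void mark_pixels(float* fb, int max_x, int max_y) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;
    // The grid is rounded up, so some threads fall outside the image.
    if (i >= max_x || j >= max_y) return;
    fb[j * max_x + i] = 1.0f;
}

// Host helper: launches with the same thread/block configuration as above and
// returns how many pixels were written, or -1 if a CUDA call failed.
int count_marked(int nx, int ny) {
    size_t n = size_t(nx) * size_t(ny);
    float* fb = nullptr;
    if (cudaMallocManaged(&fb, n * sizeof(float)) != cudaSuccess) return -1;
    for (size_t k = 0; k < n; ++k) fb[k] = 0.0f;

    dim3 threads(16, 16);
    dim3 blocks(nx / threads.x + 1, ny / threads.y + 1);
    mark_pixels<<<blocks, threads>>>(fb, nx, ny);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(fb);
        return -1;
    }

    int marked = 0;
    for (size_t k = 0; k < n; ++k) {
        if (fb[k] == 1.0f) ++marked;
    }
    cudaFree(fb);
    return marked;
}
```

If the guard is missing, or a buffer indexed by the pixel index is smaller than max_x*max_y, the extra threads write out of bounds.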

Pastebin link to the full code

I am sorry for pasting so much code (I don't know how to get separate file linking to work).

The kernel function starts at line 423.