I have been following the "Accelerated Ray Tracing in One Weekend in CUDA" blog post.
std::cout << fb[0] << std::endl; // first call
render<<<blocks, threads>>>(fb, nx, ny, samples_per_pixel, cam, origin, lower_left_corner, horizontal, vertical, d_world, d_rand_state);
cudaDeviceSynchronize();
std::cout << fb[0] << std::endl; // second call after render
Context: fb is a pointer to a framebuffer of floats of size imagewidth*imageheight, where the size comes from a command-line argument.
The program crashes at the second std::cout.
render() contains the following lines:
float u = float(i + curand_uniform(&local_rand_state)) / float(max_x);
float v = float(j + curand_uniform(&local_rand_state)) / float(max_y);
These two lines are the problematic ones.
It seems that adding curand_uniform() is causing the problem. If I add some other number instead of curand_uniform(), the whole program works fine; it only crashes when I add curand_uniform(). (For testing, I tried adding a constant value and adding the variable to itself.)
curand_uniform() returns a correct float, and everything seems fine inside the function. But as soon as the function exits, accessing fb elements crashes the program if curand_uniform() was added.
(The address of fb does not change; I tested this by printing fb.)
Point to note: for a small imagewidth (100, etc.) it works fine, but if I set imagewidth to 280 or higher (imageheight is imagewidth/aspect_ratio, with aspect_ratio 16.0/9.0 in this case), the crash occurs.
fb is allocated with cudaMallocManaged().
I am using an RTX 2060 GPU (in case that is helpful).
Threads config: (16, 16)
Blocks (based on threads): (imagewidth/threads.x + 1, imageheight/threads.y + 1)
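To make the launch configuration above concrete, here is a minimal host-side C++ sketch of the arithmetic. It is only an illustration, not code from the pastebin: LaunchGeometry and compute_geometry are hypothetical names, and it assumes imageheight is truncated to an integer. It shows that the "+ 1" in the block count launches more threads than there are pixels, so any thread whose (i, j) falls outside the image has to return early before touching fb or any other per-pixel array.

```cpp
#include <cassert>

// Hypothetical helper type: summarizes the launch described in the post.
struct LaunchGeometry {
    int image_width;
    int image_height;
    int blocks_x;
    int blocks_y;
    int threads_launched;  // total threads across the whole grid
    int pixels;            // image_width * image_height
    int out_of_range;      // threads whose (i, j) lies outside the image
};

// Assumption: imageheight = int(imagewidth / aspect_ratio), i.e. truncated.
LaunchGeometry compute_geometry(int image_width, int tx = 16, int ty = 16) {
    assert(image_width > 0 && tx > 0 && ty > 0);
    const double aspect_ratio = 16.0 / 9.0;

    LaunchGeometry g{};
    g.image_width = image_width;
    g.image_height = static_cast<int>(image_width / aspect_ratio);

    // Same formula as in the post: integer division, then one extra block.
    g.blocks_x = g.image_width / tx + 1;
    g.blocks_y = g.image_height / ty + 1;

    g.threads_launched = (g.blocks_x * tx) * (g.blocks_y * ty);
    g.pixels = g.image_width * g.image_height;

    // Every launched thread that does not map to a pixel must be skipped
    // by a bounds check inside the kernel.
    g.out_of_range = g.threads_launched - g.pixels;
    return g;
}
```

Under these assumptions, compute_geometry(280) gives a 280 x 157 image covered by an 18 x 10 grid of 16 x 16 blocks, so some threads land outside the image at every width, including the small ones that work.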
pastebin to full code
I am sorry for pasting so much code (I don't know how to get separate file linking to work).
The kernel function starts at line 423.