So I have a kernel which requires a lot of constants to run. What I’ve been doing is defining and initializing them on the host side (some of them need to be calculated), then passing them into the kernel at launch, so that I don’t spend time working them out during execution. I.e. my kernel definition looks like
__global__ void grating(float* d_input_xr, float* d_input_xi,
                        float* d_mode_xr, float* d_mode_xi,
                        double* d_pitchfork_xr, double* d_pitchfork_xi,
                        double* d_out,
                        int width, int height, double sd, double k, double pitch,
                        int width_half, int height_half, int mode_x_half, int mode_y_half,
                        double focal_length, double tar_g, double time,
                        double tar_s, double tar_l,
                        double mode_offsetx, double mode_offsetz,
                        double int_screen_mult, double screen_offsetz, double screen_offsetx,
                        double u, double v, double zero_time)
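For context, the host side looks roughly like this (the derived quantities and launch geometry shown here are placeholders, just to illustrate the pattern of computing everything once before launch):

```cuda
// Host-side setup (sketch): constants are worked out once, then passed at launch.
int    width_half = width / 2;              // example derived constant
int    height_half = height / 2;
double k = 2.0 * M_PI / 0.000000633;        // hypothetical wavenumber calculation

dim3 block(16, 16);                          // placeholder launch geometry
dim3 grid((width + block.x - 1) / block.x,
          (height + block.y - 1) / block.y);

grating<<<grid, block>>>(d_input_xr, d_input_xi, d_mode_xr, d_mode_xi,
                         d_pitchfork_xr, d_pitchfork_xi, d_out,
                         width, height, sd, k, pitch,
                         width_half, height_half, /* ... remaining scalars ... */
                         u, v, zero_time);
```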
These variables, and elements of the arrays, are all used during the execution of the kernel.
What I’m wondering, however, is where these end up in memory. Does the compiler put them in local memory? After reading through the NVIDIA documentation it seems that local memory can live off-chip in device memory as well as in cache or registers. I want my kernel to access these values as quickly as possible, so I was wondering whether, instead of passing them in like this, I should declare shared-memory variables and have the first thread of each block calculate them all, so they’re physically closer to the cores.
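To make that idea concrete, here’s a rough sketch of what I had in mind (the derived constants here are hypothetical; it’s just the pattern of computing them once per block in thread 0):

```cuda
__global__ void grating_shared(const float* d_input_xr, double* d_out,
                               int width, int height, double sd, double k)
{
    // Block-wide constants held in shared memory instead of being
    // passed in as individual kernel parameters.
    __shared__ double s_k_over_sd;
    __shared__ int    s_width_half;

    if (threadIdx.x == 0 && threadIdx.y == 0) {
        // Thread 0 of each block does the one-off calculations.
        s_k_over_sd  = k / sd;     // hypothetical derived constant
        s_width_half = width / 2;
    }
    __syncthreads();  // make the values visible to the rest of the block

    // ... kernel body would then read s_k_over_sd, s_width_half, etc. ...
}
```

Every block would redo the same calculation, but only in one thread, and after the `__syncthreads()` the values sit in on-chip shared memory for the whole block.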
I tried using Nsight to figure out where the compiler was actually putting them, but I wasn’t able to make much sense of it – the best I could come up with was that some of them were listed as type “local double”, others as “double”, and still others as “parameter int”.
In any case, any advice on how to handle the glut of constants I need would be much appreciated.