I have a question regarding passing a single value to a kernel.
For example, assume I pass in an array and want to add a value to every element of it. The value differs from one array to the next, but it is the same for every element of a given array, and I want to pass it in as a kernel argument.
I’m pretty sure I know how to do the passing portion and set it up as an argument, but my main question is how to allocate the value.
float tmp;
cudaMallocHost(???, sizeof(float));
It is something like this, but I’m not exactly sure. I assume the difficulty is that cudaMallocHost expects the address of a pointer (a void**) rather than the address of a float. So if I want to pass a single value, should I just declare it as a pointer and allocate room for only one float?
Another question: if I pass it by value and the kernel is run by, say, 1000 threads, I won’t run into any bank conflicts, will I? The threads would only be reading the value.
Also, would it be better to put this value into shared memory or into constant memory? This is assuming I have more than one block.
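Kernel parameters already end up in constant memory. This is covered in the programming guide.
To a first-order approximation, this is a good place for it, and I wouldn’t worry about moving it to shared memory. Every thread reads the same address, so the constant cache can broadcast the value, and bank conflicts (a shared-memory concern) should not come into it.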
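For what it's worth, here is a minimal sketch of the pass-by-value route, where the scalar is an ordinary host variable and needs no allocation at all. The kernel name addScalar, the array size, and the launch configuration are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds the same scalar to every element.
// 'value' is passed by value, so it travels with the kernel arguments.
__global__ void addScalar(float* data, float value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

int main()
{
    const int n = 1000;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    // Only the array needs device memory.
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, host, n * sizeof(float), cudaMemcpyHostToDevice);

    float tmp = 2.5f;  // plain host variable, no cudaMalloc/cudaMallocHost
    addScalar<<<(n + 255) / 256, 256>>>(d_data, tmp, n);

    cudaMemcpy(host, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    printf("first = %f, last = %f\n", host[0], host[n - 1]);
    return 0;
}
```

To use a different value for each array, call the kernel again with a different tmp; nothing has to be reallocated.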
Yeah, I’m pretty sure that just assigns -1 to every value in the array.
And I know I said this earlier, but adding a number to every value in the array is not the actual task I’m looking to perform. I was just using it as an example because it describes the general thing I’m trying to do, which is to pass a value into a kernel as an argument by reference.
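If the value really does have to go in by reference, one way it could look is sketched below: allocate room for a single float on the device with cudaMalloc, copy the host value into it, and hand the kernel the device pointer. Note that cudaMallocHost allocates pinned host memory, whereas cudaMalloc is the call that allocates device memory. The names addScalarFromPtr, d_value, and d_data are invented for this sketch, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reads the scalar through a device pointer.
__global__ void addScalarFromPtr(float* data, const float* value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += *value;
}

int main()
{
    const int n = 1000;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 0.0f;

    float* d_data = nullptr;
    float* d_value = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_value, sizeof(float));  // room for exactly one float

    float tmp = 2.5f;
    cudaMemcpy(d_data, host, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_value, &tmp, sizeof(float), cudaMemcpyHostToDevice);

    addScalarFromPtr<<<(n + 255) / 256, 256>>>(d_data, d_value, n);

    cudaMemcpy(host, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_value);
    cudaFree(d_data);

    printf("first = %f\n", host[0]);
    return host[0] == 2.5f ? 0 : 1;  // nonzero exit if the add did not happen
}
```

The pointer version is mainly worth it when the kernel also has to write the value back or when the value is produced on the device by an earlier kernel; for a read-only scalar, passing by value is simpler.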
To be fair, what other data structures would you need in CUDA anyway? :P
A thrust::unique_device_ptr would be nice for situations like these. It’d also be a good time for Thrust to work on introducing move semantics into the library.
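Until something like that exists, a rough stand-in can be put together from std::unique_ptr with a deleter that calls cudaFree. This is only a sketch: device_unique_ptr, CudaFreeDeleter, and make_device are names invented here, not part of Thrust or the CUDA runtime.

```cuda
#include <cstddef>
#include <cstdio>
#include <memory>
#include <cuda_runtime.h>

// Hypothetical deleter: releases device memory when the owner goes away.
struct CudaFreeDeleter {
    void operator()(void* p) const { cudaFree(p); }
};

// Hypothetical alias: a move-only owner of device memory.
template <typename T>
using device_unique_ptr = std::unique_ptr<T, CudaFreeDeleter>;

// Hypothetical helper: allocates 'count' objects of type T on the device.
// Returns an empty pointer if the allocation fails.
template <typename T>
device_unique_ptr<T> make_device(std::size_t count = 1)
{
    T* raw = nullptr;
    if (cudaMalloc(&raw, count * sizeof(T)) != cudaSuccess) raw = nullptr;
    return device_unique_ptr<T>(raw);
}

int main()
{
    float tmp = 2.5f;
    float back = 0.0f;

    auto d_value = make_device<float>();  // a single float on the device
    if (!d_value) return 1;

    // get() yields a raw device pointer; it must not be dereferenced on the host.
    cudaMemcpy(d_value.get(), &tmp, sizeof(float), cudaMemcpyHostToDevice);

    // Ownership can be moved; cudaFree still runs exactly once.
    device_unique_ptr<float> moved = std::move(d_value);
    cudaMemcpy(&back, moved.get(), sizeof(float), cudaMemcpyDeviceToHost);

    printf("round trip = %f\n", back);
    return back == tmp ? 0 : 1;
}
```

The raw pointer from get() can be passed to a kernel as usual, and the memory is freed automatically when the owning object leaves scope.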