Allocating a single value for a kernel argument

I have a question regarding passing a single value to a kernel.

For example, assume I pass in an array and want to add a value to every element of it. The value differs from array to array (but is the same for every element of a given array), and I want to pass it in as an argument.

I’m pretty sure I know how to do the passing portion and set it up as an argument, but my main question is how to allocate the value.

float tmp;
cudaMallocHost(???, sizeof(float));

It is something like this but I’m not exactly sure. I’m assuming this is due to the fact that cudaMallocHost is looking for a void pointer. Therefore, if I want to pass a single value, should I just set it up as a pointer with the size of only 1 float?

Yes, if you want to pass the argument by pointer.

__global__ void kernel(float *tmp,...);

float *tmp;
cudaMallocHost(&tmp, sizeof(float));
kernel<<<...>>>(tmp, ...);
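
To make the pointer version concrete, here is a minimal sketch of the "add a value to every element" example from the question. The kernel name add_offset_ptr and the host helper add_offset_via_pointer are made up for illustration, error checking is left out for brevity, and it assumes a 64-bit platform with unified virtual addressing so the kernel can dereference the pinned host pointer directly.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Adds the single float stored at *offset to every element of data.
__global__ void add_offset_ptr(float *data, const float *offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += *offset;
}

// Hypothetical host helper: returns `host` with `value` added to each element.
std::vector<float> add_offset_via_pointer(std::vector<float> host, float value)
{
    int n = static_cast<int>(host.size());
    if (n == 0) return host;

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Pinned host allocation holding exactly one float.
    float *offset = nullptr;
    cudaMallocHost(&offset, sizeof(float));
    *offset = value;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_offset_ptr<<<blocks, threads>>>(d_data, offset, n);

    // Blocking copy on the default stream: waits for the kernel to finish.
    cudaMemcpy(host.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFreeHost(offset);
    cudaFree(d_data);
    return host;
}
```

Every thread reads the value across the bus from host memory here, so for a read-only scalar this is mainly useful when the host needs to see a result written back through the same pointer.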

Or you could pass it as a kernel argument by value.

__global__ void kernel(float tmp,...);

float tmp;
kernel<<<...>>>(tmp, ...);
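
For comparison, a sketch of the same example with the value passed by value. No allocation is needed for the scalar at all; it is copied at launch along with the other kernel arguments. The names add_offset_val and add_offset_by_value are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Adds `offset` (an ordinary by-value kernel argument) to every element of data.
__global__ void add_offset_val(float *data, float offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += offset;
}

// Hypothetical host helper: returns `host` with `value` added to each element.
std::vector<float> add_offset_by_value(std::vector<float> host, float value)
{
    int n = static_cast<int>(host.size());
    if (n == 0) return host;

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_offset_val<<<blocks, threads>>>(d_data, value, n);  // plain float, no allocation

    // Blocking copy on the default stream: waits for the kernel to finish.
    cudaMemcpy(host.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return host;
}
```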

Alright thanks txbob. That helps.

Another question: if I pass it by value and it is read by, say, 1000 threads, I won’t run into any bank conflicts, will I? The threads would only be reading the value.

Also, would it be better to put this into shared memory or into constant memory? This assumes I have more than one block.

Kernel parameters already end up in constant memory. This is covered in the programming guide.
To a first-order approximation, that is a good place for the value, and I wouldn’t worry about moving it to shared memory.

Bank conflicts only apply to shared memory.
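
For reference, this is roughly what placing the value in constant memory explicitly would look like. It is only a sketch: c_offset, add_offset_const and add_offset_via_constant are made-up names, and error checking is omitted. For a single scalar it gains little over a by-value parameter, since the parameter already lives in constant memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One float in constant memory, visible to every block of every kernel in this file.
__constant__ float c_offset;

__global__ void add_offset_const(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += c_offset;  // all threads read the same address
}

// Hypothetical host helper: returns `host` with `value` added to each element.
std::vector<float> add_offset_via_constant(std::vector<float> host, float value)
{
    int n = static_cast<int>(host.size());
    if (n == 0) return host;

    // Copy the scalar from the host into the __constant__ symbol.
    cudaMemcpyToSymbol(c_offset, &value, sizeof(float));

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_offset_const<<<blocks, threads>>>(d_data, n);

    // Blocking copy on the default stream: waits for the kernel to finish.
    cudaMemcpy(host.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return host;
}
```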

I knew that about kernel parameters. Maybe I should have clarified that I was asking whether to move the value to shared memory or leave it in constant/global memory. My apologies.

Luckily, Thrust has your back too.

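#include <thrust/device_vector.h>

namespace T = thrust;

int main(void)
{
  size_t const size = 1337;
  int    const val  = -1;

  // Allocate `size` ints on the device, each initialized to `val`.
  // Parentheses avoid accidentally selecting an initializer-list constructor.
  T::device_vector<int> device_vals(size, val);

  // Raw device pointer suitable for passing to a kernel.
  // (The right-hand side was cut off in the post; raw_pointer_cast is assumed.)
  int const* device_vals_data = T::raw_pointer_cast(device_vals.data());
  return 0;
}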

That doesn’t add val to an existing array.

Yeah, I’m pretty sure that just assigns -1 to every value in the array.

And I know I said this earlier, but adding a number to every value in the array is not the actual task I’m looking to perform. I was just using it as an example because it describes the general thing I’m trying to do, which is to pass a value as an argument into a kernel by reference.

I assumed the OP simply wanted an allocated region with an initial value :P

Otherwise there’s thrust::fill and thrust::transform.
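
As a rough illustration of those two calls, the sketch below fills a device vector with an initial value and then adds a second value to every element in place. The helper name filled_then_shifted is made up for this example; the Thrust calls themselves (fill, transform, make_constant_iterator, plus) are standard.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/functional.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/transform.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds n elements equal to `init`, adds `val` to each,
// and returns the result on the host.
std::vector<int> filled_then_shifted(std::size_t n, int init, int val)
{
    thrust::device_vector<int> d(n);

    // thrust::fill assigns the same value to every element.
    thrust::fill(d.begin(), d.end(), init);

    // thrust::transform with plus<int> adds `val` to each element in place;
    // the constant iterator supplies `val` as the second operand every time.
    thrust::transform(d.begin(), d.end(),
                      thrust::make_constant_iterator(val),
                      d.begin(),
                      thrust::plus<int>());

    std::vector<int> out(n);
    thrust::copy(d.begin(), d.end(), out.begin());  // device -> host
    return out;
}
```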

Ah, I see.

Then yeah, it’s easiest to use pointers. I kind of like to exploit single-element vectors as a RAII way of doing so.
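
A sketch of that single-element-vector trick, assuming the same "add a value to every element" example. The names add_offset_from_cell and add_offset_with_cell are hypothetical. The one-element device_vector owns the device allocation and releases it when it goes out of scope, so there is no explicit cudaFree.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <cuda_runtime.h>
#include <vector>

// Adds the float stored at *offset (device memory) to every element of data.
__global__ void add_offset_from_cell(float *data, const float *offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += *offset;
}

// Hypothetical host helper: returns a copy of `host` with `value` added to each element.
std::vector<float> add_offset_with_cell(const std::vector<float> &host, float value)
{
    int n = static_cast<int>(host.size());
    if (n == 0) return {};

    thrust::device_vector<float> d_data(host.begin(), host.end());
    thrust::device_vector<float> cell(1, value);  // one element, freed automatically

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_offset_from_cell<<<blocks, threads>>>(
        thrust::raw_pointer_cast(d_data.data()),
        thrust::raw_pointer_cast(cell.data()),
        n);
    cudaDeviceSynchronize();  // make sure the kernel is done before copying back

    std::vector<float> out(n);
    thrust::copy(d_data.begin(), d_data.end(), out.begin());
    return out;
}
```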

I’m curious whether Thrust has a device equivalent of std::unique_ptr.

Yeah no problem.

I doubt it. I haven’t done much with Thrust, but I thought the only data structures you had were a host vector and a device vector.

To be fair, what other data structures would you need in CUDA anyway? :P

A thrust::unique_device_ptr would be nice for situations like these. It’d also be a good time for Thrust to work on introducing move semantics into the library.

I mean, you don’t need any other, tbh. And a unique_device_ptr is really just a vector with one element.
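
For what it’s worth, something close to a unique_device_ptr can be approximated with std::unique_ptr and a custom deleter. This is only a sketch and not part of Thrust: CudaFreeDeleter, unique_device_ptr, make_unique_device and read_device_value are all made-up names, and it assumes T is trivially copyable.

```cuda
#include <cuda_runtime.h>
#include <memory>

// Deleter that releases device memory with cudaFree instead of delete.
struct CudaFreeDeleter {
    void operator()(void *p) const { cudaFree(p); }
};

// Hypothetical alias: a move-only owner of a single T in device memory.
template <typename T>
using unique_device_ptr = std::unique_ptr<T, CudaFreeDeleter>;

// Allocates one T on the device and initializes it from a host value.
// Returns an empty pointer if the allocation fails.
template <typename T>
unique_device_ptr<T> make_unique_device(const T &value)
{
    T *raw = nullptr;
    if (cudaMalloc(&raw, sizeof(T)) != cudaSuccess) return unique_device_ptr<T>{};
    cudaMemcpy(raw, &value, sizeof(T), cudaMemcpyHostToDevice);
    return unique_device_ptr<T>(raw);
}

// Copies the stored value back to the host.
template <typename T>
T read_device_value(const unique_device_ptr<T> &p)
{
    T v{};
    cudaMemcpy(&v, p.get(), sizeof(T), cudaMemcpyDeviceToHost);
    return v;
}
```

Pass p.get() to the kernel as the device pointer. The pointer must not be dereferenced on the host (no *p or p->), since it refers to device memory.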