I want to allocate a dynamic __half array for each thread in a kernel. Here is some code:
__global__ void cd_stream(
    const int num)
{
    // before allocate......
    int count = num;
    __half* acc = new __half[count];
}
So my question is: where would the data in acc be located? In registers, in local memory, or somewhere else?
And if I want to allocate a dynamic array for each thread and have the data stored in registers, how can I write the code? Any response would be greatly appreciated!
Hi, I'm new here and I don't understand.
In-kernel new allocates from the so-called “device heap”. This is in the logical global space; however, a pointer obtained this way cannot interoperate with host APIs like cudaMemcpy. In other respects it behaves like other dynamically allocated global memory.
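To make the device-heap point concrete, here is a minimal sketch of a per-thread heap allocation. It is not the original poster's kernel: the names per_thread_heap and run_heap_demo, the 64 MB heap size, and the launch configuration are all assumptions made for illustration. It uses in-kernel malloc/free rather than new, because the NULL-on-failure behavior of in-kernel malloc is documented; per the reply above, new draws from the same heap. Results are copied back through a separate cudaMalloc buffer, since the heap pointer itself cannot be passed to cudaMemcpy.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdlib>
#include <vector>

// Each thread allocates its own scratch array from the device heap,
// fills it with ones, and writes the sum of its elements to out[tid].
__global__ void per_thread_heap(const int count, float* out, const int n_threads)
{
    const int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_threads) return;

    // The array lives in the device heap (global memory);
    // only the pointer `acc` itself is a candidate for a register.
    __half* acc = static_cast<__half*>(malloc(count * sizeof(__half)));
    if (acc == nullptr) {            // heap exhausted
        out[tid] = -1.0f;
        return;
    }

    for (int i = 0; i < count; ++i) acc[i] = __float2half(1.0f);

    float sum = 0.0f;
    for (int i = 0; i < count; ++i) sum += __half2float(acc[i]);

    out[tid] = sum;
    free(acc);                       // heap memory must be released by device code
}

// Hypothetical host helper: returns true if every thread reported sum == count.
bool run_heap_demo(const int count, const int n_threads)
{
    // Assumed heap size. The limit has to be set before the first kernel
    // that uses the device heap is launched; the return value is ignored here.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64u * 1024u * 1024u);

    float* d_out = nullptr;
    if (cudaMalloc(&d_out, n_threads * sizeof(float)) != cudaSuccess) return false;

    const int block = 128;
    const int grid = (n_threads + block - 1) / block;
    per_thread_heap<<<grid, block>>>(count, d_out, n_threads);

    // d_out came from cudaMalloc, so cudaMemcpy works on it.
    std::vector<float> h_out(n_threads);
    const cudaError_t err = cudaMemcpy(h_out.data(), d_out,
                                       n_threads * sizeof(float),
                                       cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (const float v : h_out)
        if (v != static_cast<float>(count)) return false;
    return true;
}
```

Calling run_heap_demo(16, 256) from a host main would exercise the kernel with 256 threads, each holding a 16-element array.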
For the remainder of your questions, there is no way to force (or really, even explicitly instruct or request) the compiler to “locate data in registers” or in any way explicitly instruct how the compiler will use registers. This topic is covered in many forum threads. Here is an example. There are many others.
You can “encourage” the compiler to “place” a thread-local array in registers if the array is small enough and you have made sure that all indexing is discoverable at compile-time.
Here is an example discussion.
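As a rough sketch of that "encouragement", the kernel below uses a thread-local array whose size is a compile-time constant and whose indices all become constants once the loops are unrolled. The names kCount, per_thread_static, and run_static_demo are assumptions for illustration, and nothing here guarantees register placement: the compiler still decides. One way to check the outcome is to compile with nvcc -Xptxas -v and look at the reported register count and stack frame size.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <vector>

// Assumed compile-time size, chosen small so the array could fit in registers.
constexpr int kCount = 8;

__global__ void per_thread_static(const float* in, float* out, const int n_threads)
{
    const int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_threads) return;

    __half acc[kCount];              // thread-local array, no heap allocation

    // Full unrolling turns every acc[i] into an access with a constant index.
    #pragma unroll
    for (int i = 0; i < kCount; ++i)
        acc[i] = __float2half(in[tid] * i);

    float sum = 0.0f;
    #pragma unroll
    for (int i = 0; i < kCount; ++i)
        sum += __half2float(acc[i]);

    out[tid] = sum;
}

// Hypothetical host helper: feeds 1.0f to every thread and checks that each
// one returns 0 + 1 + ... + (kCount - 1).
bool run_static_demo(const int n_threads)
{
    const size_t bytes = n_threads * sizeof(float);
    std::vector<float> h_in(n_threads, 1.0f), h_out(n_threads, 0.0f);

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const int block = 128;
    const int grid = (n_threads + block - 1) / block;
    per_thread_static<<<grid, block>>>(d_in, d_out, n_threads);

    const cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes,
                                       cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    const float expected = kCount * (kCount - 1) / 2.0f;
    for (const float v : h_out)
        if (v != expected) return false;
    return true;
}
```

If the array size must vary at run time, a common compromise is to make it a template parameter and dispatch over a few supported sizes on the host, so each instantiation still sees a constant size.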