Hi All,
I am trying to dynamically allocate a small array within a thread. However, the keyword “register” does not seem to have any effect.
I ran a test using the following code. The running time with “register” (5263 ms) is essentially the same as without it (5264 ms), which probably means the array is still in global memory.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <time.h>

#define N_RUN 10000000

__global__ void Kernel_TestSpeed(int *dev_icount)
{
    int icount = 0;
    int value = 1;
    int n = 1;
    //register int *element = new int[n]; // dynamically allocate a small array in register ??
    int *element = new int[n];
    do
    {
        element[0] = value;
        icount++;
    } while (icount < N_RUN); // run 10 M steps
    delete [] element;
    dev_icount[0] = value;
}

int main()
{
    int icount = 1;
    int *dev_icount;
    cudaMalloc((void**)&dev_icount, sizeof(int));

    clock_t t1, t2;
    t1 = clock();
    Kernel_TestSpeed <<< 1, 32 >>> (dev_icount);
    // cudaMemcpy waits for the kernel to finish, so the timed interval covers the whole kernel run
    cudaMemcpy(&icount, dev_icount, sizeof(int), cudaMemcpyDeviceToHost);
    t2 = clock();

    printf("Running Time: %.3f ms\n", (double)(t2 - t1) / CLOCKS_PER_SEC * 1000);
    printf("Result: %d\n", icount);
    getchar(); // keep the console window open
    cudaFree(dev_icount);
    return 0;
}
Is there a way to dynamically allocate a small array in registers? I am using VS 2010 + CUDA 6.0 on a K20c.
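For comparison, here is a minimal sketch of a fixed-size variant, assuming the array length can be bounded at compile time. MAX_N and Kernel_FixedArray are hypothetical names and not part of the test above. As far as I understand, in-kernel new takes memory from the device heap in global memory and "register" is only a hint, so a per-thread array can end up in registers only when its size and every index are known at compile time. Whether it actually does is up to the compiler.

```cuda
#include "cuda_runtime.h"
#include <stdio.h>

#define MAX_N 4   // hypothetical compile-time upper bound on the array length

// Hypothetical kernel: a fixed-size local array instead of in-kernel new[].
__global__ void Kernel_FixedArray(int *dev_out)
{
    int element[MAX_N];   // size known at compile time

    // Fully unrolled loops give every access a constant index, which the
    // compiler needs before it can keep the array in registers.
    #pragma unroll
    for (int i = 0; i < MAX_N; ++i)
        element[i] = i + 1;

    int sum = 0;
    #pragma unroll
    for (int i = 0; i < MAX_N; ++i)
        sum += element[i];

    dev_out[threadIdx.x] = sum;   // 1 + 2 + 3 + 4 = 10 for MAX_N = 4
}

int main()
{
    const int n_threads = 32;
    int host_out[n_threads];
    int *dev_out;
    cudaMalloc((void**)&dev_out, n_threads * sizeof(int));
    Kernel_FixedArray <<< 1, n_threads >>> (dev_out);
    // Blocking copy: returns only after the kernel has finished
    cudaMemcpy(host_out, dev_out, n_threads * sizeof(int), cudaMemcpyDeviceToHost);
    printf("Result: %d\n", host_out[0]);
    cudaFree(dev_out);
    return 0;
}
```

Compiling with nvcc -Xptxas -v prints the register count and any local-memory (stack frame, spill) usage per kernel, which is one way to check where the array ended up.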
Many thanks in advance!