Hi all, I'm curious whether any of you have run into this problem. I want to keep device threads running while they wait for data that the CPU generates and passes to them. So far my attempts have mostly looked like this:
[codebox]__global__ void kernel(int *arr) {
    // spin until the host changes arr[0] to a non-zero value
    while (arr[0] == 0) {
        arr[3] = 3;
    }
}

int main() {
    int *test;
    unsigned long i;
    int *arr, *d_arr;

    // set up page-locked host arrays: test holds the value used to set the
    // flag on the device, arr is the input array for the kernel
    cudaMallocHost((void **)&test, sizeof(int));
    cudaMallocHost((void **)&arr, sizeof(int) * 10);
    for (i = 0; i < 10; i++) arr[i] = 0;
    test[0] = 1;

    // create two CUDA streams so a memory copy can overlap with the running kernel
    cudaStream_t streams[2];
    cudaStreamCreate(&streams[0]);
    cudaStreamCreate(&streams[1]);

    cudaMalloc((void **)&d_arr, sizeof(int) * 10);
    cudaMemcpyAsync(d_arr, arr, sizeof(int) * 10, cudaMemcpyHostToDevice, streams[0]);
    kernel<<<30, 256, 0, streams[0]>>>(d_arr);

    for (i = 0; i < 1000000000; i++) ; // wait for a time

    // copy the value 1 into the first element of the input array,
    // hopefully causing the kernel to break out of its loop
    cudaMemcpyAsync(d_arr, test, sizeof(int), cudaMemcpyHostToDevice, streams[1]);

    // wait for the device threads to return (but they never do...)
    cudaDeviceSynchronize(); // replaces the deprecated cudaThreadSynchronize()
    return 0;
}
[/codebox]
This has not worked. The host-side calls all seem to go through, but the change in the array value is never seen on the device. Does anyone have experience with this sort of problem?
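For comparison, here is a minimal sketch of a different approach I am considering. It is not the `cudaMemcpyAsync` method above. It assumes the device supports mapped (zero-copy) pinned memory, and it reads the flag through a `volatile` pointer so the compiler cannot keep the value in a register. The names `wait_for_flag` and `run_flag_demo` are made up for illustration, and I have not confirmed that this behaves the same on every GPU or driver.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: spin until the host sets *flag, then write a marker.
// volatile forces a fresh memory read of the flag on every loop iteration.
__global__ void wait_for_flag(volatile int *flag, volatile int *out) {
    while (*flag == 0) { }
    *out = 3;
}

// Returns the marker written by the kernel, or -1 if a CUDA call fails.
int run_flag_demo() {
    int *h_flag = NULL, *h_out = NULL;  // host views of the mapped buffers
    int *d_flag = NULL, *d_out = NULL;  // device views of the same buffers

    // Request support for mapped host memory (assumption: the device allows it).
    cudaSetDeviceFlags(cudaDeviceMapHost);

    if (cudaHostAlloc((void **)&h_flag, sizeof(int), cudaHostAllocMapped) != cudaSuccess) return -1;
    if (cudaHostAlloc((void **)&h_out, sizeof(int), cudaHostAllocMapped) != cudaSuccess) return -1;
    *h_flag = 0;
    *h_out = 0;

    if (cudaHostGetDevicePointer((void **)&d_flag, h_flag, 0) != cudaSuccess) return -1;
    if (cudaHostGetDevicePointer((void **)&d_out, h_out, 0) != cudaSuccess) return -1;

    // One thread is enough to show the idea; the launch is asynchronous.
    wait_for_flag<<<1, 1>>>(d_flag, d_out);

    // ... the CPU would generate its data here ...

    // Release the kernel by writing the flag directly in mapped memory.
    *(volatile int *)h_flag = 1;

    // Block until the kernel has left its loop and finished.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    int result = *h_out;
    cudaFreeHost(h_flag);
    cudaFreeHost(h_out);
    return result;
}

int main() {
    printf("marker written by kernel: %d\n", run_flag_demo());
    return 0;
}
```

The flag is written before the synchronize call, so the kernel can still exit even if the driver delays the launch until that point. Whether polling host memory like this is fast enough for real work is a separate question.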
Thanks very much for any help.