I’m trying to couple an existing CUDA library with OpenACC code using async operations, and I’m running into trouble after calling acc_get_cuda_stream(handle). Can you please give an example of how to use this function? The OpenACC docs say the API is
void* acc_get_cuda_stream(int);
but I’m unclear how this should be used. Does it return a pointer to a cudaStream_t, or the cudaStream_t itself? My current approach is this:
cudaStream_t *stream = (cudaStream_t *) acc_get_cuda_stream(acc_handle);
kernel<<<gridDim,blockDim,smem, *stream>>> (…);
but I get a segfault from libcuda.so during the kernel launch. I did confirm that I can use a cudaStream_t directly (that is, OpenACC is synchronous but the CUDA kernel is asynchronous locally), and that works.
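To see why this version crashes, here is a host-only model. It assumes only what the CUDA headers declare, namely that cudaStream_t is a typedef for struct CUstream_st *. mock_acc_get_cuda_stream and queue_state are hypothetical stand-ins, because the real routine needs a GPU; like the real one, the mock returns the stream handle itself typed as void*, not the address of a cudaStream_t variable.

```c
#include <stdint.h>

/* Stand-in for the CUDA runtime typedef: cudaStream_t is already a
   pointer to an opaque driver struct. Struct contents are hypothetical. */
struct CUstream_st { uint64_t opaque; };
typedef struct CUstream_st *cudaStream_t;

/* Hypothetical driver-side state for one async queue. */
static struct CUstream_st queue_state = { 0xDEADBEEFu };

/* Hypothetical mock of acc_get_cuda_stream(): returns the handle
   itself (a pointer to the driver's struct), typed as void*. */
static void *mock_acc_get_cuda_stream(int async)
{
    (void)async;
    return &queue_state;
}

/* Correct: the void* IS the handle, so a plain cast recovers it. */
static cudaStream_t get_stream(int async)
{
    return (cudaStream_t) mock_acc_get_cuda_stream(async);
}

/* Incorrect (deliberately not compiled): treating the handle as the
   address of a cudaStream_t and dereferencing it reads the first
   pointer-sized bytes of the driver's private struct (0xDEADBEEF here)
   and hands them to the launch as a "stream":
       cudaStream_t *p = (cudaStream_t *) mock_acc_get_cuda_stream(1);
       kernel<<<g, b, smem, *p>>>(...);   // bogus handle -> segfault   */
```

In the real library the bytes read through the bad pointer are whatever the driver keeps at the start of its stream object, so libcuda.so ends up dereferencing garbage during the launch.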
I also tried the following after noticing that cudaStream_t is itself a pointer type under the hood:
cudaStream_t stream = (cudaStream_t)acc_get_cuda_stream(acc_handle);
kernel<<<gridDim,blockDim,smem,stream>>>(…);
which works! So I assume this is the correct approach. Some documentation would be very helpful here; perhaps my trial and error will be useful to other users.
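For anyone landing here, a minimal end-to-end sketch of the pattern that works. It assumes NVIDIA's toolchain with CUDA and OpenACC in one translation unit (for example nvc++ with -acc -cuda). The kernel scale and the function fill_and_scale are hypothetical stand-ins for the library code. The OpenACC loop, the CUDA kernel, and the copy-back are all queued on the same stream, so a single wait at the end is enough.

```cuda
#include <openacc.h>
#include <cuda_runtime.h>

#define N (1 << 20)

// Hypothetical kernel standing in for the existing CUDA library.
__global__ void scale(float *x, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

static float a[N];

// Hypothetical driver: fill a[] with OpenACC, scale it with the CUDA
// kernel on the SAME queue/stream, copy back asynchronously, wait once.
// Returns a[0], which should be 1.0f * 2.0f.
float fill_and_scale(int acc_handle)
{
    const int n = N;
    #pragma acc data create(a[0:n])
    {
        // Enqueued on OpenACC async queue acc_handle.
        #pragma acc parallel loop async(acc_handle) present(a[0:n])
        for (int i = 0; i < n; ++i) a[i] = 1.0f;

        // The void* returned is the stream handle itself: cast it,
        // do not cast to cudaStream_t* and dereference.
        cudaStream_t stream = (cudaStream_t) acc_get_cuda_stream(acc_handle);

        // Inside host_data, `a` refers to the device copy of the array.
        #pragma acc host_data use_device(a)
        {
            // Same stream, so this runs after the OpenACC loop.
            scale<<<(n + 255) / 256, 256, 0, stream>>>(a, 2.0f, n);
        }

        // Also on the same queue, so it runs after the kernel.
        #pragma acc update self(a[0:n]) async(acc_handle)
        #pragma acc wait(acc_handle)
    }
    return a[0];
}
```

If the kernel has to stay in a separate .cu file compiled by nvcc, the same cast applies; only the device pointer (from host_data use_device or acc_deviceptr) and the stream handle need to cross the boundary.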