I’m trying to couple an existing CUDA library with an OpenACC code using async operations, and I’m running into trouble after calling acc_get_cuda_stream(handle). Can you please give an example of how to use this function? The OpenACC docs say the API is
but I’m unclear how this should be used. Does this return a pointer to a cudaStream_t or the cudaStream_t itself? My current approach is this:
cudaStream_t *stream = (cudaStream_t *) acc_get_cuda_stream(acc_handle);
kernel<<<gridDim,blockDim,smem, *stream>>> (…);
but I get a segfault from libcuda.so during the kernel launch. I did confirm that using a plain cudaStream_t directly works (that is, OpenACC is synchronous but the CUDA kernel is asynchronous locally).
I also tried the following after noticing that a cudaStream_t is itself a pointer under the hood:
cudaStream_t stream = (cudaStream_t)acc_get_cuda_stream(acc_handle);
which works! So I assume this is the correct approach, but some documentation would be very helpful here. Perhaps my trial-and-error will be useful to other users.
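For anyone who finds this later, here is a minimal sketch of the pattern that worked for me. It is not an official example. It relies on two things: the CUDA headers declare cudaStream_t as a pointer to an opaque struct (typedef struct CUstream_st *cudaStream_t), and, as I understand it, acc_get_cuda_stream hands that handle back as a void*. So the value is cast, never dereferenced. The kernel scale, the wrapper launch_scale, the array x, and the queue number 1 are all made-up names for illustration. The main() here does not use OpenACC at all; it stands in for the OpenACC caller by creating its own stream and passing it through a void*, so the file can be built with nvcc alone.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: scales n floats in place.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Hypothetical wrapper callable from the OpenACC side.
// acc_stream is the void* obtained from acc_get_cuda_stream(queue).
// It is the stream handle itself, so it is cast by value and NOT
// treated as a pointer to a cudaStream_t.
extern "C" void launch_scale(void* acc_stream, float* d_x, float a, int n) {
    cudaStream_t stream = static_cast<cudaStream_t>(acc_stream);
    int block = 256;
    int grid = (n + block - 1) / block;
    scale<<<grid, block, 0, stream>>>(d_x, a, n);
}

// Intended OpenACC caller (separate file, built with an OpenACC compiler).
// Assumes x is already present on the device and async queue 1 is in use:
//
//   #pragma acc host_data use_device(x)
//   {
//       launch_scale(acc_get_cuda_stream(1), x, 2.0f, n);
//   }
//   #pragma acc wait(1)

// Stand-alone check without OpenACC: a stream created here plays the
// role of the handle that acc_get_cuda_stream would return.
int main() {
    const int n = 1024;
    static float h[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s;
    cudaStreamCreate(&s);

    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, s);

    // Pass the handle through void*, as the OpenACC caller would.
    launch_scale(static_cast<void*>(s), d, 2.0f, n);

    cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost, s);
    cudaStreamSynchronize(s);  // wait for copy, kernel, and copy-back

    std::printf("h[0] = %f\n", h[0]);

    cudaFree(d);
    cudaStreamDestroy(s);
    return 0;
}
```

This also explains the segfault from my first attempt: casting the return value to cudaStream_t * and dereferencing it reads the driver's internal stream object as if it were a handle, and the launch then receives garbage.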