Segfault with acc_get_cuda_stream

I’m trying to couple an existing CUDA library with an OpenACC code using async operations, and I’m running into trouble after calling acc_get_cuda_stream(handle). Could you please give an example of how to use this function? The OpenACC docs say the API is

void* acc_get_cuda_stream(int);

but I’m unclear how this should be used. Does it return a pointer to a cudaStream_t, or the cudaStream_t itself? My current approach is this:

cudaStream_t *stream = (cudaStream_t *) acc_get_cuda_stream(acc_handle);

kernel<<<gridDim,blockDim,smem, *stream>>> (…);

but I get a segfault from libcuda.so during the kernel launch. I did confirm that I can use a cudaStream_t directly (that is, OpenACC is sync but the CUDA kernel is async locally), and that works.

I also tried the following after noticing that a cudaStream_t is itself a pointer under the hood:

cudaStream_t stream = (cudaStream_t)acc_get_cuda_stream(acc_handle);

kernel<<<gridDim,blockDim,smem,stream>>>(…);

which works!! So I assume this is the correct approach. Some documentation would be very helpful here. Perhaps my trial-and-error will be useful to other users.
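
In case it helps others, here is a minimal sketch of the pattern that worked for me. The kernel `scale`, the wrapper `launch_scale`, and the `run_demo` check are hypothetical names made up for illustration. So that the file builds with nvcc alone, the check passes a stream from cudaStreamCreate as a void* in place of the value returned by acc_get_cuda_stream. The OpenACC call site in the comment is an assumption about how the two sides would be wired together, not something taken from the docs.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the library kernel: scales x in place.
__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Hypothetical wrapper. 'handle' is the void* obtained from
// acc_get_cuda_stream(); it is treated as the stream handle itself
// (a value cast), NOT as a pointer to a cudaStream_t to dereference.
extern "C" void launch_scale(float *d_x, int n, float a, void *handle) {
    cudaStream_t stream = (cudaStream_t)handle;
    int block = 256;
    int grid = (n + block - 1) / block;
    scale<<<grid, block, 0, stream>>>(d_x, n, a);
}

// Assumed OpenACC call site (compiled separately by the OpenACC compiler):
//   #pragma acc host_data use_device(x)
//   launch_scale(x, n, 2.0f, acc_get_cuda_stream(acc_handle));

// Standalone check: a stream from cudaStreamCreate stands in for the
// OpenACC-owned stream. Returns 0 on success, nonzero on any failure.
int run_demo() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h(n, 1.0f);
    float *d = nullptr;
    cudaStream_t s;

    if (cudaStreamCreate(&s) != cudaSuccess) return 1;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return 1;

    cudaMemcpyAsync(d, h.data(), bytes, cudaMemcpyHostToDevice, s);
    launch_scale(d, n, 2.0f, (void *)s);  // pass the handle as void*
    cudaMemcpyAsync(h.data(), d, bytes, cudaMemcpyDeviceToHost, s);

    // Wait for everything queued on the stream, then check for errors.
    cudaError_t err = cudaStreamSynchronize(s);
    if (err == cudaSuccess) err = cudaGetLastError();

    cudaFree(d);
    cudaStreamDestroy(s);
    if (err != cudaSuccess) return 1;

    // Every element started at 1.0f and was scaled by 2.0f.
    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f) return 2;
    return 0;
}

int main() {
    int rc = run_demo();
    std::printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Keeping the cast inside the wrapper means the OpenACC side only ever handles a void*, so it never needs to see the CUDA headers.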