Hi all,
In the thread CUDA Graph in Cuda Fortran, Mat showed us how to create a CUDA Graph by capturing a CUDA stream.
However, before calling a kernel, I need to allocate device memory. I am pretty sure that capturing a CUDA stream will not know that the device memory has to be allocated before the kernel is called. For example:
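integer, managed, allocatable :: a(:)
integer :: istat
! Start capturing the work submitted to stream
istat = cudaStreamBeginCapture(stream, 0)
! The memory has to exist before the kernel runs
allocate(a(10))
call kernel<<<block, thread, 0, stream>>>(a)
deallocate(a)
! Stop capturing and return the resulting graph
istat = cudaStreamEndCapture(stream, graph)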