I am unable to capture CUDA API calls in Nsight Systems when invoking CUDA code from golang. The timeline in Nsight Systems (image1) does not show a GPU row, but I am sure that the CUDA code was invoked, since my program returns the correct output (confirmed via the stdout file captured by Nsight Systems).
I am trying to run a simple golang+cgo+CUDA example from here. For reference, here are the .cu and .go files (image2):
I compile these as follows: (image3)
Then I run the compiled binary (image4) from Nsight Systems, which gives me the timeline above.
The diagnostics summary says that “No CUDA events collected” (image5):
The stdout log file shows the expected output and the stderr log file doesn’t contain any errors (image6):
I am able to capture CUDA API traces in Nsight Systems when I run the cuda-samples (GitHub: NVIDIA/cuda-samples).
Any idea why CUDA API traces are not being captured when the CUDA code is invoked via golang?